r-spatial / stars

Spatiotemporal Arrays, Raster and Vector Data Cubes
https://r-spatial.github.io/stars/
Apache License 2.0
563 stars 94 forks source link

Raster objects with factor layers #245

Closed mtennekes closed 4 years ago

mtennekes commented 4 years ago

Question related to https://github.com/mtennekes/tmap/issues/388: what is the correct way to convert a raster brick with categorical layers? For instance the object land from tmap:

data(land, package = "tmap")
(raster::is.factor(land))
#>     cover cover_cls     trees elevation 
#>      TRUE      TRUE     FALSE     FALSE
(stars::st_as_stars(land))
#> stars object with 3 dimensions and 1 attribute
#> attribute(s), summary of first 1e+05 cells:
#>      cover      
#>  Min.   : 2.00  
#>  1st Qu.:10.00  
#>  Median :20.00  
#>  Mean   :16.36  
#>  3rd Qu.:20.00  
#>  Max.   :20.00  
#> dimension(s):
#>      from   to offset     delta                       refsys point
#> x       1 1080   -180  0.333333 +proj=longlat +datum=WGS8...    NA
#> y       1  540     90 -0.333333 +proj=longlat +datum=WGS8...    NA
#> band    1    4     NA        NA                           NA    NA
#>                   values    
#> x                   NULL [x]
#> y                   NULL [y]
#> band cover,...,elevation

Created on 2020-01-23 by the reprex package (v0.3.0)

I managed to cast land to a stars object with the following script:

library(stars)
#> Loading required package: abind
#> Loading required package: sf
#> Linking to GEOS 3.8.0, GDAL 3.0.2, PROJ 6.2.1
library(sf)
library(raster)
#> Loading required package: sp
data(land, package = "tmap")

lvls <- levels(land)
m <- land[]
ms <- list()
for (i in 1:4) {
    if (i %in% 1:2) {
        mi <- factor(m[,i], levels = lvls[[i]]$ID, labels = lvls[[i]]$cover)
    } else {
        mi <- m[,i]
    }
    dim(mi) <- dim(land)[2:1]
    if (i %in% 1:2) class(mi) <- c("factor", "matrix")
    names(dim(mi)) <- c("x", "y")
    ms[[i]] <- mi
}
names(ms) <- names(land)

ncols <- ncol(land)
nrows <- nrow(land)
bbx <- st_bbox(land)
crs <- st_crs(land)

dimensions <- structure(list(
    x = structure(list(from = 1, to = ncols, offset = bbx["xmin"],
                  delta = (bbx["xmax"] - bbx["xmin"]) / ncols, refsys = crs$proj4string, point = NULL, values = NULL),
                  class = "dimension"),
    y = structure(list(from = 1, to = nrows, offset = bbx["ymax"],
                       delta = (bbx["ymin"] - bbx["ymax"]) / nrows, refsys = crs$proj4string, point = NULL, values = NULL),
                  class = "dimension")),
    raster = stars:::get_raster(), class = "dimensions")

s1 <- stars::st_as_stars(ms)
attr(s1, "dimensions") <- dimensions

s1
#> stars object with 2 dimensions and 4 attributes
#> attribute(s):
#>                cover                              cover_cls     
#>  Water bodies     :393060   Water                      :393060  
#>  Snow / Ice       : 61986   Snow/ice                   : 61986  
#>  Herbaceous       : 21377   Forest                     : 48851  
#>  Tree Open        : 16171   Other natural vegetation   : 32611  
#>  Sparse vegetation: 12247   Bare area/Sparse vegetation: 26904  
#>  Cropland         : 11658   Cropland                   : 17843  
#>  (Other)          : 66701   (Other)                    :  1945  
#>      trees          elevation     
#>  Min.   :  0.0    Min.   :-412    
#>  1st Qu.:  0.0    1st Qu.: 218    
#>  Median :  0.0    Median : 608    
#>  Mean   : 15.6    Mean   :1140    
#>  3rd Qu.: 19.0    3rd Qu.:1941    
#>  Max.   :100.0    Max.   :6410    
#>  NA's   :393060   NA's   :389580  
#> dimension(s):
#>   from   to offset     delta                       refsys point values    
#> x    1 1080   -180  0.333333 +proj=longlat +ellps=WGS8...  NULL   NULL [x]
#> y    1  540     90 -0.333333 +proj=longlat +ellps=WGS8...  NULL   NULL [y]

Created on 2020-01-23 by the reprex package (v0.3.0)

Does the output look correct? If so:

mtennekes commented 4 years ago

The code line if (i %in% 1:2) class(mi) <- c("factor", "matrix") in the script above is unnecessary and creates problems when applying as.data.frame. Apparently, setting the dims is enough to create a factor matrix.

edzer commented 4 years ago

when all layers are factor one could still put them in the third dimensions, but that makes only sense if each of them has the same levels. I think your approach is the right one; one can always move from layer to multiple attributes and back (merge and split).

plot.stars always only plots the first array; this could of course be generalized in the case of several attributes, but then one would need to handle multiple legends, messy in base plot, ... I guess this might be something for a dedicated package, like tmap?

Nowosad commented 4 years ago

There is a chance that I am over-thinking it, but my guts tell me that it could become an issue in the future. Categorical rasters are now read as a factor in stars, which changes the original values of the data. For example, a value of 1 in the tif file below is now represented by a factor, which is stored as an integer of 2. This could have serious consequences when making some operations (e.g., counting, selecting) as the user cannot use the original values. This discrepancy could also be confusing to many users. I do not have any good solutions yet, but I wanted to start a discussion about this issue.

library(stars)
#> Loading required package: abind
#> Loading required package: sf
#> Linking to GEOS 3.7.1, GDAL 2.3.2, PROJ 5.2.0

# data download -----------------------------------------------------------
tf = tempfile(fileext = ".zip")
td = tempdir()
download.file("https://github.com/r-spatial/stars/files/4345255/lanform-example.zip", tf)
unzip(tf, exdir = td)
lf_path = paste0(td, "/lanform.tif")

# data read ---------------------------------------------------------------
lf = read_stars(lf_path)
lf
#> stars object with 2 dimensions and 1 attribute
#> attribute(s), summary of first 1e+05 cells:
#>                       lanform.tif   
#>  10.Low mountains           :  369  
#>  11.Scattered high mountains:  306  
#>  9. Scattered low mountains :  273  
#>  12.High mountains          :  240  
#>  17. Surface water          :    8  
#>  (Other)                    :    0  
#>  NA's                       :98804  
#> dimension(s):
#>   from   to   offset delta                       refsys point values    
#> x    1 7360 -1091676   300 PROJCS["unnamed",GEOGCS["... FALSE   NULL [x]
#> y    1 3812 -38556.5  -300 PROJCS["unnamed",GEOGCS["... FALSE   NULL [y]

levels(lf$lanform.tif)
#>  [1] ""                                       
#>  [2] "1. Flat or nearly flat plains"          
#>  [3] "2. Smooth plains with some local relief"
#>  [4] ""                                       
#>  [5] ""                                       
#>  [6] "5.Scattered moderate hills"             
#>  [7] "6. Moderate hills"                      
#>  [8] "7. Scattered high hills"                
#>  [9] "8. High hills"                          
#> [10] "9. Scattered low mountains"             
#> [11] "10.Low mountains"                       
#> [12] "11.Scattered high mountains"            
#> [13] "12.High mountains"                      
#> [14] "13.Tablelands with moderate relief"     
#> [15] "14.Tablelands with considerable relief" 
#> [16] "15.Tablelands with high relief"         
#> [17] "16.Tablelands with ver high relief"     
#> [18] "17. Surface water"

gdalUtils::gdalinfo(lf_path)
#>   [1] "Driver: GTiff/GeoTIFF"                                                    
#>   [2] "Files: /tmp/RtmpyrmzC4/lanform.tif"                                       
#>   [3] "       /tmp/RtmpyrmzC4/lanform.tif.aux.xml"                               
#>   [4] "Size is 7360, 3812"                                                       
#>   [5] "Coordinate System is:"                                                    
#>   [6] "PROJCS[\"unnamed\","                                                      
#>   [7] "    GEOGCS[\"WGS 84\","                                                   
#>   [8] "        DATUM[\"unknown\","                                               
#>   [9] "            SPHEROID[\"WGS84\",6378137,298.257223563]],"                  
#>  [10] "        PRIMEM[\"Greenwich\",0],"                                         
#>  [11] "        UNIT[\"degree\",0.0174532925199433]],"                            
#>  [12] "    PROJECTION[\"Cylindrical_Equal_Area\"],"                              
#>  [13] "    PARAMETER[\"standard_parallel_1\",5.5],"                              
#>  [14] "    PARAMETER[\"central_meridian\",140.8],"                               
#>  [15] "    PARAMETER[\"false_easting\",0],"                                      
#>  [16] "    PARAMETER[\"false_northing\",0],"                                     
#>  [17] "    UNIT[\"metre\",1,"                                                    
#>  [18] "        AUTHORITY[\"EPSG\",\"9001\"]]]"                                   
#>  [19] "Origin = (-1091676.099780400050804,-38556.486310934997164)"               
#>  [20] "Pixel Size = (300.000000000000000,-300.000000000000000)"                  
#>  [21] "Metadata:"                                                                
#>  [22] "  AREA_OR_POINT=Area"                                                     
#>  [23] "Image Structure Metadata:"                                                
#>  [24] "  COMPRESSION=DEFLATE"                                                    
#>  [25] "  INTERLEAVE=BAND"                                                        
#>  [26] "Corner Coordinates:"                                                      
#>  [27] "Upper Left  (-1091676.100,  -38556.486) (130d56'53.71\"E,  0d20'49.56\"S)"
#>  [28] "Lower Left  (-1091676.100,-1182156.486) (130d56'53.71\"E, 10d42' 9.59\"S)"
#>  [29] "Upper Right ( 1116323.900,  -38556.486) (150d52'27.05\"E,  0d20'49.56\"S)"
#>  [30] "Lower Right ( 1116323.900,-1182156.486) (150d52'27.05\"E, 10d42' 9.59\"S)"
#>  [31] "Center      (   12323.900, -610356.486) (140d54'40.38\"E,  5d30'10.31\"S)"
#>  [32] "Band 1 Block=7360x1 Type=Byte, ColorInterp=Palette"                       
#>  [33] "  Min=1.000 Max=17.000 "                                                  
#>  [34] "  Minimum=1.000, Maximum=17.000, Mean=nan, StdDev=nan"                    
#>  [35] "  NoData Value=255"                                                       
#>  [36] "  Categories:"                                                            
#>  [37] "      0: "                                                                
#>  [38] "      1: 1. Flat or nearly flat plains"                                   
#>  [39] "      2: 2. Smooth plains with some local relief"                         
#>  [40] "      3: "                                                                
#>  [41] "      4: "                                                                
#>  [42] "      5: 5.Scattered moderate hills"                                      
#>  [43] "      6: 6. Moderate hills"                                               
#>  [44] "      7: 7. Scattered high hills"                                         
#>  [45] "      8: 8. High hills"                                                   
#>  [46] "      9: 9. Scattered low mountains"                                      
#>  [47] "     10: 10.Low mountains"                                                
#>  [48] "     11: 11.Scattered high mountains"                                     
#>  [49] "     12: 12.High mountains"                                               
#>  [50] "     13: 13.Tablelands with moderate relief"                              
#>  [51] "     14: 14.Tablelands with considerable relief"                          
#>  [52] "     15: 15.Tablelands with high relief"                                  
#>  [53] "     16: 16.Tablelands with ver high relief"                              
#>  [54] "     17: 17. Surface water"                                               
#>  [55] "  Metadata:"                                                              
#>  [56] "    STATISTICS_MAXIMUM=17"                                                
#>  [57] "    STATISTICS_MEAN=nan"                                                  
#>  [58] "    STATISTICS_MINIMUM=1"                                                 
#>  [59] "    STATISTICS_STDDEV=nan"                                                
#>  [60] "  Color Table (RGB with 256 entries)"                                     
#>  [61] "    0: 255,255,255,255"                                                   
#>  [62] "    1: 252,255,194,255"                                                   
#>  [63] "    2: 219,226,172,255"                                                   
#>  [64] "    3: 231,255,43,255"                                                    
#>  [65] "    4: 207,233,39,255"                                                    
#>  [66] "    5: 104,227,79,255"                                                    
#>  [67] "    6: 85,178,63,255"                                                     
#>  [68] "    7: 125,169,58,255"                                                    
#>  [69] "    8: 102,137,46,255"                                                    
#>  [70] "    9: 214,175,98,255"                                                    
#>  [71] "   10: 214,154,43,255"                                                    
#>  [72] "   11: 114,83,38,255"                                                     
#>  [73] "   12: 114,69,1,255"                                                      
#>  [74] "   13: 201,161,224,255"                                                   
#>  [75] "   14: 178,141,174,255"                                                   
#>  [76] "   15: 161,128,135,255"                                                   
#>  [77] "   16: 136,108,97,255"                                                    
#>  [78] "   17: 197,244,254,255"                                                   
#>  [79] "   18: 0,0,0,255"                                                         
...                                                   
#> [316] "  255: 0,0,0,0"

Created on 2020-03-19 by the reprex package (v0.3.0) landcover-colors.zip

edzer commented 4 years ago

Well spotted! This commit solves the problem you mention, but thanks to the fact that this raster has minimum value 1. You can't have the cake and eat it: R factors use numerical values underneath the levels that start at 1. So there are several options are now there:

edzer commented 4 years ago

Building on the land cover example of https://github.com/mtennekes/tmap/issues/368#issuecomment-595727166

library(stars)
r = read_stars("pr_landcover_wimperv_10-28-08_se5.img", RAT = "Land Cover Class")
r = droplevels(r)
plot(r, key.pos = 4, key.width = lcm(5))

will give 14 levels, starting at 1, where the original map starts at 0 (hence, 1 is added, that is where this came from, to the levels before droplevels). When reading with

library(stars)
r = read_stars("pr_landcover_wimperv_10-28-08_se5.img", NA_value = 0)
r = droplevels(r)
plot(r, key.pos = 4, key.width = lcm(5))

x

the 0 values are converted to NA before converting to factor, so minimum (non-NA) value is 1, and values are preserved (before droplevel). Maybe there is some silent convention to put nothing important in the 0 value? (The value for RAT is now auto-guessed as the first RAT column of class character).

Nowosad commented 4 years ago

insert bogus levels for rasters that start at values larger than 1 (ever seen one?)

ESA CCI Land Cover (http://maps.elie.ucl.ac.be/CCI/viewer/) data starts from 10. However, it does not have an attribute table attached (you can download it as a csv, .dsr (ENVI), .lyr (ESRI), or .qml (QGIS) file.

edzer commented 4 years ago

Yes, but doesn't contain categories, nor color or attribute tables (I looked at the .tif's)

Nowosad commented 4 years ago

Hi @edzer,

  1. Correct me if I am wrong, but I think we need three elements to completely describe categorical rasters: (a) values, (b) labels, (c) colors.
  2. The previous example seems to not working correctly (see AD2).
  3. Yes, I know that it does not have an attribute table, but this way of labelling categories (e.g. starting from 10) is possible (see AD3). This example works well in QGIS and using the raster package.
  4. I also think that allowing for writing a raster data with attribute table would be a great (future) addition to stars.

AD2

# remotes::install_github("r-spatial/stars")
library(stars)
#> Loading required package: abind
#> Loading required package: sf
#> Linking to GEOS 3.7.1, GDAL 2.3.2, PROJ 5.2.0

# data download -----------------------------------------------------------
tf = tempfile(fileext = ".zip")
td = tempdir()
download.file("https://github.com/r-spatial/stars/files/4345255/lanform-example.zip", tf)
unzip(tf, exdir = td)
lf_path = paste0(td, "/lanform.tif")

# data read ---------------------------------------------------------------
lf = read_stars(lf_path)
lf
#> stars object with 2 dimensions and 1 attribute
#> attribute(s), summary of first 1e+05 cells:
#>                              lanform.tif   
#>  9. Scattered low mountains        :  369  
#>  10.Low mountains                  :  306  
#>  8. High hills                     :  273  
#>  11.Scattered high mountains       :  240  
#>  16.Tablelands with ver high relief:    8  
#>  (Other)                           :    0  
#>  NA's                              :98804  
#> dimension(s):
#>   from   to   offset delta                       refsys point values    
#> x    1 7360 -1091676   300 PROJCS["unnamed",GEOGCS["... FALSE   NULL [x]
#> y    1 3812 -38556.5  -300 PROJCS["unnamed",GEOGCS["... FALSE   NULL [y]
plot(lf)

Created on 2020-03-20 by the reprex package (v0.3.0)

AD3

# remotes::install_github("r-spatial/stars")
library(stars)
#> Loading required package: abind
#> Loading required package: sf
#> Linking to GEOS 3.7.1, GDAL 2.3.2, PROJ 5.2.0

# data download -----------------------------------------------------------
tf = tempfile(fileext = ".zip")
td = tempdir()
download.file("https://github.com/r-spatial/stars/files/4358677/landcover-colors.zip", tf)
unzip(tf, exdir = td)
lc_path = paste0(td, "/landcover-colors.tif")

# data read ---------------------------------------------------------------
# R crashes ---------------------------------------------------------------
# lc = read_stars(lc_path)
# lc
# plot(lc)
# 
# gdalUtils::gdalinfo(lc_path)

Created on 2020-03-20 by the reprex package (v0.3.0)

edzer commented 4 years ago

Ah yes; we were there: https://github.com/r-spatial/mapview/issues/208 ; This

library(stars)
r = read_stars("ESACCI-LC-L4-LCCS-Map-300m-P1Y-2015-v2.0.7.tif", proxy = TRUE)
plot(r, rgb = read.csv("ESACCI-LC-Legend.csv",sep=";"))

now gives me a plot with the colors, but not with the labels/levels.

Nowosad commented 4 years ago

@edzer I have been thinking about this topic for some time now. Would you consider adding the following changes to accommodate rasters with attribute tables:

  1. A raster with an attribute table is read the same as a regular raster - as an array
  2. The only difference is now it has not one attribute (dim), but also the second attribute (let's call it legend). The second attribute is a data.frame with three columns: value, label, and color. (It could have more columns if it would be better to store colors as RGBA).

This way, it would be easier to work on categorical rasters with values not starting from 1, rasters with different values of NA, etc. I also assume it would make it easier to adopt raster colors in other plotting packages, such as tmap or mapview.

edzer commented 4 years ago

We now have support for attributes levels and colors, and class which is factor. Building on factor adds the constraint that values are 1-based, and drops the requirement to match values to a table. For arrays that start at values larger than 1 the colors and levels vectors can be padded; for those starting lower than 1 there is no chance of getting it done without shifting values. So it is this latter case that your proposal would bring new functionality: having categorical rasters with minimal values below 1 that you want to retain as their original values. ATM I see little urgency to go there.

If someone would take the effort to properly implement (and maintain) your proposal, or something similar, I would be happy to consider a PR.

Nowosad commented 4 years ago

Thanks for the reply @edzer. I will think about more...

Could you take a look at the issues AD2 and AD3 above (https://github.com/r-spatial/stars/issues/245#issuecomment-601609490)?

edzer commented 4 years ago

I'm slowly starting to grasp the concept of all these tables read through GDAL: they always start at 0, 1, 2, ... and min_value does not refer to the data (cell) minimum, but to the first meaningful (non-NA) category in the layer.

I may have nailed it this time - your examples are very helpful!

Nowosad commented 4 years ago

Thank you @edzer - I use categorical rasters a lot, and it would be great to have them work smoothly with stars.

I think the previous example now work fine. However, only when in non-proxy mode:

library(stars)
#> Loading required package: abind
#> Loading required package: sf
#> Linking to GEOS 3.7.1, GDAL 2.3.2, PROJ 5.2.0
# data download -----------------------------------------------------------
tf = tempfile(fileext = ".zip")
td = tempdir()
download.file("https://github.com/r-spatial/stars/files/4345255/lanform-example.zip", tf)
unzip(tf, exdir = td)
lf_path = paste0(td, "/lanform.tif")

# data read ---------------------------------------------------------------
lf = read_stars(lf_path, proxy = TRUE)
#> Error in data + (1 - min_value): non-numeric argument to binary operator
lf
#> Error in eval(expr, envir, enclos): object 'lf' not found

Created on 2020-03-26 by the reprex package (v0.3.0)

edzer commented 4 years ago

Thanks - that was a typo.

Nowosad commented 4 years ago

@edzer conversion from stars to raster works correctly. However, there is an issue with converting raster to stars - colors are lost in the process. This issue has an impact on tmap, as it now uses stars internally to plot raster objects.

library(stars)
#> Loading required package: abind
#> Loading required package: sf
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 7.0.0
library(raster)
#> Loading required package: sp
# data download -----------------------------------------------------------
tf = tempfile(fileext = ".zip")
td = tempdir()
download.file("https://github.com/r-spatial/stars/files/4345255/lanform-example.zip", tf)
unzip(tf, exdir = td)
lf_path = paste0(td, "/lanform.tif")

# data read ---------------------------------------------------------------
# colors are here
lf = read_stars(lf_path)
plot(lf)


# colors are here
lf_r = raster(lf_path)
plot(lf_r)


# colors are here
lf_r2 = as(lf_r, "Raster")
plot(lf_r2)


# colors are not here anymore
lf_s = st_as_stars(lf_r)
plot(lf_s)

Created on 2020-04-24 by the reprex package (v0.3.0)

edzer commented 4 years ago

Given the number of slots in a RasterLayer's @legend slot, there will many more surprises of this kind, in case they are used.

edzer commented 4 years ago

Thanks, should work now!

Nowosad commented 4 years ago

Thank you