r-spatial / stars

Spatiotemporal Arrays, Raster and Vector Data Cubes
https://r-spatial.github.io/stars/
Apache License 2.0
558 stars 94 forks source link

Reading 'stars' with color table shifted values #329

Closed michaeldorman closed 3 years ago

michaeldorman commented 3 years ago

I’m working with a the NLCD land cover raster that has a color table. The minimal category value is 11 (water bodies, in blue below). When reading the raster to a stars object, the values start from 1 rather than 11, so that all values are shifted by 10:

library(stars)
## Loading required package: abind

## Loading required package: sf

## Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 7.0.0
# Read
url = "https://www.dropbox.com/s/mzfh3tkjx8svgnp/nlcd_2011_landcover_2011_edition_2014_10_10_boston.tif?dl=1"
path = tempfile(fileext = ".tif")
download.file(url, path)
r = read_stars(path)
r = droplevels(r)

# Unique values
sort(unique(r[[1]]))
##  [1] 1  11 12 13 14 21 31 32 33 42 61 71 72 80 85
## Levels: 1 11 12 13 14 21 31 32 33 42 61 71 72 80 85
# Plot
plot(r)

unnamed-chunk-1-1

I think this issue was addressed in #245. Specifically, I noticed @edzer suggested:

offer users a choice: either work with factors and accept a possibly different numerical representation (warn in that case?), or work with the numerical representation and ignore the labels

which could be very helpful in this case. Will appreciate any suggestion on how to obtain the original values. Please feel free to close if this is already addressed or in progress.

Comparison to raster, where the values correctly start from 11:

library(raster)
## Loading required package: sp
s = raster(path)
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded datum Unknown based on GRS80 ellipsoid in CRS definition,
##  but +towgs84= values preserved
sort(unique(s[]))
##  [1] 11 21 22 23 24 31 41 42 43 52 71 81 82 90 95

Also in QGIS, using “Inspect”, the water bodies category value is 11:

Screenshot from 2020-09-30 13-33-12


Related issue, suppose that I want to convert the stars raster r to numeric and manually add 10 to all values to fix the problem (although realize this is generally not recommended). Since the underlying matrix is a factor, the only way to do that I could find is:

r[[1]] = matrix(as.numeric(as.character(r[[1]][])), nrow(r[[1]]), ncol(r[[1]]))
sort(unique(as.vector(r[[1]])))
##  [1]  1 11 12 13 14 21 31 32 33 42 61 71 72 80 85

Afterwards, I can add 10 to all cell values as follows:

r = r + 10
sort(unique(as.vector(r[[1]])))
##  [1] 11 21 22 23 24 31 41 42 43 52 71 81 82 90 95
plot(r)

unnamed-chunk-4-1

Is there any other/standard way to make the conversion from factor to numeric attribute in stars?

edzer commented 3 years ago

Good to revisit this. This might fix it, as plot(r) now gives

x

I now see:

> table(as.numeric(r[[1]]))

       1        2        3        4        5        6        7        8 
10100916  2490735  2314953  2050059   681687   480323  8430835  3409636 
       9       10       11       12       13       14       15 
 2512260   482075   299249  1367447   267769  3329146   769891 
> table(as.numeric(as.character(r[[1]])))

      11       21       22       23       24       31       41       42 
10100916  2490735  2314953  2050059   681687   480323  8430835  3409636 
      43       52       71       81       82       90       95 
 2512260   482075   299249  1367447   267769  3329146   769891 

which is a cumbersome way to get to the numbers inside the levels, but categorical data are not supposed to refer to numbers anyway, so I don't see this as a big issue.

@Nowosad I did run through the examples @michaeldorman pointed to (also those in https://github.com/mtennekes/tmap/issues/368#issuecomment-595727166) and things look still good when we have a RAT, but it would be great if you could give this a second look.

michaeldorman commented 3 years ago

Works for me, thanks!

Screenshot from 2020-10-04 23-31-47

Nowosad commented 3 years ago

It seems to work fine!

# remotes::install_github("r-spatial/stars")
library(stars)
#> Loading required package: abind
#> Loading required package: sf
#> Linking to GEOS 3.8.1, GDAL 3.0.4, PROJ 6.3.2

# 1
tf = tempfile(fileext = ".tif")
download.file("https://s3-us-west-2.amazonaws.com/mrlc/PR_landcover_wimperv_10-28-08_se5.zip", tf)
dir.create("tmp")
unzip(tf, exdir = "tmp")

tf = read_stars("tmp/pr_landcover_wimperv_10-28-08_se5.img")
tf
#> stars object with 2 dimensions and 1 attribute
#> attribute(s), summary of first 1e+05 cells:
#>  pr_landcover_wimperv_10-28-08_se5.img 
#>         :100000                        
#>         :     0                        
#>         :     0                        
#>         :     0                        
#>         :     0                        
#>         :     0                        
#>  (Other):     0                        
#> dimension(s):
#>   from   to  offset delta                    refsys point values x/y
#> x    1 8427 3092415    30 Albers Conical Equal Area    NA   NULL [x]
#> y    1 4613   59415   -30 Albers Conical Equal Area    NA   NULL [y]

plot(tf)


# 2
tf = tempfile(fileext = ".zip")
td = tempdir()
download.file("https://github.com/r-spatial/stars/files/4345255/lanform-example.zip", tf)
unzip(tf, exdir = td)
lf_path = paste0(td, "/lanform.tif")

lf = read_stars(lf_path)
lf
#> stars object with 2 dimensions and 1 attribute
#> attribute(s), summary of first 1e+05 cells:
#>                       lanform.tif   
#>  10.Low mountains           :  369  
#>  11.Scattered high mountains:  306  
#>  9. Scattered low mountains :  273  
#>  12.High mountains          :  240  
#>  17. Surface water          :    8  
#>  (Other)                    :    0  
#>  NA's                       :98804  
#> dimension(s):
#>   from   to   offset delta  refsys point values x/y
#> x    1 7360 -1091676   300 unnamed FALSE   NULL [x]
#> y    1 3812 -38556.5  -300 unnamed FALSE   NULL [y]

plot(lf)

Created on 2020-10-07 by the reprex package (v0.3.0)