r-spatial / stars

Spatiotemporal Arrays, Raster and Vector Data Cubes
https://r-spatial.github.io/stars/
Apache License 2.0
561 stars 94 forks source link

attributes from an RGBtif not read with `read_stars` #388

Closed GillesSanMartin closed 3 years ago

GillesSanMartin commented 3 years ago

I’m not sure if this is a bug or just me misusing read_stars … (probably the latter).
When I read the following tif file (62 Mo) with read_stars the attributes are empty. Why ? https://drive.google.com/file/d/1vIP0Sl6JDTfgGq\_MpTQl4wZYVCM9C14q/view?usp=sharing

It’s a weird RGB image that I want to convert to a spatial raster

setwd("path")

library(stars)
#> Le chargement a nécessité le package : abind
#> Le chargement a nécessité le package : sf
#> Linking to GEOS 3.8.0, GDAL 3.0.2, PROJ 6.2.1
library(raster)
#> Le chargement a nécessité le package : sp

If I read the tif file with stars, I obtain an object with 4 bands but the attribute looks empty

r <- read_stars("sapins_from_raster.tif")
r
#> stars_proxy object with 1 attribute in file:
#> $sapins_from_raster.tif
#> [1] "sapins_from_raster.tif"
#> 
#> dimension(s):
#>      from    to offset delta                       refsys point values x/y
#> x       1 28000  30000    10 Belge 1972 / Belgian Lamb... FALSE   NULL [x]
#> y       1 18500  2e+05   -10 Belge 1972 / Belgian Lamb... FALSE   NULL [y]
#> band    1     4     NA    NA                           NA    NA   NULL

If I read the same file with raster I obtain the expected object with RGB values (+ a transparency band)

r <- stack("sapins_from_raster.tif")
#> Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded
#> datum Unknown_based_on_International_1909_Hayford_ellipsoid in CRS definition
crs(r) <- CRS("+init=epsg:31370")
#> Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded
#> datum Reseau_National_Belge_1972 in CRS definition
r
#> class      : RasterStack 
#> dimensions : 18500, 28000, 5.18e+08, 4  (nrow, ncol, ncell, nlayers)
#> resolution : 10, 10  (x, y)
#> extent     : 30000, 310000, 15000, 2e+05  (xmin, xmax, ymin, ymax)
#> crs        : +proj=lcc +lat_0=90 +lon_0=4.36748666666667 +lat_1=51.1666672333333 +lat_2=49.8333339 +x_0=150000.013 +y_0=5400088.438 +ellps=intl +units=m +no_defs 
#> names      : sapins_from_raster.1, sapins_from_raster.2, sapins_from_raster.3, sapins_from_raster.4 
#> min values :                  119,                  136,                   43,                  255 
#> max values :                  255,                  255,                  255,                  255

Curiously, if I crop this result, save it on the disk and then read this cropped tif with stars then I can see the attributes ?? !!

tmp <- crop(r, extent(194000, 200000, 63000, 68000))
#> Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded
#> datum Unknown_based_on_International_1909_Hayford_ellipsoid in CRS definition
writeRaster(tmp, "sapins_from_raster_crop.tif", overwrite=TRUE)
#> Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded
#> datum Unknown_based_on_International_1909_Hayford_ellipsoid in CRS definition
r <- read_stars("sapins_from_raster_crop.tif")
r
#> stars object with 3 dimensions and 1 attribute
#> attribute(s), summary of first 1e+05 cells:
#>  sapins_from_raster_crop.tif 
#>  Min.   :132.0               
#>  1st Qu.:255.0               
#>  Median :255.0               
#>  Mean   :251.1               
#>  3rd Qu.:255.0               
#>  Max.   :255.0               
#> dimension(s):
#>      from  to offset delta                       refsys point values x/y
#> x       1 600 194000    10 +proj=lcc +lat_0=90 +lon_... FALSE   NULL [x]
#> y       1 500  68000   -10 +proj=lcc +lat_0=90 +lon_... FALSE   NULL [y]
#> band    1   4     NA    NA                           NA    NA   NULL

Packages used : raster_3.3-13 stars_0.5-1 sf_0.9-7

edzer commented 3 years ago

If I read the tif file with stars, I obtain an object with 4 bands but the attribute looks empty

The attributes look empty because it is a stars_proxy object: data are not (yet) read into memory because the raster is too large. But try to plot it with plot(r).

GillesSanMartin commented 3 years ago

OK thanks a lot! I didn’t realize that. This explains the differences between the cropped image and the full image

However when I plot the full (stars_proxy) object the maps are entirely gray as if there were no data :

setwd("path")
library(stars)
#> Le chargement a nécessité le package : abind
#> Le chargement a nécessité le package : sf
#> Linking to GEOS 3.8.0, GDAL 3.0.2, PROJ 6.2.1
library(raster)
#> Le chargement a nécessité le package : sp
library(dplyr)
#> 
#> Attachement du package : 'dplyr'
#> The following objects are masked from 'package:raster':
#> 
#>     intersect, select, union
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

# Graph with raster (zoomed in, in an area of intereset)
r <- stack("sapins_from_raster.tif")
#> Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded
#> datum Unknown_based_on_International_1909_Hayford_ellipsoid in CRS definition
plot(r, xlim = c(194000, 200000), ylim = c( 63000, 68000))


# Graph with stars : 
r <- read_stars("sapins_from_raster.tif")
plot(r, xlim = c(194000, 200000), ylim = c( 63000, 68000))

Also when I do my computations on the cropped image I managed to obtain what I want (a single band raster with 1 for presence and NA for absences) but it does not work on the full dataset. This might be partially due to syntaxic differences between stars_proxy objects and stars objects (problems with dplyr verbs ?). But I have the impression that something else is wrong because my plots are wrong even before applying tidyr verbs.

Computations on the cropped data :

# prepare a cropped version of the file

r <- stack("sapins_from_raster.tif")
#> Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded
#> datum Unknown_based_on_International_1909_Hayford_ellipsoid in CRS definition
tmp <- crop(r, extent(194000, 200000, 63000, 68000))
writeRaster(tmp, "sapins_from_raster_crop.tif", overwrite=TRUE)
#> Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded
#> datum Unknown_based_on_International_1909_Hayford_ellipsoid in CRS definition
r <- read_stars("sapins_from_raster_crop.tif")

# Sum the values of the 3 chanels and then ignore (transform to NA) 
# all the pixels that are white or very light
names(r) <- "raw_value"
r <- r[,,,1:3] # drop 4th canal (transparency)
tmp <- st_apply(r, MARGIN = c("x", "y"), FUN = sum, rename = F) # sum the 3 RGB bands values
# tmp <- tmp %>% mutate(sapin = c(1,NA)[as.numeric(raw_value>230*3)+1]) %>% select(sapin) # other approach
tmp <- tmp %>% mutate(sapin = ifelse(raw_value>230*3, NA, 1)) # %>% select(sapin)
tmp <- tmp["sapin"]

plot(tmp)

Same computations on the full data.

r <- read_stars("sapins_from_raster.tif")

# Sum the values of the 3 chanels and then ignore (transform to NA) 
# all the pxels that are white or very light
names(r) <- "raw_value"
r <- r[,,,1:3] # drop 4th canal (transparency)
tmp <- st_apply(r, MARGIN = c("x", "y"), FUN = sum, rename = F) # sum the 3 RGB bands values
tmp <- tmp %>% mutate(sapin = ifelse(raw_value>230*3, NA, 1)) # %>% select(sapin)
tmp <- tmp["sapin"]

plot(tmp)
#> Error in enc2utf8(maybe_normalizePath(.x, np = normalize_path)): argument is not a character vector

Created on 2021-02-13 by the reprex package (v0.3.0)

edzer commented 3 years ago

The link where you uploaded the data no longer works, could you please update?

GillesSanMartin commented 3 years ago

Strange... Here are 2 new links :

https://drive.google.com/file/d/1vIP0Sl6JDTfgGq_MpTQl4wZYVCM9C14q/view?usp=sharing

https://we.tl/t-c6fgDMAI2n (valid 7 days only)

edzer commented 3 years ago
r = read_stars("sapins_from_raster.tif")
bb = st_bbox(c(xmin = 194000, xmax=200000, ymin=63000, ymax=68000), crs = st_crs(r))
plot(st_crop(r, bb, breaks = "equal")

x

Using this fix,

library(stars)
library(dplyr)
r <- read_stars("sapins_from_raster.tif")

names(r) <- "raw_value"
r <- r[,,,1:3] # drop 4th canal (transparency)
tmp <- st_apply(r, MARGIN = c("x", "y"), FUN = sum, rename = FALSE) # sum the 3 RGB bands values
tmp <- tmp %>% mutate(sapin = ifelse(raw_value>230*3, NA, 1)) %>% select(sapin)
bb = st_bbox(c(xmin = 194000, xmax=200000, ymin=63000, ymax=68000), crs = st_crs(r))
plot(st_crop(tmp, bb), breaks = "equal")

x

GillesSanMartin commented 3 years ago

Thanks a lot for your answers and your time…

Strangely, I still have problems with your code snippets… So I guess something is wrong with my config ? I use the latest stars and sf versions from CRAN. But my R, Linux distro, and also maybe GDAL version are a bit outdated. Could that be the cause ?? Sadly I cannot upgrade for the moment. Or maybe it is caused by not up-to-date dependency package ??

Version used (full sessionInfo at the end): stars_0.5-1 sf_0.9-7 R version 3.6.2 Ubuntu 18.04.4 LTS GEOS 3.8.0, GDAL 3.0.2, PROJ 6.2.1

The graphs are still gray even after true cropping and using breaks = "equal"

setwd("~/Bureau/")
Sys.setlocale("LC_MESSAGES", "C") # to have the error/warning messages in English
#> [1] "C"
library(stars)
#> Loading required package: abind
#> Loading required package: sf
#> Linking to GEOS 3.8.0, GDAL 3.0.2, PROJ 6.2.1
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

r = read_stars("sapins_from_raster.tif")
bb = st_bbox(c(xmin = 194000, xmax=200000, ymin=63000, ymax=68000), crs = st_crs(r))
plot(st_crop(r, bb, breaks = "equal"))

Several errors and warning messages in the next code snippet

setwd("~/Bureau/")
r <- read_stars("sapins_from_raster.tif")

names(r) <- "raw_value"
r <- r[,,,1:3] # drop 4th canal (transparency)
tmp <- st_apply(r, MARGIN = c("x", "y"), FUN = sum, rename = FALSE) # sum the 3 RGB bands values
tmp <- tmp %>% mutate(sapin = ifelse(raw_value>230*3, NA, 1)) %>% select(sapin)
bb = st_bbox(c(xmin = 194000, xmax=200000, ymin=63000, ymax=68000), crs = st_crs(r))
plot(st_crop(tmp, bb), breaks = "equal")
#> Warning in min(x): no non-missing arguments to min; returning Inf
#> Warning in max(x): no non-missing arguments to max; returning -Inf
#> Warning in min(x, na.rm = na.rm): no non-missing arguments to min; returning Inf
#> Warning in max(x, na.rm = na.rm): no non-missing arguments to max; returning -
#> Inf
#> Error in image.default(dims[[dimx]], dims[[dimy]], ar, asp = asp, xlab = xlab, : 'breaks' must all be finite

This seems to come from the ifelse statement. If I remove the %>% select(sapin) part I have no error message but I obtain a wrong plot showing only “raw_value” as if the mutate statement silently didn’t work.

setwd("~/Bureau/")
r <- read_stars("sapins_from_raster.tif")

names(r) <- "raw_value"
r <- r[,,,1:3] # drop 4th canal (transparency)
tmp <- st_apply(r, MARGIN = c("x", "y"), FUN = sum, rename = FALSE) # sum the 3 RGB bands values
tmp <- tmp %>% mutate(sapin = ifelse(raw_value>230*3, NA, 1))
bb = st_bbox(c(xmin = 194000, xmax=200000, ymin=63000, ymax=68000), crs = st_crs(r))
plot(st_crop(tmp, bb), breaks = "equal")

Previously, I used the tmp["sapin"] syntax instead of %>% select(sapin). This induces another cryptic error message about enc2utf8(maybe_normalizePath(.x, np = normalize_path)). Again I think this is due to the fact that for any reason, the mutate/ifelse statement didn’t work so there is no “sapin” attribute to select (?). I have included several plots to show at each step what the data looks like.

setwd("~/Bureau/")
r <- read_stars("sapins_from_raster.tif")
bb = st_bbox(c(xmin = 194000, xmax=200000, ymin=63000, ymax=68000), crs = st_crs(r))

names(r) <- "raw_value"
r <- r[,,,1:3] # drop 4th canal (transparency)
plot(st_crop(r, bb), breaks = "equal")

tmp <- st_apply(r, MARGIN = c("x", "y"), FUN = sum, rename = FALSE) # sum the 3 RGB bands values
tmp <- tmp %>% mutate(sapin = ifelse(raw_value>230*3, NA, 1))
plot(st_crop(tmp, bb), breaks = "equal")

tmp <- tmp["sapin"]
plot(st_crop(tmp, bb), breaks = "equal")
#> Error in enc2utf8(maybe_normalizePath(.x, np = normalize_path)): argument is not a character vector

Full session info:

sessionInfo()
#> R version 3.6.2 (2019-12-12)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 18.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
#> 
#> locale:
#>  [1] LC_CTYPE=fr_BE.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=fr_BE.UTF-8        LC_COLLATE=fr_BE.UTF-8    
#>  [5] LC_MONETARY=fr_BE.UTF-8    LC_MESSAGES=C             
#>  [7] LC_PAPER=fr_BE.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=fr_BE.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] dplyr_1.0.2 stars_0.5-1 sf_0.9-7    abind_1.4-5
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.5         compiler_3.6.2     pillar_1.4.6       highr_0.8         
#>  [5] class_7.3-15       tools_3.6.2        digest_0.6.27      evaluate_0.14     
#>  [9] lifecycle_0.2.0    tibble_3.0.4       pkgconfig_2.0.3    rlang_0.4.8       
#> [13] DBI_1.1.0          curl_4.3           yaml_2.2.1         parallel_3.6.2    
#> [17] xfun_0.18          e1071_1.7-3        xml2_1.3.2         stringr_1.4.0     
#> [21] httr_1.4.2         knitr_1.30         generics_0.0.2     vctrs_0.3.4       
#> [25] classInt_0.4-3     grid_3.6.2         tidyselect_1.1.0   glue_1.4.2        
#> [29] R6_2.5.0           rmarkdown_2.5      purrr_0.3.4        magrittr_1.5      
#> [33] htmltools_0.5.0    ellipsis_0.3.1     units_0.6-7        mime_0.9          
#> [37] KernSmooth_2.23-16 stringi_1.5.3      lwgeom_0.2-5       crayon_1.3.4

Created on 2021-02-14 by the reprex package (v0.3.0)

edzer commented 3 years ago

There was a fix mentioned above, you'd need to use that:

remotes::install_github("r-spatial/stars")
GillesSanMartin commented 3 years ago

OK, sorry, I completely missed that...

Now with the last github version, everything works fine with one exception : the %>% select(sapin) syntax works while tmp["sapin"] still provides the same error message (while this works on the stars object):

Error in enc2utf8(maybe_normalizePath(.x, np = normalize_path)) : 
  argument is not a character vector

Reproducible example :

r <- read_stars("sapins_from_raster.tif")
bb = st_bbox(c(xmin = 194000, xmax=200000, ymin=63000, ymax=68000), crs = st_crs(r))

names(r) <- "raw_value"
r <- r[,,,1:3] # drop 4th canal (transparency)
tmp <- st_apply(r, MARGIN = c("x", "y"), FUN = sum, rename = FALSE) # sum the 3 RGB bands values
tmp <- tmp %>% mutate(sapin = ifelse(raw_value>230*3, NA, 1))
tmp <- tmp["sapin"]
plot(st_crop(tmp, bb), breaks = "equal")

Also for the record, there was a syntaxic error (misplaced parenthesis) in my first code chunk : I wrote :

plot(st_crop(r, bb, breaks = "equal"))

instead of :

plot(st_crop(r, bb), breaks = "equal")
edzer commented 3 years ago

This should now also work.

GillesSanMartin commented 3 years ago

Yes everything works now. Thanks a lot !!