r-spatial / stars

Spatiotemporal Arrays, Raster and Vector Data Cubes
https://r-spatial.github.io/stars/
Apache License 2.0
561 stars 94 forks source link

Errors with RasterIO #712

Open agila5 opened 2 weeks ago

agila5 commented 2 weeks ago

Dear all, I've created this issue since I have some questions/doubts regarding the behaviour of RasterIO. For example:

library(stars)
#> Loading required package: abind
#> Loading required package: sf
#> Linking to GEOS 3.11.2, GDAL 3.7.2, PROJ 9.3.0; sf_use_s2() is TRUE

W <- st_bbox(c(
  xmin = 11.9264894188302, ymin = 35.4930391676817,
  xmax = 15.6509648037934, ymax = 38.8119746624118
), crs = "OGC:CRS84")

(dt <- read_stars(
  .x = "C:/Users/user/OneDrive - Politecnico di Milano/data-NDVI/c_gls_NDVI300_202301010000_GLOBE_OLCI_V2.0.1.nc", 
  sub = "NDVI"
))
#> NDVI,
#> stars_proxy object with 1 attribute in 1 file(s):
#> $NDVI
#> [1] "[...]/c_gls_NDVI300_202301010000_GLOBE_OLCI_V2.0.1.nc:NDVI"
#> 
#> dimension(s):
#>      from     to         offset     delta  refsys x/y
#> x       1 120960           -180  0.002976  WGS 84 [x]
#> y       1  47040             80 -0.002976  WGS 84 [y]
#> time    1      1 2023-01-01 UTC        NA POSIXct

Now, my understanding is that the input data (i.e. the .nc file) is an array-like object where the first two dimensions (i.e. x and y) have 120960 and 47040 elements, respectively, and we have only 1 time stamp. If I try to subset the region of interest, I get the following.

dt[W]
#> stars_proxy object with 1 attribute in 1 file(s):
#> $NDVI
#> [1] "[...]/c_gls_NDVI300_202301010000_GLOBE_OLCI_V2.0.1.nc:NDVI"
#> 
#> dimension(s):
#>       from    to         offset     delta  refsys x/y
#> x    64488 65740           -180  0.002976  WGS 84 [x]
#> y    13840 14955             80 -0.002976  WGS 84 [y]
#> time     1     1 2023-01-01 UTC        NA POSIXct

Now, based on my understanding of the second vignette included in the package, I started trying to replicate a similar spatial filter using the RasterIO options and got the following error:

read_stars(
  .x = "C:/Users/user/OneDrive - Politecnico di Milano/data-NDVI/c_gls_NDVI300_202301010000_GLOBE_OLCI_V2.0.1.nc", 
  sub = "NDVI", 
  RasterIO = list(
    nXOff = 64488,
    nYOff = 13840
  )
)
#> Error in CPL_read_gdal(as.character(x), as.character(options), as.character(driver), : the dims contain negative values

Furthermore, when I specify also the size of the new objects that I would like to read, I get a different error

read_stars(
  .x = "C:/Users/user/OneDrive - Politecnico di Milano/data-NDVI/c_gls_NDVI300_202301010000_GLOBE_OLCI_V2.0.1.nc", 
  sub = "NDVI", 
  RasterIO = list(
    nXOff = 64488, 
    nXSize = 1253, 
    nYOff = 13840, 
    nYSize = 1116
  )
)
#> Warning in CPL_read_gdal(as.character(x), as.character(options),
#> as.character(driver), : GDAL Error 5: C:\Users\user\OneDrive - Politecnico di
#> Milano\data-NDVI\c_gls_NDVI300_202301010000_GLOBE_OLCI_V2.0.1.nc: Access window
#> out of range in RasterIO().  Requested (64487,13839) of size 1253x1116 on
#> raster of 512x512.
#> Error in eval(expr, envir, enclos): read failure

I’m even more confused now since I don’t understand why it says my raster is 512 x 512…

The .nc file used in this reprex is related to the NDVI data shared by CLMS: https://land.copernicus.eu/en/products/vegetation/normalised-difference-vegetation-index-v2-0-300m#download.

The file can be downloaded from the Provider’s manifest of Copernicus Land Monitoring Service at the following link: https://globalland.vito.be/download/manifest/ndvi_300m_v2_10daily_netcdf/manifest_clms_global_ndvi_300m_v2_10daily_netcdf_latest.txt

It corresponds to the first set of observations recorded during 2023 and the following is the direct link: https://globalland.vito.be/download/netcdf/ndvi/ndvi_300m_v2_10daily/2023/20230101/c_gls_NDVI300_202301010000_GLOBE_OLCI_V2.0.1.nc

Warning: each file listed in that txt file is approximately 1.5/2GB.

I’m sorry for the “hard-to-reproduce” bug, but I couldn’t replicate it with artificial data.

Created on 2024-10-05 with reprex v2.0.2

Session info ``` r sessioninfo::session_info() #> ─ Session info ─────────────────────────────────────────────────────────────── #> setting value #> version R version 4.3.1 (2023-06-16 ucrt) #> os Windows 11 x64 (build 22631) #> system x86_64, mingw32 #> ui RTerm #> language (EN) #> collate English_United Kingdom.utf8 #> ctype English_United Kingdom.utf8 #> tz Europe/Rome #> date 2024-10-05 #> pandoc 3.2 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown) #> #> ─ Packages ─────────────────────────────────────────────────────────────────── #> package * version date (UTC) lib source #> abind * 1.4-8 2024-09-12 [1] CRAN (R 4.3.1) #> class 7.3-22 2023-05-03 [2] CRAN (R 4.3.1) #> classInt 0.4-10 2023-09-05 [1] CRAN (R 4.3.1) #> cli 3.6.3 2024-06-21 [1] CRAN (R 4.3.1) #> DBI 1.2.3 2024-06-02 [1] CRAN (R 4.3.1) #> digest 0.6.35 2024-03-11 [1] CRAN (R 4.3.3) #> e1071 1.7-16 2024-09-16 [1] CRAN (R 4.3.1) #> evaluate 0.24.0 2024-06-10 [1] CRAN (R 4.3.3) #> fastmap 1.2.0 2024-05-15 [1] CRAN (R 4.3.3) #> fs 1.6.4 2024-04-25 [1] CRAN (R 4.3.3) #> glue 1.7.0 2024-01-09 [1] CRAN (R 4.3.2) #> htmltools 0.5.8.1 2024-04-04 [1] CRAN (R 4.3.3) #> KernSmooth 2.23-21 2023-05-03 [2] CRAN (R 4.3.1) #> knitr 1.48 2024-07-07 [1] CRAN (R 4.3.3) #> lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.3.2) #> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.1) #> proxy 0.4-27 2022-06-09 [1] CRAN (R 4.3.1) #> purrr 1.0.2 2023-08-10 [1] CRAN (R 4.3.1) #> R.cache 0.16.0 2022-07-21 [1] CRAN (R 4.3.1) #> R.methodsS3 1.8.2 2022-06-13 [1] CRAN (R 4.3.1) #> R.oo 1.25.0 2022-06-12 [1] CRAN (R 4.3.1) #> R.utils 2.12.2 2022-11-11 [1] CRAN (R 4.3.1) #> Rcpp 1.0.13 2024-07-17 [1] CRAN (R 4.3.1) #> reprex 2.0.2 2022-08-17 [1] CRAN (R 4.3.1) #> rlang 1.1.4 2024-06-04 [1] CRAN (R 4.3.1) #> rmarkdown 2.27 2024-05-17 [1] CRAN (R 4.3.3) #> rstudioapi 0.16.0 2024-03-24 [1] CRAN (R 4.3.1) #> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.1) #> sf * 1.0-18 2024-10-04 [1] Github (r-spatial/sf@6f247a5) #> stars * 0.6-7 2024-10-05 [1] Github (r-spatial/stars@ec1f849) #> styler 1.10.2 2023-08-29 [1] CRAN (R 4.3.1) #> units 0.8-5.4 2024-06-03 [1] local #> vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.3.2) #> withr 3.0.1 2024-07-31 [1] CRAN (R 4.3.3) #> xfun 0.45 2024-06-16 [1] CRAN (R 4.3.3) #> yaml 2.3.9 2024-07-05 [1] CRAN (R 4.3.3) #> #> [1] C:/Users/user/AppData/Local/R/win-library/4.3 #> [2] C:/Program Files/R/R-4.3.1/library #> #> ────────────────────────────────────────────────────────────────────────────── ```

EDIT: I just noticed that a similar issue was already discussed in https://github.com/r-spatial/stars/issues/678 but I still get an error even if I'm using the github version of the package. Sorry, but I didn't find that issue sooner. EDIT2: I also run some tests with read_ncdf but with, IMO, it also returns a confusing output (not necessarily related to the R package since it just complains about the CRS). I can add them here, but they probably deserve a separate issue so I don't mix too many things.

edzer commented 2 weeks ago

Thanks for the clear example, yes this needs some looking into; the first two error messages are not really helpful (or should not occur). The last example does work when you specify the sub-dataset directly the way GDAL understands it (and gdalinfo would reveal):

read_stars(
  .x = 'NETCDF:"c_gls_NDVI300_202007010000_GLOBE_OLCI_V2.0.1.nc":NDVI',
  # sub = "NDVI",
  RasterIO = list(
    nXOff = 64488,
    nXSize = 1253,
    nYOff = 13840,
    nYSize = 1116
  )
)
# stars object with 3 dimensions and 1 attribute
# attribute(s), summary of first 1e+05 cells:
#        Min. 1st Qu. Median      Mean 3rd Qu. Max.  NA's
# NDVI  -0.08   0.315  0.388 0.3909759   0.505 0.84 99668
# dimension(s):
#       from    to         offset     delta  refsys x/y
# x    64488 65740           -180  0.002976  WGS 84 [x]
# y    13840 14955             80 -0.002976  WGS 84 [y]
# time     1     1 2020-07-01 UTC        NA POSIXct