r-spatial / stars

Spatiotemporal Arrays, Raster and Vector Data Cubes
https://r-spatial.github.io/stars/
Apache License 2.0
Reading local Zarr files into stars #663

Open oshuwilson opened 7 months ago

oshuwilson commented 7 months ago

Hi,

After looking at the vignette for reading Zarr files in stars, I am unsure how to read local Zarr directories into R. I have been trying to work with satellite imagery for the Southern Ocean downloaded from Copernicus' Marine Data Client.

Here is my attempt:

`library(stars)

dsn <- 'ZARR:"sic_daily_samples.zarr/"'

read_mdim(dsn)`

This gives the error message:

Error in CPL_read_mdim(file, array_name, options, offset, count, step, :
  CHAR() can only be applied to a 'CHARSXP', not a 'NULL'
In addition: Warning messages:
1: In CPL_read_mdim(file, array_name, options, offset, count, step, :
  GDAL Error 1: Decompressor blosc not handled
2: In CPL_read_mdim(file, array_name, options, offset, count, step, :
  GDAL Error 1: Decompressor blosc not handled
3: In CPL_read_mdim(file, array_name, options, offset, count, step, :
  GDAL Error 1: Decompressor blosc not handled
4: In CPL_read_mdim(file, array_name, options, offset, count, step, :
  GDAL Error 1: Decompressor blosc not handled

I've uploaded a subset of the data for ease but I can't figure out how to read it as a zipped or unzipped file, so any help with this would be appreciated!

Thanks, Josh

sic_daily_samples.zarr.zip
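
The `CHAR()` error above appears only after GDAL has warned that it cannot handle the blosc decompressor, so the two symptoms seem to be related. A minimal sketch of a wrapper that turns this combination into a clearer message is given below. `read_with_blosc_check` is a hypothetical helper, not part of stars or sf, and it relies only on the warning text shown in the error output above.

```r
# Hypothetical helper (not part of stars or sf): call a reader function and
# report clearly when GDAL warned that it cannot decompress blosc chunks.
read_with_blosc_check <- function(reader, ...) {
  blosc_missing <- FALSE
  tryCatch(
    withCallingHandlers(
      reader(...),
      warning = function(w) {
        # Match the GDAL warning text seen in the error output above
        if (grepl("Decompressor blosc not handled", conditionMessage(w), fixed = TRUE)) {
          blosc_missing <<- TRUE
          invokeRestart("muffleWarning")
        }
      }
    ),
    error = function(e) {
      if (blosc_missing) {
        stop("The GDAL build used by sf/stars reported that it cannot decompress ",
             "blosc-compressed chunks; original error: ", conditionMessage(e),
             call. = FALSE)
      }
      stop(e)  # unrelated error: re-signal it unchanged
    }
  )
}

# Usage (assumes stars is installed and the Zarr directory exists locally):
# read_with_blosc_check(stars::read_mdim, 'ZARR:"sic_daily_samples.zarr/"')
```

The wrapper does not fix anything; it only makes it obvious that the failure comes from the GDAL build rather than from the file path or the `ZARR:` prefix.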

edzer commented 7 months ago

I get

> read_mdim("sic_daily_sample.zarr/")
stars object with 3 dimensions and 1 attribute
attribute(s), summary of first 1e+05 cells:
           Min. 1st Qu. Median Mean 3rd Qu. Max.  NA's
siconc [1]   NA      NA     NA  NaN      NA   NA 1e+05
dimension(s):
          from   to  refsys point
longitude    1 4320  WGS 84    NA
latitude     1  961  WGS 84    NA
time         1    1 POSIXct  TRUE
                                                      values x/y
longitude       [-180.0417,-179.9583),...,[179.875,179.9583) [x]
latitude  [-80.04167,-79.95833),...,[-0.04166667,0.04166667) [y]
time                                          2021-01-09 UTC    

What is your sessionInfo() and sf_extSoftVersion() output, after loading stars?
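
A minimal sketch for gathering the requested output in one go is shown below. `collect_diagnostics` is a hypothetical helper name; the check for a raster driver called "Zarr" assumes that this is the short name GDAL reports, and a present driver does not by itself mean that blosc support was compiled in.

```r
# Load stars first if it is installed, so it shows up in sessionInfo()
if (requireNamespace("stars", quietly = TRUE)) {
  suppressPackageStartupMessages(library(stars))
}

# Hypothetical helper: collect session and GDAL/GEOS/PROJ versions as text
collect_diagnostics <- function() {
  out <- utils::capture.output(print(utils::sessionInfo()))
  if (requireNamespace("sf", quietly = TRUE)) {
    out <- c(out, "", utils::capture.output(print(sf::sf_extSoftVersion())))
    # Assumption: the Zarr raster driver is listed under the name "Zarr"
    drivers <- sf::st_drivers("raster")
    has_zarr <- any(tolower(drivers$name) == "zarr")
    out <- c(out, "", paste("Zarr driver listed by GDAL:", has_zarr))
  } else {
    out <- c(out, "", "sf is not installed")
  }
  out
}

writeLines(collect_diagnostics())
```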

oshuwilson commented 7 months ago

Thanks Edzer, I tried the same code and got the same error message.

My sessionInfo() gives

R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.utf8  LC_CTYPE=English_United Kingdom.utf8   
[3] LC_MONETARY=English_United Kingdom.utf8 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.utf8    

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] stars_0.6-4 sf_1.0-14   abind_1.4-5

loaded via a namespace (and not attached):
 [1] utf8_1.2.4         R6_2.5.1           tidyselect_1.2.0   e1071_1.7-13       magrittr_2.0.3    
 [6] glue_1.6.2         tibble_3.2.1       KernSmooth_2.23-22 parallel_4.3.2     pkgconfig_2.0.3   
[11] generics_0.1.3     dplyr_1.1.3        lifecycle_1.0.4    classInt_0.4-10    cli_3.6.1         
[16] fansi_1.0.5        vctrs_0.6.4        grid_4.3.2         DBI_1.2.1          proxy_0.4-27      
[21] class_7.3-22       compiler_4.3.2     rstudioapi_0.15.0  tools_4.3.2        pillar_1.9.0      
[26] Rcpp_1.0.11        rlang_1.1.2        units_0.8-4       

And my sf_extSoftVersion() prints

          GEOS           GDAL         proj.4 GDAL_with_GEOS     USE_PROJ_H           PROJ 
      "3.11.2"        "3.7.2"        "9.3.0"         "true"         "true"        "9.3.0" 

edzer commented 7 months ago

Please update sf to 1.0-15, and try again.

oshuwilson commented 7 months ago

That still printed the same error message as previously. I haven't yet downloaded the latest version of RStudio but I don't imagine that would cause this error?

edzer commented 7 months ago

See also https://github.com/r-spatial/stars/issues/566#issuecomment-1261880743

oshuwilson commented 7 months ago

Apologies, I'm not yet proficient with R. How do I install that patch? I tried using remotes::install_github("rspatial/sf") but I'm still seeing the same error code.

edzer commented 7 months ago

No need for you to install that patch.

oshuwilson commented 7 months ago

Sorry, I'm a bit lost as to which steps from the other issue I can take to fix mine.

edzer commented 7 months ago

I'm just cross linking them; I can reproduce the error on GitHub actions here: https://github.com/r-spatial/stars/actions/runs/7712573313/job/21020420577#step:6:297

pepijn-devries commented 7 months ago

@oshuwilson,

It seems that this issue is specific to the Windows binary release. Note that you can also use the CopernicusMarine package to subset Copernicus Marine data. However, it does not yet support Zarr data because of the issue reported here and in https://github.com/r-spatial/stars/issues/566#issuecomment-1261880743

oshuwilson commented 7 months ago

Thanks @pepijn-devries - I'll look at doing that to download as a netCDF if the Zarr format remains unusable for my setup. My main issue is that the full data I need is massive (~1.3TB as a netCDF but only ~250GB as Zarr), so Zarr would be preferable if it can work! But if not, I'll get a new hard drive and put my computer to the test.
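
If the netCDF route is needed, a file of that size would probably have to be read in pieces rather than all at once. The sketch below splits the time axis into windows and passes them to the `offset` and `count` arguments of `read_mdim()`. The file name `sic_daily.nc`, the number of time steps, and the dimension order (longitude, latitude, time) are assumptions for illustration, and `time_windows` is a hypothetical helper.

```r
# Hypothetical helper: split n_time steps into zero-based (offset, count) windows
time_windows <- function(n_time, chunk_size) {
  offsets <- seq(0, n_time - 1, by = chunk_size)
  counts <- pmin(chunk_size, n_time - offsets)
  data.frame(offset = offsets, count = counts)
}

nc_file <- "sic_daily.nc"  # hypothetical path to the downloaded netCDF file

if (requireNamespace("stars", quietly = TRUE) && file.exists(nc_file)) {
  w <- time_windows(n_time = 365, chunk_size = 30)  # 365 is an assumed length
  for (i in seq_len(nrow(w))) {
    # Assumed dimension order: longitude, latitude, time.
    # NA in count reads that dimension entirely.
    chunk <- stars::read_mdim(
      nc_file,
      offset = c(0, 0, w$offset[i]),
      count  = c(NA, NA, w$count[i])
    )
    # ... process `chunk` here, then let it be garbage collected
  }
}
```

Reading by time window keeps only one chunk in memory at a time; it does not reduce the disk space the netCDF download needs.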

edzer commented 7 months ago

> It seems that this issue is specific to the Windows binary release.

Windows and macOS binary releases; we added blosc, at least to the Windows binary builds, but this suggests it's not working.

pepijn-devries commented 6 months ago

Hi @edzer,

Is there any news on the Windows build and blosc decompression of ZARR files? Thanks for your work on the package!

By the way, I did some additional testing. The issue does not only occur on Windows, but also on a Linux Fedora (virtual) machine I have set up:

library(stars)
#> Loading required package: abind
#> Loading required package: sf
#> Linking to GEOS 3.12.1, GDAL 3.7.3, PROJ 9.2.1; sf_use_s2() is TRUE
dsn <- 'ZARR:"/vsicurl/https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/gpcp-feedstock/gpcp.zarr"'
bounds <- c(longitude = "lon_bounds", latitude = "lat_bounds")
r <- read_mdim(dsn, bounds = bounds)
#> Warning in CPL_read_mdim(file, array_name, options, offset, count, step, : GDAL
#> Error 1: Decompressor blosc not handled

#> Warning in CPL_read_mdim(file, array_name, options, offset, count, step, : GDAL
#> Error 1: Decompressor blosc not handled

#> Warning in CPL_read_mdim(file, array_name, options, offset, count, step, : GDAL
#> Error 1: Decompressor blosc not handled

#> Warning in CPL_read_mdim(file, array_name, options, offset, count, step, : GDAL
#> Error 1: Decompressor blosc not handled

#> Warning in CPL_read_mdim(file, array_name, options, offset, count, step, : GDAL
#> Error 1: Decompressor blosc not handled

#> Warning in CPL_read_mdim(file, array_name, options, offset, count, step, : GDAL
#> Error 1: Decompressor blosc not handled

#> Warning in CPL_read_mdim(file, array_name, options, offset, count, step, : GDAL
#> Error 1: Decompressor blosc not handled
#> Error in CPL_read_mdim(file, array_name, options, offset, count, step, : CHAR() can only be applied to a 'CHARSXP', not a 'NULL'

Created on 2024-03-11 with reprex v2.1.0

With sessionInfo():

R version 4.3.2 (2023-10-31)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Fedora Linux 39 (Workstation Edition)

Matrix products: default
BLAS/LAPACK: FlexiBLAS OPENBLAS-OPENMP;  LAPACK version 3.11.0

locale:
 [1] LC_CTYPE=nl_NL.UTF-8       LC_NUMERIC=C               LC_TIME=nl_NL.UTF-8        LC_COLLATE=nl_NL.UTF-8    
 [5] LC_MONETARY=nl_NL.UTF-8    LC_MESSAGES=nl_NL.UTF-8    LC_PAPER=nl_NL.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=nl_NL.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Amsterdam
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] gtable_0.3.4       dplyr_1.1.4        compiler_4.3.2     tidyselect_1.2.0   reprex_2.1.0       Rcpp_1.0.12       
 [7] clipr_0.8.0        callr_3.7.5        scales_1.3.0       yaml_2.3.8         fastmap_1.1.1      ggplot2_3.5.0     
[13] R6_2.5.1           generics_0.1.3     classInt_0.4-10    sf_1.0-15          knitr_1.45         tibble_3.2.1      
[19] units_0.8-5        munsell_0.5.0      DBI_1.2.2          pillar_1.9.0       rlang_1.1.3        utf8_1.2.4        
[25] xfun_0.42          fs_1.6.3           cli_3.6.2          withr_3.0.0        magrittr_2.0.3     ps_1.7.6          
[31] class_7.3-22       processx_3.8.3     digest_0.6.34      grid_4.3.2         rstudioapi_0.15.0  lifecycle_1.0.4   
[37] vctrs_0.6.5        KernSmooth_2.23-22 proxy_0.4-27       evaluate_0.23      glue_1.7.0         fansi_1.0.6       
[43] e1071_1.7-14       colorspace_2.1-0   rmarkdown_2.26     tools_4.3.2        pkgconfig_2.0.3    htmltools_0.5.7   

Artur-man commented 1 month ago

Same here, using macOS.

library(stars)
> dsn = 'ZARR:"/vsicurl/https://storage.googleapis.com/cmip6/CMIP6/HighResMIP/CMCC/CMCC-CM2-HR4/highresSST-present/r1i1p1f1/6hrPlev/psl/gn/v20170706"/'
> gdal_utils("info", dsn)
Warning messages:
1: In CPL_gdalinfo(if (missing(source)) character(0) else source, options,  :
  GDAL Error 1: Decompressor blosc not handled
2: In CPL_gdalinfo(if (missing(source)) character(0) else source, options,  :
  GDAL Error 1: Decompressor blosc not handled
3: In CPL_gdalinfo(if (missing(source)) character(0) else source, options,  :
  GDAL Error 1: Decompressor blosc not handled
4: In CPL_gdalinfo(if (missing(source)) character(0) else source, options,  :
  GDAL Error 1: Decompressor blosc not handled
5: In CPL_gdalinfo(if (missing(source)) character(0) else source, options,  :
  GDAL Error 1: Decompressor blosc not handled
6: In CPL_gdalinfo(if (missing(source)) character(0) else source, options,  :
  GDAL Error 1: Decompressor blosc not handled
7: In CPL_gdalinfo(if (missing(source)) character(0) else source, options,  :
  GDAL Error 1: Decompressor blosc not handled

With sessionInfo():

R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.5.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] sf_1.0-16

loaded via a namespace (and not attached):
 [1] compiler_4.3.1     magrittr_2.0.3     class_7.3-22       DBI_1.2.3          tools_4.3.1        units_0.8-5        proxy_0.4-27       rstudioapi_0.16.0  Rcpp_1.0.13        KernSmooth_2.23-24 grid_4.3.1         e1071_1.7-14       classInt_0.4-10