rspatial / terra

R package for spatial data handling https://rspatial.github.io/terra/reference/terra-package.html
GNU General Public License v3.0

error reading netCDF from URL #1397

Open AMBarbosa opened 9 months ago

AMBarbosa commented 9 months ago

I'm importing .shp and .tif files from direct download links without a problem, but with .nc files this fails, apparently because the first part of the URL gets replaced with /vsimem/http_1/ (this last number increases by 1 with each download attempt). Here are two examples:

ex <- terra::rast("https://www.unidata.ucar.edu/software/netcdf/examples/sresa1b_ncar_ccsm3-example.nc")

Error: [rast] file does not exist: NETCDF:"/vsimem/http_1/sresa1b_ncar_ccsm3-example.nc":ua
In addition: Warning messages:
1: `NETCDF:"/vsimem/http_1/sresa1b_ncar_ccsm3-example.nc":area' does not exist in the file system, and is not recognized as a supported dataset name. (GDAL error 4) 
2: `NETCDF:"/vsimem/http_1/sresa1b_ncar_ccsm3-example.nc":msk_rgn' does not exist in the file system, and is not recognized as a supported dataset name. (GDAL error 4) 
3: `NETCDF:"/vsimem/http_1/sresa1b_ncar_ccsm3-example.nc":pr' does not exist in the file system, and is not recognized as a supported dataset name. (GDAL error 4) 
4: `NETCDF:"/vsimem/http_1/sresa1b_ncar_ccsm3-example.nc":tas' does not exist in the file system, and is not recognized as a supported dataset name. (GDAL error 4) 
5: `NETCDF:"/vsimem/http_1/sresa1b_ncar_ccsm3-example.nc":ua' does not exist in the file system, and is not recognized as a supported dataset name. (GDAL error 4) 

ben <- terra::rast("https://erddap.emodnet.eu/erddap/files/biology_6640_benthos_NorthSea_e4af_0f0e_6a73/04_2021_6640_diva_benthos_erddap.nc")

Error: [rast] file does not exist: NETCDF:"/vsimem/http_2/04_2021_6640_diva_benthos_erddap.nc":taxon_lsid
In addition: Warning messages:
1: `NETCDF:"/vsimem/http_2/04_2021_6640_diva_benthos_erddap.nc":taxon_name' does not exist in the file system, and is not recognized as a supported dataset name. (GDAL error 4) 
2: `NETCDF:"/vsimem/http_2/04_2021_6640_diva_benthos_erddap.nc":taxon_lsid' does not exist in the file system, and is not recognized as a supported dataset name. (GDAL error 4) 

The same files are imported with rast() without a problem if I instead download them manually and read from their folder path.
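A minimal sketch of that manual-download workaround, assuming the file is small enough to download in full. The helper name `fetch_to_temp` is hypothetical (it is not part of terra); it only wraps `utils::download.file()` and returns the local path, which can then be passed to `terra::rast()`.

```r
# Hypothetical helper (not part of terra): download a remote file to a
# temporary location and return the local path.
fetch_to_temp <- function(url, fileext = ".nc") {
  dest <- tempfile(fileext = fileext)
  # mode = "wb" keeps binary files such as netCDF intact on Windows
  status <- utils::download.file(url, destfile = dest, mode = "wb", quiet = TRUE)
  if (status != 0) stop("download failed with status ", status)
  dest
}

# Usage (needs network access and the terra package), not run here:
# nc_path <- fetch_to_temp("https://www.unidata.ucar.edu/software/netcdf/examples/sresa1b_ncar_ccsm3-example.nc")
# ex <- terra::rast(nc_path)
```

This gives up lazy reading, since the whole file is fetched before anything is read.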

mdsumner commented 8 months ago

Put /vsicurl/ in front, i.e.

ben <- terra::rast("/vsicurl/https://erddap.emodnet.eu/erddap/files/biology_6640_benthos_NorthSea_e4af_0f0e_6a73/04_2021_6640_diva_benthos_erddap.nc")

This ensures GDAL uses range reading (which allows lazy reading) if the source server supports it. Given a bare URL, GDAL downloads the file and tries to read from that copy, which is where the mangled /vsimem/ path comes from. I'll post an explanation if I can find it.
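For scripts that mix local paths and URLs, the prefixing can be wrapped in a small function. This is only a sketch: `as_vsicurl` is a hypothetical name, and it assumes that anything starting with http:// or https:// should go through /vsicurl/.

```r
# Hypothetical helper: prepend "/vsicurl/" to http(s) URLs only.
# Local paths and already-prefixed paths are returned unchanged.
as_vsicurl <- function(url) {
  is_remote <- grepl("^https?://", url)
  ifelse(is_remote, paste0("/vsicurl/", url), url)
}

# Usage (needs network access and the terra package), not run here:
# ben <- terra::rast(as_vsicurl("https://erddap.emodnet.eu/erddap/files/biology_6640_benthos_NorthSea_e4af_0f0e_6a73/04_2021_6640_diva_benthos_erddap.nc"))
```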

AMBarbosa commented 8 months ago

I still get the error, though:

ben <- terra::rast("/vsicurl/https://erddap.emodnet.eu/erddap/files/biology_6640_benthos_NorthSea_e4af_0f0e_6a73/04_2021_6640_diva_benthos_erddap.nc")

Error: [rast] file does not exist: /vsicurl/https://erddap.emodnet.eu/erddap/files/biology_6640_benthos_NorthSea_e4af_0f0e_6a73/04_2021_6640_diva_benthos_erddap.nc
In addition: Warning message:
`/vsicurl/https://erddap.emodnet.eu/erddap/files/biology_6640_benthos_NorthSea_e4af_0f0e_6a73/04_2021_6640_diva_benthos_erddap.nc' not recognized as a supported file format. (GDAL error 4) 

sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Linux Mint 21.1

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=pt_PT.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=pt_PT.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=pt_PT.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] terra_1.7-65

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.11       codetools_0.2-19  modEvA_3.11.1    
 [4] digest_0.6.33     later_1.3.1       mime_0.12        
 [7] R6_2.5.1          lifecycle_1.0.4   xtable_1.8-4     
[10] magrittr_2.0.3    rlang_1.1.2       cli_3.6.1        
[13] rstudioapi_0.15.0 promises_1.2.1    ellipsis_0.3.2   
[16] tools_4.1.2       shiny_1.8.0       httpuv_1.6.11    
[19] fastmap_1.1.1     compiler_4.1.2    htmltools_0.5.6.1

Tried it also on Windows and got the same error message.

mdsumner commented 8 months ago

Ah well, there's endless fun here and sometimes I find the solutions. I'm seeing mixed results too on different versions.

Which GDAL do you have? What does terra::gdal() report on Linux? It should be 3.7.2 on Windows from CRAN.

The other thing to check is your drivers: does netCDF or HDF5 appear in terra::gdal(drivers = TRUE)$name? They both do on Windows from CRAN.

AMBarbosa commented 8 months ago

I do have netCDF and HDF5 in terra::gdal(drivers = TRUE)$name. As for terra::gdal(), it's "3.4.1" on Linux, "3.7.2" on Windows.

> version$platform
# on Linux:
[1] "x86_64-pc-linux-gnu"
# on Windows:
[1] "x86_64-w64-mingw32"

> terra::gdal()
# on Linux:
[1] "3.4.1"
# on Windows:
[1] "3.7.2"

> "netCDF" %in% terra::gdal(drivers = TRUE)$name
[1] TRUE
> "HDF5" %in% terra::gdal(drivers = TRUE)$name
[1] TRUE

kadyb commented 8 months ago

FWIW: It seems that it doesn't work in {stars} either (I checked on Windows 10 and Ubuntu 22.04):

url = "https://www.unidata.ucar.edu/software/netcdf/examples/sresa1b_ncar_ccsm3-example.nc"
stars::read_stars(url)
#> area, trying to read file: NETCDF:"/vsimem/http_2/sresa1b_ncar_ccsm3-example.nc":area
#> Error: file not found
stars::read_stars(paste0("/vsicurl/", url))
#> trying to read file: /vsicurl/https://www.unidata.ucar.edu/software/netcdf/examples/sresa1b_ncar_ccsm3-example.nc
#> Error: file not found

dfriend21 commented 8 months ago

Potentially relevant info from the GDAL documentation of the NetCDF driver:

https://gdal.org/drivers/raster/netcdf.html#vsi-virtual-file-system-api-support

edzer commented 8 months ago

I get

> stars::read_stars(paste0("/vsicurl/", url))
area, lat_bnds, lon_bnds, msk_rgn, pr, tas, time_bnds, ua, 
stars object with 2 dimensions and 2 attributes
attribute(s):
                Min.    1st Qu.      Median         Mean     3rd Qu.
area [m^2] 473460608 9556865536 17267403776 1.556105e+10 22449783808
msk_rgn            0          0           0 6.103516e-04           0
                  Max.
area [m^2] 24345513984
msk_rgn              1
dimension(s):
  from  to  offset delta x/y
x    1 256 -0.7031 1.406 [x]
y    1 128   89.63  -1.4 [y]
There were 16 warnings (use warnings() to see them)

and

> stars::read_mdim(paste0("/vsicurl/", url), "tas")
stars object with 3 dimensions and 1 attribute
attribute(s):
            Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
tas [K] 205.2432 269.3327 283.0836 278.6421 296.3472 309.0961
dimension(s):
     from  to  offset delta    refsys point                            values
lon     1 256 -0.7031 1.406        NA    NA                              NULL
lat     1 128      NA    NA        NA    NA [-90,-88.23322),...,[88.23322,90)
time    1   1      NA    NA PCICt_365  TRUE               2001-05-14 12:00:00
     x/y
lon  [x]
lat  [y]
time

AMBarbosa commented 7 months ago

Now with terra 1.7.71, this mostly works, and there is no need to manually paste "/vsicurl/" if we use vsi = TRUE:

> ex <- terra::rast("https://www.unidata.ucar.edu/software/netcdf/examples/sresa1b_ncar_ccsm3-example.nc", vsi = TRUE)
> ex

class       : SpatRaster 
dimensions  : 128, 256, 21  (nrow, ncol, nlyr)
resolution  : 1.40625, 1.400437  (x, y)
extent      : -0.703125, 359.2969, -89.62795, 89.62795  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 
sources     : sresa1b_ncar_ccsm3-example.nc:area  
              sresa1b_ncar_ccsm3-example.nc:msk_rgn  
              sresa1b_ncar_ccsm3-example.nc:pr  
              ... and 2 more source(s)
varnames    : area (Surface area) 
              msk_rgn (Mask region) 
              pr (precipitation_flux) 
              ...
names       :   area, msk_rgn,         pr, tas, ua_pl~00000, ua_pl~92500, ... 
unit        : meter2,    bool, kg m-2 s-1,   K,       m s-1,       m s-1, ... 
> url = "http://thredds.northwestknowledge.net:8080/thredds/fileServer/TERRACLIMATE_ALL/data/TerraClimate_ws_2022.nc"
> terra::rast(url, vsi = TRUE, win = c(-10, 0, 35, 45))

class       : SpatRaster 
dimensions  : 240, 240, 12  (nrow, ncol, nlyr)
resolution  : 0.04166667, 0.04166667  (x, y)
window      : -10, 0, 35, 45  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +ellps=WGS84 +no_defs 
source      : TerraClimate_ws_2022.nc 
varname     : ws (wind_speed) 
names       : ws_1, ws_2, ws_3, ws_4, ws_5, ws_6, ... 
unit        :  m/s,  m/s,  m/s,  m/s,  m/s,  m/s, ... 
time (days) : 2022-01-01 to 2022-12-01 

It still seems to fail in some cases, though:

> ben <- terra::rast("https://erddap.emodnet.eu/erddap/files/biology_6640_benthos_NorthSea_e4af_0f0e_6a73/04_2021_6640_diva_benthos_erddap.nc", vsi=TRUE)

Error: [rast] file does not exist: /vsicurl/https://erddap.emodnet.eu/erddap/files/biology_6640_benthos_NorthSea_e4af_0f0e_6a73/04_2021_6640_diva_benthos_erddap.nc
In addition: Warning message:
`/vsicurl/https://erddap.emodnet.eu/erddap/files/biology_6640_benthos_NorthSea_e4af_0f0e_6a73/04_2021_6640_diva_benthos_erddap.nc' not recognized as a supported file format. (GDAL error 4) 

mdsumner commented 7 months ago

The ERDDAP server doesn't support range downloading, and GDAL wasn't reporting that as the cause of the failure. This has been reported here:

https://lists.osgeo.org/pipermail/gdal-dev/2024-February/058464.html
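A rough way to check this up front is to look at the response headers for `Accept-Ranges: bytes`. The sketch below uses base R's `curlGetHeaders()`; the helper name `advertises_byte_ranges` is hypothetical. It is only a heuristic, since a server can honor range requests without sending that header, so a FALSE result is a hint and not proof.

```r
# Hypothetical helper: TRUE if any header line advertises byte-range support.
# `headers` is a character vector such as the one returned by curlGetHeaders().
advertises_byte_ranges <- function(headers) {
  any(grepl("^accept-ranges:[[:space:]]*bytes", trimws(headers), ignore.case = TRUE))
}

# Usage (needs network access), not run here:
# h <- curlGetHeaders("https://erddap.emodnet.eu/erddap/files/biology_6640_benthos_NorthSea_e4af_0f0e_6a73/04_2021_6640_diva_benthos_erddap.nc")
# advertises_byte_ranges(h)
```

If the check comes back FALSE, downloading the file first and reading the local copy is the safer route.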