mikejohnson51 / climateR

An R 📦 for getting point and gridded climate data by AOI
https://mikejohnson51.github.io/climateR/
MIT License
170 stars 40 forks source link

Trouble Downloading over large area with SpatVect AOI #93

Closed mfertakos closed 6 months ago

mfertakos commented 6 months ago

Hello. I am attempting to download terra climate normals over a SpatVector area covering the eastern United States, but when I run the following code the getTerraClimateNormals() function runs for hours until R crashes.

`library(blockCV) library(terra) library(sf) library(dplyr) library(ggplot2)

blocks<-st_transform(x=sb1$blocks,crs="+proj=longlat +ellps=WGS84 +no_defs") #transform CRS from Albers Equal Area to WGS84 blocks<-vect(blocks)

library(climateR) library(AOI) cdat = getTerraClimNormals(AOI=blocks, varname="tmin", scenario="19812010", month=5:9, verbose=TRUE)`

The object 'blocks' has the following structure: class : SpatVector geometry : polygons dimensions : 82, 2 (geometries, attributes) extent : -98.90219, -65.21105, 28.12502, 50.43962 (xmin, xmax, ymin, ymax) coord. ref. : +proj=longlat +ellps=WGS84 +no_defs names : block_id folds type : values : 1 39 2 57 3 76

I have also tried aggregating the multipolygon SpatVector (blocks) into a single polygon, and have tried using the original projected coordinate system (AEE) instead of converting it to WGS to no avail. Does a request of this size just take several hours, or am I doing something wrong? Let me know if I can supply any additional information, and thanks for your help in advance!

mikejohnson51 commented 6 months ago

Hi @mfertakos! Thanks for the note, it does seem there is a large slow down in this call when passing a SpatVect object vs and sf object. I will fix this soon, but, for now you can convert your input to sf and get pretty fast results (~8 seconds (edited when I got off a hotspot)). For example:

blocks = terra::ext(c(-98.90219, -65.21105, 28.12502, 50.43962)) |>
  vect(crs = '+proj=longlat +ellps=WGS84 +no_defs') |> 
  sf::st_as_sf()

system.time({
  cdat = climateR::getTerraClimNormals(AOI=blocks, 
                                       varname="tmin", 
                                       scenario="19812010", 
                                       month=5:9)
})

user  system elapsed 
0.516   0.367   8.547 

I'll return with more once I figure out the bottleneck! Thanks again!

mfertakos commented 6 months ago

Hi again,

Thanks a lot for this info! It seems even as an sf the function is still running for a long time. I ran it for about 30 minutes with no success. My internet speeds seem fine though, so I am not sure what is still causing the bottleneck. Any ideas?

Edit: I tried this code on another machine with a different version of R (4.3.3) and it worked in a few seconds. It must be some kind of dependencies issue on this machine.

mikejohnson51 commented 6 months ago

With the last commit, SpatVect objects work as expected again:

library(climateR); library(terra)

system.time({
    cdat = terra::ext(c(-98.90219, -65.21105, 28.12502, 50.43962)) |>
      vect(crs = '+proj=longlat +ellps=WGS84 +no_defs') |>
      getTerraClimNormals(
        varname = "tmin",
        scenario = "19812010",
        month = 5:9)
})

#>    user  system elapsed 
#>   0.937   0.346   8.221

cdat

#> $tmin
#> class       : SpatRaster 
#> dimensions  : 566, 809, 5  (nrow, ncol, nlyr)
#> resolution  : 0.04166667, 0.04166667  (x, y)
#> extent      : -98.91667, -65.20833, 28.125, 51.70833  (xmin, xmax, ymin, ymax)
#> coord. ref. : +proj=longlat +ellps=WGS84 +no_defs 
#> source(s)   : memory
#> names       : tmin_1~812010, tmin_1~812010, tmin_1~812010, tmin_1~812010, tmin_1~812010 
#> min values  :          -6.0,          -6.5,          -6.1,          -5.7,          -4.1 
#> max values  :          22.8,          26.0,          27.1,          27.2,          25.1 
#> unit        :         deg C,         deg C,         deg C,         deg C,         deg C 
#> time        : 1961-05-01 to 1961-09-01 UTC

Created on 2024-03-17 with reprex v2.0.2

mikejohnson51 commented 6 months ago

Ok, so the package is doing what I expect. To see what might be going on locally for you can you please test this and see if you see the same?

library(climateR); library(RNetCDF); library(terra)

test = ext(c(-98.90219, -65.21105, 28.12502, 50.43962)) |>
  vect(crs = '+proj=longlat +ellps=WGS84 +no_defs') |>
  getTerraClimNormals(
    varname = "tmin",
    scenario = "19812010",
    month = 1,
    dryrun = TRUE)
#> source:   http://thredds.northwestknowledge.net:8080/thredds/dodsC/TER... 
#> varname(s):
#>    > tmin [deg C] (min_temperature)
#> ==================================================
#> diminsions:  809, 566, 1 (names: lon,lat,time)
#> resolution:  0.042, 0.042, 1 month
#> extent:      -98.92, -65.21, 28.13, 51.71 (xmin, xmax, ymin, ymax)
#> crs:         +proj=longlat +a=6378137 +f=0.00335281066474748 +p...
#> time:        1961-01-01 to 1961-01-01
#> ==================================================
#> values: 457,894 (vars*X*Y*T)

nc = open.nc(test$URL)

print.nc(nc)
#> netcdf classic {
#> dimensions:
#>  lat = 566 ;
#>  lon = 809 ;
#>  time = 1 ;
#> variables:
#>  NC_SHORT tmin(lon, lat, time) ;
#>      NC_SHORT tmin:_FillValue = -32768 ;
#>      NC_CHAR tmin:units = "deg C" ;
#>      NC_CHAR tmin:description = "Average Minimum 2-m Temperature" ;
#>      NC_CHAR tmin:long_name = "tmin" ;
#>      NC_CHAR tmin:standard_name = "tmin" ;
#>      NC_SHORT tmin:missing_value = -32768 ;
#>      NC_CHAR tmin:dimensions = "lon lat time" ;
#>      NC_CHAR tmin:grid_mapping = "crs" ;
#>      NC_CHAR tmin:coordinate_system = "WGS84,EPSG:4326" ;
#>      NC_DOUBLE tmin:scale_factor = 0.1 ;
#>      NC_DOUBLE tmin:add_offset = -72 ;
#>      NC_INT tmin:_ChunkSizes = 1, 720, 1440 ;
#> }

Created on 2024-03-17 with reprex v2.0.2

mfertakos commented 6 months ago

Hi Mike. The nc=open.nc(test$URL) gets stuck running like the getTerraClimNormals() function was doing on this machine. I am using R version 4.2.3.

mikejohnson51 commented 6 months ago

Morning - I suspect that means your R-based (maybe system to) GDAL is unable to access NetCDF files. What does this give you?

sf::st_drivers() |>
  dplyr::filter(name == 'netCDF')
#>          name                  long_name write copy is_raster is_vector   vsi
#> netCDF netCDF Network Common Data Format  TRUE TRUE      TRUE      TRUE FALSE

terra::gdal(drivers = TRUE) |>
  dplyr::filter(name == 'netCDF')
#>     name   type        can   vsi                  long.name
#> 1 netCDF raster read/write FALSE Network Common Data Format

Created on 2024-03-18 with reprex v2.0.2

mfertakos commented 6 months ago

Hi again. Here is what I get when I run these two lines of code:

sf::st_drivers() |> dplyr::filter(name == 'netCDF') name long_name write copy is_raster is_vector vsi netCDF netCDF Network Common Data Format TRUE TRUE TRUE TRUE FALSE terra::gdal(drivers = TRUE) |> dplyr::filter(name == 'netCDF') name type can vsi long.name 1 netCDF raster read/write FALSE Network Common Data Format

It appears the same as your output. The good news is I was able to get it all to run on my other machine, but appreciate your help in figuring out this issue!

mikejohnson51 commented 6 months ago

Of course! Since this is a local machine issue I am going to close this. Please feel free to email me if you cant get it working and I'll do my best to help :)