mapme-initiative / mapme.biodiversity

Efficient analysis of spatial biodiversity datasets for global portfolios
https://mapme-initiative.github.io/mapme.biodiversity/dev
GNU General Public License v3.0
33 stars 7 forks source link

Find alternative source for NASA SRTM #174

Closed goergen95 closed 1 year ago

goergen95 commented 1 year ago

The current source for the resource is unreliable. We should search for a more stable alternative.

fBedecarrats commented 1 year ago

In the wiki we have: Data source lists curated by Qiusheng Wu:

There are several DEM (SRTM and others), from different possible sources. I downloaded the updated verison of each catalog and highlighted possible candidates in yellow. Here it is: Catalogs_DEM.xlsx

goergen95 commented 1 year ago

Here is a pretty interesting one! I re-implemented the NASA SRTM resource using the Microsoft Planetary Computer STAC catalog:

https://github.com/mapme-initiative/mapme.biodiversity/blob/3db62b75c4c11087b6afa34496056183d6659763/R/get_nasa_srtm.R#L33-L76

There is a new argument download_srtm which represents a logical. If it is set to TRUE the intersecting tiles will actually be downloaded to the local file system. If it is set to FALSE, it uses GDAL's /vsicurl/ driver to catalog the online resources without downloading the tiles.

Then if for example the elevation indicator is called, using the virtual filesystem driver, it uses the fact that the rasters are provided as COGs and only partial downloads will be conducted.

I expect performance differences to be a function between the size and number of the assets as well as the spatial extent of the total portfolio (e.g. the number of tiles).

Here is a script that we can use to further investigate the trade-offs:

# package setup
load_or_install <- function(lib){
  if(!requireNamespace(lib, quietly = TRUE)) install.packages(lib)
  library(lib, character.only = TRUE)
}
libs <- c("sf", "future", "remotes", "progressr", "rnaturalearth", "rbenchmark")
for(lib in libs) load_or_install(lib)
#> Linking to GEOS 3.11.1, GDAL 3.6.2, PROJ 9.1.1; sf_use_s2() is TRUE

install_github("mapme-initiative/mapme.biodiversity",
               ref = "new-nasa-srtm",
               upgrade = "never")
#> Skipping install of 'mapme.biodiversity' from a github remote, the SHA1 (3db62b75) has not changed since last install.
#>   Use `force = TRUE` to force installation

library(mapme.biodiversity)

# adjust as needed
nworkers <- 4
nsamples <- 100
buffer_size <- 1000

# sample a global portfolio
sf_use_s2(F)
#> Spherical geometry (s2) switched off
world <- ne_countries(returnclass = "sf") |> st_make_valid()
world <- st_crop(world, xmin = 20., ymin = -1.0,
                 xmax = 22., ymax = 1.0) |>
  st_make_valid()
#> although coordinates are longitude/latitude, st_intersection assumes that they
#> are planar
#> Warning: attribute variables are assumed to be spatially constant throughout
#> all geometries
sf_use_s2(T)
#> Spherical geometry (s2) switched on

# construct polygons from sample
set.seed(42)
hooks <- st_sample(world, size = nsamples)
hooks <- st_transform(hooks, "EPSG:3857")
polys <- st_buffer(hooks, dist = buffer_size)
polys <- st_transform(polys, "EPSG:4326")
polys <- st_as_sf(polys)
polys$id <- 1:nrow(polys)

# test elevation indicator
loc1 <- tempfile()
loc2 <- tempfile()
dir.create(file.path(loc1, "tmp"), recursive = TRUE)
dir.create(file.path(loc2, "tmp"), recursive = TRUE)

# using cloud-native
system.time({
  plan(multisession, workers = nworkers)
  with_progress({
    init_portfolio(polys, years = 2000:2020,
                   outdir = loc1, tmpdir = file.path(loc1, "tmp"),
                   add_resources = FALSE) %>%
      get_resources("nasa_srtm", download_srtm = FALSE) %>%
      calc_indicators("elevation", stats_elevation = "mean")
  })
  plan(sequential)
})
#> Starting process to download resource 'nasa_srtm'........
#> Argument 'engine' for resource 'elevation' was not specified. Setting to default value of 'zonal'.
#>    user  system elapsed 
#>   1.878   0.422  24.720

# downloading to local file system
system.time({
  plan(multisession, workers = nworkers)
  with_progress({
    init_portfolio(polys, years = 2000:2020,
                   outdir = loc2, tmpdir = file.path(loc2, "tmp"),
                   add_resources = FALSE) %>%
      get_resources("nasa_srtm", download_srtm = TRUE) %>%
      calc_indicators("elevation", stats_elevation = "mean")
  })
  plan(sequential)
})
#> Starting process to download resource 'nasa_srtm'........
#> Argument 'engine' for resource 'elevation' was not specified. Setting to default value of 'zonal'.
#>    user  system elapsed 
#>   2.017   0.600  30.799

Created on 2023-07-01 with reprex v2.0.2