mapme-initiative / mapme.biodiversity

Efficient analysis of spatial biodiversity datasets for global portfolios
https://mapme-initiative.github.io/mapme.biodiversity/dev
GNU General Public License v3.0

Issue with biome data #307

Closed: duboisl-afd closed this issue 1 month ago

duboisl-afd commented 1 month ago

Hi, I work for the AFD and am following up on Antoine Vuillot's work. I'm having trouble downloading biome data. I also tried this code:

library(sf)
library(mapme.biodiversity)

# Store downloaded resources in a temporary directory
outdir <- file.path(tempdir(), "mapme-data")
dir.create(outdir, showWarnings = FALSE)

mapme_options(
  outdir = outdir,
  verbose = FALSE
)

# Read the example polygon, fetch the TEOW resource and compute the biome indicator
aoi <- system.file("extdata", "sierra_de_neiba_478140_2.gpkg",
  package = "mapme.biodiversity"
) %>%
  read_sf() %>%
  get_resources(get_teow()) %>%
  calc_indicators(calc_biome()) %>%
  portfolio_long()

aoi

However, the biome column contains only NULL values, and I get the following warning messages:

Warning messages:
1: In .f(.x[[i]], ...) :
  Error : Cannot open "/tmp/Rtmpi6Bk1S/mapme-data/teow/wwf_terr_ecos.gpkg"; The file doesn't seem to exist.

2: In .check_single_asset(result, chunk) :
  Error in if (nrow(teow[[1]]) == 0) { : argument is of length zero

Have you encountered this kind of issue before? Thanks!

karpfen commented 1 month ago

Hi @duboisl-afd, your code looks fine and also works as expected for me.

From the warning message, something is wrong with the TEOW data: either the download failed, the file is damaged, or it can't be read for some other reason. Can you please check the following:

file.exists("/tmp/Rtmpi6Bk1S/mapme-data/teow/wwf_terr_ecos.gpkg")  # should return TRUE
sf::st_read("/tmp/Rtmpi6Bk1S/mapme-data/teow/wwf_terr_ecos.gpkg")  # should return the raw TEOW data

Depending on what you see here, we can maybe work out what's wrong.

karpfen commented 1 month ago

Also, you could run the following:

# /vsicurl/ streams the remote file over HTTP; /vsizip/ opens the shapefile inside the zip archive
url <- paste(
  "/vsizip//vsicurl/",
  "https://files.worldwildlife.org/wwfcmsprod/files/",
  "Publication/file/6kcchn7e3u_official_teow.zip/",
  "official/wwf_terr_ecos.shp",
  sep = "")
sf::st_read(url)

to see if you're actually able to access the TEOW raw data.

duboisl-afd commented 1 month ago

Hello,

Thanks for your help! I ran the following line of code:

file.exists("/tmp/Rtmpi6Bk1S/mapme-data/teow/wwf_terr_ecos.gpkg")

but it returned FALSE. The teow directory does exist, but it is empty.

I also attempted this:

url <- paste(
  "/vsizip//vsicurl/",
  "https://files.worldwildlife.org/wwfcmsprod/files/",
  "Publication/file/6kcchn7e3u_official_teow.zip/",
  "official/wwf_terr_ecos.shp",
  sep = "")
sf::st_read(url)

However, I received the following error message:

Error: Cannot open "/vsizip//vsicurl/https://files.worldwildlife.org/wwfcmsprod/files/Publication/file/6kcchn7e3u_official_teow.zip/official/wwf_terr_ecos.shp"; The file doesn't seem to exist.

Do you have any other ideas?

karpfen commented 1 month ago

Hm, that's odd. I can think of two possibilities:

  1. Something is wrong with your spatial libraries, or they are very outdated. When you load sf, it reports which libraries it uses in the background. On my machine, it prints:
> library(sf)
Linking to GEOS 3.11.2, GDAL 3.8.2, PROJ 9.3.1; sf_use_s2() is TRUE

My sf version is sf_1.0-16. You can check yours by calling sessionInfo() after loading the library.

  2. You're being blocked from the website. You can check this by using a different internet connection (e.g., a mobile hotspot).
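To get a first hint on the second possibility from within R, the sketch below sends a header-only request to the WWF TEOW archive using base R's curlGetHeaders() and treats any HTTP status of 400 or above, or any connection error, as "not reachable". The helper name url_reachable() is hypothetical. A TRUE result only shows that R's own libcurl can reach the server; GDAL's /vsicurl/ handler has its own configuration options (e.g. for proxies), so treat this as a rough check rather than proof.

```r
# Hypothetical helper: TRUE if the server answers a header-only request
# with an HTTP status below 400, FALSE otherwise.
url_reachable <- function(url) {
  status <- tryCatch(
    # curlGetHeaders() (base R) retrieves only the response headers; the
    # final status code (after redirects) is stored in its "status" attribute.
    attr(curlGetHeaders(url), "status"),
    # DNS, proxy, firewall or TLS failures are treated as "not reachable".
    error = function(e) NA_integer_
  )
  isTRUE(status < 400)
}

# The zip archive from the /vsizip//vsicurl/ path above, without the GDAL prefixes.
teow_zip <- paste0(
  "https://files.worldwildlife.org/wwfcmsprod/files/",
  "Publication/file/6kcchn7e3u_official_teow.zip"
)

# Only run the network check in an interactive session (e.g. RStudio).
if (interactive()) print(url_reachable(teow_zip))
```

If this returns FALSE on one connection but TRUE on another (e.g. a mobile hotspot), the first network is likely blocking the download.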
duboisl-afd commented 1 month ago

OK. I'm using sf_1.0-16 as well, but my GEOS, GDAL and PROJ versions seem outdated compared to yours. Do you think that's the cause of my issue?

library(sf)
Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 8.2.1; sf_use_s2() is TRUE

duboisl-afd commented 1 month ago

I've managed to download gfw_treecover, gfw_lossyear, and , and I'm only having trouble with the Nelson et al. data, SoilGrids and the biome data. Do you have any idea why that might be?
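When some resources download and others do not, it can help to list which resource folders under outdir ended up empty, as the teow folder did here. The sketch below assumes, based on the path in the warning above (outdir/teow/...), that each resource is written to its own subfolder of outdir; the helper name empty_resource_dirs() is hypothetical.

```r
# Hypothetical helper: names of resource subfolders of `outdir` that contain no files.
# Assumes one subfolder per resource (e.g. outdir/teow), as in the path above.
empty_resource_dirs <- function(outdir) {
  dirs <- list.dirs(outdir, full.names = TRUE, recursive = FALSE)
  # Count the files (searching recursively) inside each resource folder.
  n_files <- vapply(
    dirs,
    function(d) length(list.files(d, recursive = TRUE)),
    integer(1)
  )
  basename(dirs[n_files == 0])
}

# Example, using the same outdir as in the original code:
outdir <- file.path(tempdir(), "mapme-data")
if (dir.exists(outdir)) print(empty_resource_dirs(outdir))
```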

fBedecarrats commented 1 month ago

Hi @duboisl-afd,

I understand that mapme.biodiversity now uses cloud-optimized storage to fetch resources. Some resources (like gfw_*) are already available on cloud-optimized platforms, while others (like teow) are hosted by the mapme team because their maintainers do not provide them on such platforms.

It appears you are using SSP Cloud (a Kubernetes instance with data science pods offered by the French statistical institute: https://datalab.sspcloud.fr/). The RStudio pod comes with an older version of GDAL, which might be causing issues when fetching teow if recent GDAL features are required.

I suggest opening an issue on the GitHub repo that defines the SSP Cloud images (https://github.com/InseeFrLab/images-datascience) to request an update to the GDAL version in the RStudio images. This issue might need to be addressed upstream (as they use the "rocker" images to build their containers), but they might have a solution to update it more easily.

Sorry I can't do more to help (I'm currently on holiday).

fBedecarrats commented 1 month ago

@duboisl-afd, don't forget to give an update here and close the issue.

duboisl-afd commented 1 month ago

After installing GDAL 3.8.4, I managed to download all the data except the Nelson et al. dataset. I'll create a new issue if I can't resolve it. Thanks for your help!

goergen95 commented 1 month ago

Hi all,

Great that you were able to find a solution! For posterity, and so that others who come here can learn from it, I'll describe the root cause and the respective solutions in a bit more detail.

Concerning the biome data and other vector resources, mapme.biodiversity relies on the ogrinfo utility to read important metadata, such as the spatial extent. This utility was exposed through the GDAL library API in GDAL 3.7, so 3.7 is our minimum version requirement.
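Since GDAL 3.7 is the minimum, one quick way to check an installation is to compare the GDAL version that sf links against with 3.7.0. The sketch below reads that version from sf::sf_extSoftVersion(), which reports the linked GEOS, GDAL and PROJ versions. The helper gdal_at_least() is hypothetical; it strips any non-numeric suffix (such as "dev") before comparing with package_version().

```r
# Hypothetical helper: TRUE if a GDAL version string is at least `minimum`.
gdal_at_least <- function(version, minimum = "3.7.0") {
  # Keep only the leading numeric part, e.g. "3.9.0dev" -> "3.9.0".
  numeric_part <- sub("^([0-9]+(\\.[0-9]+)*).*$", "\\1", version)
  # package_version() compares component by component, so "3.10.0" > "3.7.0".
  package_version(numeric_part) >= package_version(minimum)
}

if (requireNamespace("sf", quietly = TRUE)) {
  gdal_version <- sf::sf_extSoftVersion()[["GDAL"]]
  message("sf is linked against GDAL ", gdal_version)
  if (!gdal_at_least(gdal_version)) {
    message("GDAL < 3.7: vector resources such as teow may fail to load.")
  }
}
```

With the versions reported earlier in this thread, GDAL 3.4.1 fails this check, while 3.8.2 and 3.8.4 pass.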

Depending on your OS, the version of GDAL installed via a package manager might not meet this requirement (see here for Ubuntu). On Ubuntu machines, one solution might be to use the ubuntugis-unstable repository for "more bleeding edge" (the repo's self-description) releases of core spatial software. Note, however, that as of July 2024, the latest GDAL version you can obtain from there is 3.8.4 (from February), while the current release is 3.9.1. So if you need a truly bleeding-edge release of GDAL, you will need other alternatives.

We provide images here with the latest installations of GDAL, GEOS, and PROJ, and with the R spatial libraries built against them. These images are based on rocker and are ready to be used with mapme tooling (e.g., they also come with the Parquet driver installed, which we use in mapme.pipelines to efficiently process global WDPA portfolios).

Images with the current CRAN release of mapme.biodiversity are found here:

$ docker pull ghcr.io/mapme-initiative/mapme-spatial:latest

while daily builds of the GitHub main branch are found here:

$ docker pull ghcr.io/mapme-initiative/mapme-spatial-dev:latest