mapme-initiative / mapme.biodiversity

Efficient analysis of spatial biodiversity datasets for global portfolios
https://mapme-initiative.github.io/mapme.biodiversity/
GNU General Public License v3.0

Change source for CHIRPS #185

Closed: fBedecarrats closed this issue 10 months ago

fBedecarrats commented 11 months ago

As mentioned in #184, we have two problems with the source used for CHIRPS:

  1. The source is barely responsive (in my tests it is still reachable, but very, very slow).
  2. In this source, updates to the COG files were discontinued in Nov. 2021, whereas the tif files are still being updated.

I guess we could consider fetching this data from another source. Global daily CHIRPS data is available in the Earth Engine Data Catalog. Unfortunately, it is not available on Microsoft Planetary Computer, and on AWS it is only available for Africa... As a side note, there is a new method to access GEE data through the Python API (emulated in rgee), although I don't think we would want to add such a dependency.

goergen95 commented 11 months ago

You mentioned that the GTiffs are still available and updated. Would you mind checking whether simply changing the URL to point to the GTiff folder does the job?
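Because the two folders use different file names, switching from COGs to GTiffs changes both the folder and the file extension. The following is a minimal sketch that assumes the naming pattern of the URLs quoted in this thread. The helper `chirps_monthly_url` and the variable `CHIRPS_BASE` are hypothetical and are not part of mapme.biodiversity.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the URL of one CHIRPS-2.0 global monthly file.
# Naming assumed from the URLs quoted in this thread:
#   tifs/chirps-v2.0.YYYY.MM.tif.gz  (gzip-compressed GeoTIFF)
#   cogs/chirps-v2.0.YYYY.MM.cog     (Cloud Optimized GeoTIFF)
CHIRPS_BASE="https://data.chc.ucsb.edu/products/CHIRPS-2.0/global_monthly"

chirps_monthly_url() {
  local year="$1" month="$2" format="${3:-tifs}"
  local mm
  # 10# forces base 10 so that "08" or "09" is not parsed as octal;
  # printf then zero-pads the month to two digits.
  mm=$(printf '%02d' "$((10#$month))")
  case "$format" in
    tifs) printf '%s/tifs/chirps-v2.0.%s.%s.tif.gz\n' "$CHIRPS_BASE" "$year" "$mm" ;;
    cogs) printf '%s/cogs/chirps-v2.0.%s.%s.cog\n' "$CHIRPS_BASE" "$year" "$mm" ;;
    *) echo "unknown format: $format" >&2; return 1 ;;
  esac
}
```

For example, `chirps_monthly_url 1981 1` points at the gzipped GTiff for January 1981. Unlike the COGs, these files are gzip-compressed. They therefore have to be decompressed first (e.g. with `gunzip`) or read through GDAL's `/vsigzip/` virtual file system.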

karpfen commented 11 months ago

What's odd is that it's really just loading the cogs directory URL that's so slow. The individual files and other folders are responsive. You can check by running this shell script:

curl -w "Total time: %{time_total}s\n" -o /dev/null -s "https://data.chc.ucsb.edu/products/CHIRPS-2.0/global_monthly/"
curl -w "Total time: %{time_total}s\n" -o /dev/null -s "https://data.chc.ucsb.edu/products/CHIRPS-2.0/global_monthly/tifs/chirps-v2.0.1981.01.tif.gz"
curl -w "Total time: %{time_total}s\n" -o /dev/null -s "https://data.chc.ucsb.edu/products/CHIRPS-2.0/global_monthly/tifs/"
curl -w "Total time: %{time_total}s\n" -o /dev/null -s "https://data.chc.ucsb.edu/products/CHIRPS-2.0/global_monthly/cogs/chirps-v2.0.1981.01.cog"
curl -w "Total time: %{time_total}s\n" -o /dev/null -s "https://data.chc.ucsb.edu/products/CHIRPS-2.0/global_monthly/cogs/"

I'm trying to contact the website maintainer to see whether they can help fix this for now.
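The timing checks above can also be run over any list of URLs with a timeout. This way a hanging directory listing is reported as a failure instead of blocking the shell. The following is a minimal sketch only. The function `time_url` and the default `MAX_TIME` of 60 seconds are assumptions and are not used anywhere in this thread.

```shell
#!/usr/bin/env bash
# Assumed timeout in seconds (not a value from the thread); override via env.
MAX_TIME="${MAX_TIME:-60}"

# Print the URL and curl's total transfer time, or a failure note if the
# request errors out or exceeds MAX_TIME.
time_url() {
  local url="$1" t
  # -o /dev/null discards the body, -s silences progress output,
  # -w prints the total time; the if tests curl's exit status.
  if t=$(curl -o /dev/null -s --max-time "$MAX_TIME" -w '%{time_total}' "$url"); then
    printf '%s\t%ss\n' "$url" "$t"
  else
    printf '%s\tfailed or exceeded %ss\n' "$url" "$MAX_TIME"
  fi
}

# Time every URL given on the command line.
for url in "$@"; do
  time_url "$url"
done
```

Usage, assuming the script is saved as `time_urls.sh`: `MAX_TIME=30 bash time_urls.sh https://data.chc.ucsb.edu/products/CHIRPS-2.0/global_monthly/tifs/ https://data.chc.ucsb.edu/products/CHIRPS-2.0/global_monthly/cogs/`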

karpfen commented 11 months ago

Update: The maintainer has now added the missing COG files and will keep the folder up to date in the future.

The URL response time is still very slow, though.

goergen95 commented 11 months ago

Thanks for taking care of this. Just a few minutes ago, I pushed a quick fix to main that relies on the GTiffs instead of the COGs. We need to prepare a CRAN release ASAP because some checks are currently failing. Would you agree to stick with the GTiffs for now and come up with a more sustainable solution later?

See here: 2f79321fc85b95d185e72d9524ff6be894cf3290

karpfen commented 10 months ago

Maybe let's give it another day or two, @goergen95. The data maintainer just confirmed the slow access to the directory on his side and is in touch with his IT folks.