Closed fBedecarrats closed 1 year ago
25MB is not that large so the decision was made that the whole file is downloaded ~(though only for the time period requested)~. What is the issue? I see that the ETA is quite high indicating that either the source server is not quite responsive or your internet connection is really slow?
The issue is that there are more than 12 096 files to download (i.e. 1 per month since 1981) and that it takes > 10h with a good connection. The total size is ~3 GB. This is not excessive for long-term global or cross-regional analysis, but it might be prohibitive for localized analysis, in particular when analysts reside in countries with challenging internet connections.
I only see 489 files, but point taken. I see this related to the discussion about moving the package to a "cloud-native" solution. The current approach is that you have to download all resources locally. If we relied only on cloud-native data formats (e.g. COGs, GeoArrow and the like) we could query only the required data (though it might be prohibitive for many polygons). The idea of downloading the global layer was that you would do it once and share it between projects even if the single projects might be very localized.
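To put rough numbers on what a spatially filtered read could save, here is a minimal sketch. It assumes the CHIRPS grid has 0.05° cells covering 180°W to 180°E and 50°S to 50°N, and it uses an approximate bounding box for Madagascar chosen purely for illustration. It only computes the pixel window that a windowed read would request; it does not read any data and is not how the package currently works.

```python
import math

# Assumed CHIRPS grid: 0.05 degree cells, 180W-180E, 50S-50N (north-up raster).
RES = 0.05
WEST, NORTH = -180.0, 50.0
N_COLS, N_ROWS = 7200, 2000  # 360 / 0.05 and 100 / 0.05


def pixel_window(xmin, ymin, xmax, ymax):
    """Return (col_off, row_off, width, height) of the cells covering a bbox."""
    # Small epsilon guards against floating-point noise on exact cell edges.
    eps = 1e-9
    col0 = max(0, math.floor((xmin - WEST) / RES + eps))
    col1 = min(N_COLS, math.ceil((xmax - WEST) / RES - eps))
    row0 = max(0, math.floor((NORTH - ymax) / RES + eps))
    row1 = min(N_ROWS, math.ceil((NORTH - ymin) / RES - eps))
    return col0, row0, col1 - col0, row1 - row0


# Approximate bounding box for Madagascar (illustrative values only).
window = pixel_window(43.0, -26.0, 51.0, -11.5)
col_off, row_off, width, height = window
share = (width * height) / (N_COLS * N_ROWS)
print(window, f"{share:.2%} of the global grid")
```

Under these assumptions the window is 160 x 290 cells, well under 1% of the global grid, which is the kind of saving a windowed read on a COG could offer for a single-country analysis.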
> 25MB is not that large so the decision was made that the whole file is downloaded (though only for the time period requested). What is the issue? I see that the ETA is quite high indicating that either the source server is not quite responsive or your internet connection is really slow?
As mentioned in the documentation, the package automatically downloads CHIRPS data since January 1981:
> The data can be used to retrieve information on the amount of rainfall. Due to the availability of +30 years, anomaly detection and long-term average analysis is also possible. The routine will download the complete archive in order to support long-term average and anomaly calculations with respect to the 1981 - 2010 climate normal period. Thus no additional arguments need to be specified.
This is the behaviour we see in the reprex above. The portfolio has been set with years of interest from 2000 to 2021, but the data is downloaded since 1981.
> This is the behaviour we see in the reprex above. The portfolio has been set with years of interest from 2000 to 2021, but the data is downloaded since 1981.
You are right. That is because we calculate precipitation anomalies down the line, so we need a 30-year climate-normal period.
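To illustrate why years before the requested period are needed, here is a minimal sketch of an anomaly calculation against a 1981-2010 baseline. The rainfall values are synthetic and the function names are hypothetical; this is not the package's implementation, only the general idea.

```python
from statistics import mean

BASELINE = range(1981, 2011)  # 1981-2010 inclusive, the climate-normal period


def monthly_normals(rain):
    """Mean rainfall per calendar month over the baseline years.

    `rain` maps (year, month) -> rainfall in mm.
    """
    return {
        month: mean(rain[(year, month)] for year in BASELINE)
        for month in range(1, 13)
    }


def anomalies(rain, years):
    """Rainfall minus the baseline normal, for the requested years only."""
    normals = monthly_normals(rain)
    return {
        (year, month): rain[(year, month)] - normals[month]
        for year in years
        for month in range(1, 13)
    }


# Synthetic data: 100 mm every month, except a wetter 2015 (130 mm).
rain = {
    (year, month): 130.0 if year == 2015 else 100.0
    for year in range(1981, 2022)
    for month in range(1, 13)
}

result = anomalies(rain, years=range(2000, 2022))
print(result[(2015, 1)])  # 30.0
print(result[(2000, 1)])  # 0.0
```

Even though only 2000 to 2021 is requested, `monthly_normals` reads every year from 1981 to 2010, which is why the download has to start in 1981.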
> I see this related to the discussion about moving the package to a "cloud-native" solution.
Since I realized that the TMF dataset was not a relevant candidate for my work (too little moist forest in Madagascar), I would like to prioritize this aspect. It would be nice to start specifying an overarching understanding of the "cloud-native" solutions to be developed, so I'm not tempted to work on something that only works on the platform I work on (MinIO, which is an open source implementation of Amazon S3). I'll move this item to a specific discussion (#143).
Reprex:
Current progress:
As mentioned in the documentation, the package downloads monthly datasets. But these are 24 MB each:
From this source, some lighter regional datasets are available for Africa and Indonesia, but not in COG format (tifs, pngs and bils).
Does anyone have ideas about another spatially filterable source for CHIRPS data?