Open jbusecke opened 6 months ago
My wish here would be for a stage that does the following:
OpenWithFsspec
I have added concurrent downloads in #172 (and in the accidental push to main before that 🙈).
After discussion with the PGF folks, I think I should implement a check for whether URLs are available as part of the async client in pangeo-forge-esgf.
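To make the idea concrete, here is a rough sketch of such an availability check. It is not the pangeo-forge-esgf client API: `head_ok`, `check_urls`, and the `max_concurrency` parameter are hypothetical names, and it uses only the standard library (a blocking HEAD request run in threads) where the real client would presumably use its own async HTTP session. It also assumes the data servers answer HEAD requests, which may not hold everywhere.

```python
import asyncio
import http.client
import urllib.request
from typing import Callable, Dict, Iterable


def head_ok(url: str, timeout: float = 10.0) -> bool:
    """Blocking probe (hypothetical helper): True if HEAD returns a 2xx/3xx status."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # Covers HTTP errors (4xx/5xx), DNS failures, timeouts, malformed URLs.
        return False


async def check_urls(
    urls: Iterable[str],
    probe: Callable[[str], bool] = head_ok,
    max_concurrency: int = 8,
) -> Dict[str, bool]:
    """Probe all URLs concurrently and map each URL to its availability."""
    # The semaphore caps how many probes are in flight at once.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def check_one(url: str):
        async with semaphore:
            # Run the blocking probe in a thread so the event loop stays free.
            return url, await asyncio.to_thread(probe, url)

    results = await asyncio.gather(*(check_one(url) for url in urls))
    return dict(results)
```

Running something like `asyncio.run(check_urls(urls))` before submitting the recipe would let us drop or flag unavailable files up front, instead of finding out inside a running job.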
https://console.cloud.google.com/dataflow/jobs/us-central1/2024-03-28_20_29_34-3856017896282669695;logsSeverity=INFO?project=leap-pangeo&pageState=(%22dfTime%22:(%22l%22:%22dfJobMaxTime%22))&authuser=1&supportedpurview=project
We really need a less expensive way to cache data. The job above wasted 12 DCU only to find out that one of the files was not available.
If we can, we should restrict the number of workers and download within threads on a single worker (see https://github.com/pangeo-forge/pangeo-forge-recipes/issues/713). The scaling only seems to be efficient if we have fast downloads?
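For illustration, a minimal sketch of what "download within threads on a single worker" could look like. This is not how pangeo-forge-recipes does caching today: `fetch_to_file`, `download_in_threads`, and `max_threads` are hypothetical names, the fetch step uses plain `urllib` instead of fsspec, and the file naming (last path segment of the URL) is an assumption that would break if two URLs share a file name.

```python
import concurrent.futures
import pathlib
import urllib.request
from typing import Callable, Dict, Iterable, Tuple


def fetch_to_file(url: str, target: pathlib.Path) -> None:
    """Blocking download of one URL to a local path (hypothetical helper)."""
    urllib.request.urlretrieve(url, str(target))


def download_in_threads(
    urls: Iterable[str],
    cache_dir: str,
    fetch: Callable[[str, pathlib.Path], None] = fetch_to_file,
    max_threads: int = 4,
) -> Tuple[Dict[str, pathlib.Path], Dict[str, BaseException]]:
    """Download URLs with a thread pool; return (downloaded, failed)."""
    cache = pathlib.Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    # Assumed naming scheme: use the last path segment of each URL.
    targets = {url: cache / url.rstrip("/").rsplit("/", 1)[-1] for url in urls}

    failed: Dict[str, BaseException] = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_threads) as pool:
        futures = {pool.submit(fetch, url, path): url for url, path in targets.items()}
        for future in concurrent.futures.as_completed(futures):
            error = future.exception()
            if error is not None:
                # Record the failure instead of aborting the other downloads.
                failed[futures[future]] = error

    downloaded = {url: path for url, path in targets.items() if url not in failed}
    return downloaded, failed
```

Since the downloads are I/O bound, threads on one small worker might be enough to saturate a slow server, and a missing file would show up in `failed` without a whole fleet of workers being spun up first.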