m2lines / gz21_ocean_momentum

Stochastic-Deep Learning Parameterization of Ocean Momentum Forcing
MIT License

Allow using local/cached CM2.6 dataset in data step #86

Open raehik opened 1 year ago

raehik commented 1 year ago

The intake library doesn't appear to do any caching, so every run of the data step re-downloads all the parts of the CM2.6 dataset it needs. That means more network bandwidth and more Google Cloud charges.

Maybe we could add a CLI option that makes intake try to load data from a given path (a cache) first, and fall back to the online data if it isn't present (and update the cache...? This seems to get more complex once we consider versioning).
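To make the idea concrete, here is a rough sketch of the "try the cache first, fall back to the download" logic, using only the Python standard library. Everything named here is hypothetical: the `--cache-dir` flag, `load_with_cache`, and the `fetch_remote` callable are illustrations, not part of this repository or of intake. It also treats the data as a single blob of bytes, whereas the real CM2.6 data is a multi-file store, and it ignores the versioning question entirely.

```python
import argparse
from pathlib import Path
from typing import Callable


def load_with_cache(cache_file: Path, fetch_remote: Callable[[], bytes]) -> bytes:
    """Return the cached bytes if present; otherwise fetch, store, and return them.

    `fetch_remote` is a hypothetical stand-in for whatever performs the real download.
    """
    if cache_file.exists():
        # Cache hit: no network traffic, no cloud charges.
        return cache_file.read_bytes()

    # Cache miss: download, then populate the cache for the next run.
    data = fetch_remote()
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first so an interrupted run
    # does not leave a truncated file that looks like a valid cache entry.
    tmp_file = cache_file.with_name(cache_file.name + ".part")
    tmp_file.write_bytes(data)
    tmp_file.replace(cache_file)
    return data


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing a hypothetical --cache-dir option."""
    parser = argparse.ArgumentParser(description="Data step (sketch only)")
    parser.add_argument(
        "--cache-dir",
        type=Path,
        default=None,
        help="Directory to check for previously downloaded data before going online",
    )
    return parser
```

With something like this, the data step would only go online when `--cache-dir` is unset or the requested file is missing from it. Invalidating the cache when the upstream dataset changes is left open here.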

raehik commented 1 year ago

We probably shouldn't tackle this until after #85 at least.

arthurBarthe commented 1 year ago

Do you mean allowing the use of a local download of the CM2.6 simulations? In my implementation I had added some caching: if a file had been downloaded previously, it was used instead of the Google bucket. However, to do this I had to modify the fsspec library at the time, which is an ugly solution. It might be worth checking whether intake now allows for caching; I don't think it did back then.