Closed · eejwa closed this issue 1 year ago
It looks like we should be using pooch for this: https://pypi.org/project/pooch/ - it's what scipy uses, so we should be okay adding the dependency. I guess we'll need a thin wrapper to set things like the cache directory. It supports Zenodo, so we should probably put files there (or in a dedicated separate GitHub repository).
I have a basic implementation of this at #116 - it should be quite easy to add new example datasets, and it gives us a good way to cache them locally and flag updates (we 'just' need to bump the DOI version number when needed). Does this look sufficiently simple? I cannot help thinking there may be a better way to write the functions that return the file names.
Could you remind me if we decided that we would host a whole example model (e.g. one of the mt=128 resolution simulations)? If so, should I send one over to you @andreww to be added to Zenodo?
@jamespanton93 - I think we talked about a downsampled example, but I can see the value of a single full-resolution case. How big is it (in total and per file)? I'm currently serving these via figshare, which has a 20 GB limit (and practically we need to upload via a web browser, which may struggle even below that limit). Or you could upload it yourself (no reason why we cannot fetch data from multiple places). Maybe something to discuss on Wednesday.
The other thing we should think about is where we actually want the downloaded material to be stored. It's currently in a directory wherever the OS defaults to for its cached data. We could cache it alongside the module install (I think), or we could choose somewhere else. It's really a question of what we think the user may want to do with the downloaded example files.
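For reference, the OS-default cache locations mentioned above can be sketched roughly as below. This is a simplified approximation of what pooch's os_cache() helper computes, not an exact reproduction of its logic, and the "terratools" app name is just an example:

```python
import os
import sys
from pathlib import Path

def default_cache_dir(appname: str) -> Path:
    """Rough per-OS cache default (approximating pooch.os_cache)."""
    if sys.platform == "win32":
        # Windows: local (non-roaming) application data
        base = Path(os.environ.get("LOCALAPPDATA",
                                   Path.home() / "AppData" / "Local"))
    elif sys.platform == "darwin":
        # macOS: user Caches directory
        base = Path.home() / "Library" / "Caches"
    else:
        # Linux/other: XDG cache directory, defaulting to ~/.cache
        base = Path(os.environ.get("XDG_CACHE_HOME",
                                   Path.home() / ".cache"))
    return base / appname
```

If we wanted users to control the location, an environment variable override (checked before falling back to this default) would be a simple, conventional mechanism.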
For the mt=128 simulations that I have mostly been running, each of the .comp files is ~2.9 MB, making the total about 370 MB, and each of the .seis files is ~3.3 MB, making the total about 420 MB. We could also include an example of files that contain just a single layer - these are only 41 KB each, so the total for a set of these would be just over 5 MB.
This may be too large to run (even though it's not too large to store) @eejwa will create a downsampled model to use.
Closed by #116.
We have decided to upload a lower-resolution, downsampled model file to save users' and CI bandwidth. @eejwa will do this.
We need an example TERRA model and some example thermodynamic lookup tables. It would be good to have a robust mechanism to cache these downloads too.