Closed · eejwa closed this issue 1 year ago
It looks like we should be using pooch for this: https://pypi.org/project/pooch/ - it's what scipy uses, so we should be okay adding the dependency. I guess we'll need a thin wrapper to set things like the cache directory. It supports Zenodo, so we should probably put files there (or in a dedicated separate GitHub repository).
I have a basic implementation of this at #116 - it should be quite easy to add new example datasets, and it gives us a good way to cache them locally and flag updates (we 'just' need to bump the DOI version number when needed). Does this look sufficiently simple? I cannot help thinking there may be a better way to write the functions that return the file names.
Could you remind me if we decided that we would host a whole example model (e.g. one of the mt=128 resolution simulations)? If so, should I send one over to you @andreww to be added to Zenodo?
@jamespanton93 - I think we talked about a downsampled example, but I can see the value of a single full-resolution case. How big is it (in total and per file)? I'm currently serving these via figshare, which has a 20 GB limit (and practically we need to upload via a web browser, which may struggle even below that limit). Or you could upload it yourself (no reason why we cannot fetch data from multiple places). Maybe something to discuss on Wednesday.
The other thing we should think about is where we actually want the downloaded material to be stored. It's currently in a directory wherever the OS defaults to for its cached data. We could cache it alongside the module install (I think), or we could choose somewhere else. It's really a question of what we think the user may want to do with the downloaded example files.
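For reference, the OS-default cache locations mentioned above can be sketched roughly as below. This is a simplified approximation of what pooch's os_cache() helper computes, not an exact reproduction of its logic, and the "terratools" app name is just an example:

```python
import os
import sys
from pathlib import Path

def default_cache_dir(appname: str) -> Path:
    """Rough per-OS cache default (approximating pooch.os_cache)."""
    if sys.platform == "win32":
        # Windows: local (non-roaming) application data
        base = Path(os.environ.get("LOCALAPPDATA",
                                   Path.home() / "AppData" / "Local"))
    elif sys.platform == "darwin":
        # macOS: user Caches directory
        base = Path.home() / "Library" / "Caches"
    else:
        # Linux/other: XDG cache directory, defaulting to ~/.cache
        base = Path(os.environ.get("XDG_CACHE_HOME",
                                   Path.home() / ".cache"))
    return base / appname
```

If we wanted users to control the location, an environment variable override (checked before falling back to this default) would be a simple, conventional mechanism.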
For the mt=128 simulations that I have mostly been running, each of the .comp files is ~2.9 MB, making the total about 370 MB, and each of the .seis files is ~3.3 MB, making the total about 420 MB. We could also include an example of files that contain just a single layer - these are only 41 KB each, so the total for a set of these would be just over 5 MB.
This may be too large to run (even though it's not too large to store) @eejwa will create a downsampled model to use.
Closed by #116.
We have decided to upload a lower-resolution, downsampled model file to save users' and CI bandwidth. @eejwa will do this.
We need an example TERRA model and some example thermodynamic lookup tables. It would be good to have a robust mechanism to cache these downloads too.