mdtanker / polartoolkit

Helpful tools for polar researchers
http://polartoolkit.readthedocs.io/
MIT License
37 stars 5 forks

Convert fetching grids to use compressed zarr files #120

Open mdtanker opened 1 year ago

mdtanker commented 1 year ago

Description of the desired feature:

Due to some issues with the Bedmap2 GeoTIFFs (#119), and because the Pooch cache is starting to grow very large, it would be good to have every fetch call on gridded data (GeoTIFFs, NetCDFs, etc.) preprocess the files into Zarr stores. Initial testing with the Bedmap2 TIFFs showed good compression: 87.8 MB for a .tif versus 39.2 MB for the equivalent .zarr.

Unfortunately, Pooch keeps the unzipped files as well as the non-preprocessed .tif files, so preprocessing everything to .zarr will only save space if we can get Pooch to delete the original files.
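To make the preprocessing step concrete, here is a minimal sketch of a Pooch-style processor, i.e. a callable taking `(fname, action, pooch)`. The names `netcdf_to_zarr` and `make_zarr_processor` are hypothetical and not part of polartoolkit or Pooch. The conversion assumes `xarray` and `zarr` are installed and covers only NetCDF input; GeoTIFFs would need a different reader.

```python
from pathlib import Path


def netcdf_to_zarr(src, dst):
    """Convert one NetCDF file to a Zarr store (assumes xarray and zarr are installed)."""
    import xarray as xr  # imported lazily so the processor has no hard dependency

    with xr.open_dataset(src) as ds:
        ds.to_zarr(dst, mode="w")


def make_zarr_processor(convert=netcdf_to_zarr):
    """Build a Pooch-style processor: a callable taking (fname, action, pooch)."""

    def processor(fname, action, pooch):
        # Hypothetical naming scheme: same location, .zarr suffix.
        zarr_store = Path(fname).with_suffix(".zarr")
        # Pooch passes action="download" or "update" when the file was just
        # (re)downloaded, and "fetch" when it was already in the local cache.
        if action in ("download", "update") or not zarr_store.exists():
            convert(fname, zarr_store)
        # The original file is left in place, which is the disk-space
        # problem described above.
        return str(zarr_store)

    return processor
```

A processor built this way can be passed as the `processor` argument of `pooch.retrieve` or `Pooch.fetch`, which then return the path to the Zarr store instead of the original file.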

There is some discussion of this here, here

Pooch doesn't seem to support this yet. Maybe we can just call os.remove(fname) at some point in the fetch call?

For now I will start converting to Zarr anyway.

Geotiffs

NetCDFs

Are you willing to help implement and maintain this feature?

mdtanker commented 1 year ago

Discovered that you can't use os.remove(fname), since Pooch looks for the originally downloaded (unzipped) file names on every call, even if the preprocessing has already been applied.
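One possible workaround, sketched here as an assumption rather than anything settled in this thread, is to check for the Zarr store before calling Pooch at all, so the original can be deleted once the conversion has succeeded. `fetch_as_zarr` and its `download` and `convert` arguments are hypothetical; `download` stands in for whatever Pooch call returns the path of the original file.

```python
from pathlib import Path


def fetch_as_zarr(zarr_store, download, convert, remove_original=True):
    """Return the path to a Zarr store, downloading and converting only if needed.

    download: zero-argument callable returning the path of the original file
              (for example, a wrapper around a Pooch fetch call).
    convert:  callable taking (original_path, zarr_store_path).
    """
    zarr_store = Path(zarr_store)
    if zarr_store.exists():
        # Pooch is never called here, so it never looks for the deleted original.
        return zarr_store
    original = Path(download())
    convert(original, zarr_store)
    if remove_original:
        original.unlink()  # free the space only after the conversion succeeded
    return zarr_store
```

The trade-off is that later calls skip Pooch entirely, so its hash check and update detection no longer apply once the Zarr store exists.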