Some data structures in tobac, for example segmentation masks used to indicate the spatial extent of features, are very sparse and compress fabulously well - about three orders of magnitude, from 2.6 GB to 2.9 MB in some examples @wx4stg pointed out. Users can therefore easily end up with needlessly large files if they save those data uncompressed.
In my experience, many (perhaps most) users don't know to turn on compression when they save NetCDF data, and tobac could help those users out by adding the compression encoding attributes to any multidimensional (2D or higher) xarray variables it creates.
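To make the idea concrete, here is a minimal sketch of what that could look like. The helper name `add_compression_encoding` and its defaults are hypothetical, not existing tobac API, and the dataset is a made-up stand-in for a segmentation mask. The `zlib` and `complevel` keys are the ones xarray's netCDF4 backend understands; other output formats use different keys.

```python
import numpy as np
import xarray as xr


def add_compression_encoding(ds, complevel=4, min_ndim=2):
    """Hypothetical helper: flag multidimensional variables for compression.

    This only sets entries in each variable's ``encoding`` dict. Nothing is
    compressed until the dataset is actually written to disk.
    """
    for name in ds.data_vars:
        if ds[name].ndim >= min_ndim:
            # Keys assumed here are those used by xarray's netCDF4 backend.
            ds[name].encoding.update({"zlib": True, "complevel": complevel})
    return ds


# A mostly-zero integer array standing in for a segmentation mask.
mask = np.zeros((10, 200, 200), dtype="int32")
mask[:, 50:60, 50:60] = 1

ds = xr.Dataset(
    {
        "segmentation_mask": (("time", "y", "x"), mask),
        "feature_count": (("time",), np.arange(10)),
    }
)
ds = add_compression_encoding(ds)
```

With the netCDF4 backend, a later `ds.to_netcdf(...)` call should pick these settings up without the user passing `encoding=` themselves, provided the encoding is still attached to the variable at the time of writing.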
There are certainly some design challenges here:
- detecting where and when to compress
- preserving attributes as data flows through the library
- the basic fact that writing xarray data structures is really handled by that library, not tobac, and xarray supports multiple output formats with different encoding parameters for compression
@kelcyno had a function in #136 that did this, but that was deferred for later discussion as part of 2.0. I wanted to raise the idea again now that the xarray work is well underway.