tobac-project / tobac

Tracking and object-based analysis of clouds
BSD 3-Clause "New" or "Revised" License

Add compression encoding attributes to xarray variables by default? #441

Closed: deeplycloudy closed this issue 1 month ago

deeplycloudy commented 3 months ago

Some data structures in tobac, for example the segmentation masks that indicate the spatial extent of features, are very sparse and compress fabulously well: about three orders of magnitude, from 2.6 GB to 2.9 MB in some examples @wx4stg pointed out. Users who save these data without compression can therefore easily end up with very large files.
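To illustrate why masks like these compress so well, here is a small standard-library sketch. The mask is synthetic (two rectangular "features" on a 1000 x 1000 background of zeros, one byte per pixel) and `make_sparse_mask` is a hypothetical helper. It is not tobac's segmentation output, and the ratio it reports is not the 2.6 GB to 2.9 MB figure above; it only shows that long runs of background zeros cost almost nothing once deflate-compressed.

```python
import zlib


def make_sparse_mask(ny, nx, features):
    """Build a flattened (row-major) uint8 mask where each feature is a filled rectangle.

    Hypothetical helper for illustration only. `features` is a list of
    (feature_id, y0, y1, x0, x1) tuples; ids must fit in one byte (1..255).
    """
    mask = bytearray(ny * nx)  # all zeros = background
    for fid, y0, y1, x0, x1 in features:
        for y in range(y0, y1):
            start = y * nx
            # Fill columns x0..x1-1 of row y with this feature's id.
            mask[start + x0:start + x1] = bytes([fid]) * (x1 - x0)
    return bytes(mask)


# Two small synthetic features on a 1000 x 1000 grid: 1000 labelled pixels in total.
mask = make_sparse_mask(1000, 1000, [(1, 100, 120, 200, 230), (2, 500, 540, 600, 610)])
compressed = zlib.compress(mask, level=4)  # the same deflate algorithm netCDF4/HDF5 "zlib" uses
ratio = len(mask) / len(compressed)
print(f"{len(mask)} bytes -> {len(compressed)} bytes (about {ratio:.0f}x smaller)")
```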

In my experience, many (perhaps most) users don't know to turn on compression when they save NetCDF data, and tobac could help them by adding compression encoding attributes to any multidimensional (2D or higher) xarray variables it creates.
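A minimal sketch of what "adding the compression encoding attributes" could look like, assuming the netCDF4-style `zlib`/`complevel` keys and a "2D or higher" rule for deciding what to tag. `add_compression_encoding` is a hypothetical name, not an existing tobac function. It is written against an `xarray.Dataset` but only touches `.data_vars`, `.ndim` and `.encoding`, so it does not import xarray itself.

```python
def add_compression_encoding(ds, complevel=4, min_ndim=2, overwrite=False):
    """Tag large data variables so a later ds.to_netcdf() writes them compressed.

    Hypothetical helper, not part of tobac. `ds` is meant to be an xarray.Dataset,
    but any object whose .data_vars maps names to objects with .ndim and a
    mutable .encoding dict works. Existing encoding keys are kept unless
    overwrite=True. Returns the names of the variables that were tagged.
    """
    compression = {"zlib": True, "complevel": complevel}  # netCDF4-style keys
    tagged = []
    for name, var in ds.data_vars.items():
        if var.ndim < min_ndim:
            continue  # scalars and 1D variables are small; leave them alone
        for key, value in compression.items():
            # Respect settings the user (or an earlier step) already chose.
            if overwrite or key not in var.encoding:
                var.encoding[key] = value
        tagged.append(name)
    return tagged
```

With xarray, calling `add_compression_encoding(ds)` before `ds.to_netcdf("masks.nc")` should compress the tagged variables when the netCDF4 engine is used. Because the settings live in `.encoding`, they may be lost after many xarray operations, which is exactly the attribute-preservation problem listed as challenge 2.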

There are certainly some design challenges here:

  1. detecting where and when to compress
  2. preserving attributes as data flows through the library
  3. the basic fact that writing xarray data structures to disk is really handled by xarray, not tobac, and xarray supports multiple output formats with different encoding parameters for compression
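Challenge 3 can be made concrete with a small dispatch sketch: the same intent ("compress this variable") maps to different encoding keys, or to none at all, depending on the xarray backend. `compression_encoding_for` is a hypothetical name. The sketch assumes that xarray accepts `zlib`/`complevel` for both HDF5-based engines (`netcdf4` and `h5netcdf`), and it deliberately leaves zarr unhandled because the compressor keyword there depends on the zarr version.

```python
def compression_encoding_for(engine, complevel=4):
    """Return a per-variable encoding dict requesting compression for an xarray engine.

    Hypothetical helper (not a tobac or xarray API); it only covers engines
    whose compression keys are assumed here.
    """
    engine = engine.lower()
    if engine in ("netcdf4", "h5netcdf"):
        # Both HDF5-based engines are assumed to accept netCDF4-style keys.
        return {"zlib": True, "complevel": complevel}
    if engine == "scipy":
        # The scipy engine writes netCDF3, which has no compression support.
        return {}
    # e.g. "zarr": compressor keywords differ between zarr versions, so fail loudly
    # instead of silently writing uncompressed output.
    raise NotImplementedError(f"no compression settings defined for engine {engine!r}")
```

A caller could then build `encoding={name: compression_encoding_for(engine) for name in names}` and pass it to `ds.to_netcdf(path, engine=engine, encoding=...)`, keeping the format-specific knowledge in one place.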

@kelcyno had a function in #136 that did this, but that was deferred for later discussion as part of 2.0. I wanted to raise the idea again now that the xarray work is well underway.