m-labs / artiq

A leading-edge control system for quantum information experiments
https://m-labs.hk/artiq
GNU Lesser General Public License v3.0

hdf5 results: use attributes for small datasets #1345

Open jordens opened 5 years ago

jordens commented 5 years ago

ARTIQ Feature Request

Problem this request addresses

An HDF5 dataset takes up at least a few kB of disk space, while most ARTIQ datasets in practice are smaller than this minimum per-dataset overhead, and many (e.g. scalars) are even smaller than their key. This is an inefficient use of disk space.
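To make the idea in the title concrete, here is a minimal sketch of storing small values as HDF5 attributes and only larger arrays as real datasets. It is not ARTIQ code: `write_datasets`, `read_datasets`, the `ATTR_MAX_BYTES` cutoff and the `datasets` group name are all hypothetical names chosen for illustration, and the 1 kB threshold is an arbitrary assumption.

```python
import h5py
import numpy as np

# Hypothetical cutoff for illustration only; not an ARTIQ setting.
ATTR_MAX_BYTES = 1024


def write_datasets(group, datasets, attr_max_bytes=ATTR_MAX_BYTES):
    """Store small values as attributes of `group`, larger ones as datasets."""
    for key, value in datasets.items():
        arr = np.asarray(value)
        if arr.nbytes <= attr_max_bytes:
            # Small value: lives in the group's header, no separate dataset.
            group.attrs[key] = arr
        else:
            group.create_dataset(key, data=arr)


def read_datasets(group):
    """Merge attributes and datasets back into a single dict."""
    out = {key: value for key, value in group.attrs.items()}
    for key, dset in group.items():
        out[key] = dset[()]
    return out


# In-memory file so the example leaves nothing on disk.
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    g = f.create_group("datasets")
    write_datasets(g, {"detuning": 1.5e6, "trace": np.zeros(10000)})
    stored_as_attr = sorted(g.attrs.keys())
    stored_as_dataset = sorted(g.keys())
    restored = read_datasets(g)
```

A reader would have to look in both places, as `read_datasets` does, so a change like this affects everything that consumes the result files and not only the writer.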

Describe the solution you'd like

Additional context

airwoodix commented 4 years ago

Could we also add compression to the discussion? As was pointed out on #m-labs, HDF5 datasets support compression filters. I did some tests to get an impression of which filters are readily available and working, and of their rough performance. It seems realistic to me to enable Zstandard compression by default on all array datasets (e.g. images), assuming that smaller data is stored as attributes. But this requires further testing, especially on Windows. More / better testing is very welcome. At the very least, exposing the h5py create_dataset options would be great.
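As a rough illustration of what exposing the `create_dataset` options could look like, the sketch below forwards keyword arguments straight to h5py and compares file sizes. It uses the gzip filter that ships with h5py rather than Zstandard, since Zstandard needs a separately installed filter plugin; `write_array` and the test image are made up for this example and the numbers will vary with the data.

```python
import os
import tempfile

import h5py
import numpy as np


def write_array(path, data, **dataset_kwargs):
    """Write `data` as one dataset, forwarding options to create_dataset.

    Hypothetical helper for illustration; returns the file size in bytes.
    """
    with h5py.File(path, "w") as f:
        f.create_dataset("image", data=data, **dataset_kwargs)
    return os.path.getsize(path)


# Synthetic, highly compressible "camera image": dark frame with one bright patch.
image = np.zeros((512, 512), dtype=np.uint16)
image[100:200, 100:200] = 1000

with tempfile.TemporaryDirectory() as tmp:
    raw_size = write_array(os.path.join(tmp, "raw.h5"), image)
    gzip_size = write_array(
        os.path.join(tmp, "gzip.h5"),
        image,
        compression="gzip",   # built-in filter, stands in for Zstandard here
        compression_opts=4,   # gzip level
        shuffle=True,         # byte-shuffle filter, often helps with integer data
    )

print(f"uncompressed: {raw_size} bytes, gzip: {gzip_size} bytes")
```

Real images with noise will compress far less than this synthetic frame, so this only shows the mechanism, not the achievable ratio.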