zincware / MDSuite

A post-processing engine for particle simulations
https://mdsuite.readthedocs.io/
Eclipse Public License 2.0

Change compression algorithm #569

Closed SamTov closed 1 year ago

SamTov commented 1 year ago

Found a case where the compression algorithm is too punishing. With scientific data it is very easy to get values on a scale < 1e-6. Therefore I propose changing back to gzip, where no data can be lost.
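A minimal stdlib sketch (not MDSuite's actual storage path, which goes through HDF5) illustrating why gzip is safe for values far below 1e-6: it compresses the raw bytes, so the round trip is bit-exact.

```python
import gzip
import struct

# Values on the scale the issue mentions: well below 1e-6.
values = [3.2e-7, -8.9e-12, 1.5e-6, 0.0]

# Pack as raw doubles, compress losslessly, then decompress and unpack.
raw = struct.pack(f"{len(values)}d", *values)
restored = struct.unpack(f"{len(values)}d", gzip.decompress(gzip.compress(raw)))

# gzip operates on bytes, so every bit of every float survives.
assert list(restored) == values
```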

christophlohrmann commented 1 year ago

do we not use h5 and sql to store data? Where is that compression applied?

PythonFZ commented 1 year ago

Does this break backwards compatibility?

SamTov commented 1 year ago

> do we not use h5 and sql to store data? Where is that compression applied?

Compression is applied to the hdf5 datasets.

SamTov commented 1 year ago

> Does this break backwards compatibility?

It shouldn't, no.

christophlohrmann commented 1 year ago

> > do we not use h5 and sql to store data? Where is that compression applied?
>
> Compression is applied to the hdf5 datasets.

I thought hdf5 was a binary format. Anyhow, I think we should always go for lossless compression in any data storage.

SamTov commented 1 year ago

> > > do we not use h5 and sql to store data? Where is that compression applied?
> >
> > Compression is applied to the hdf5 datasets.
>
> I thought hdf5 was a binary format. Anyhow, I think we should always go for lossless compression in any data storage.

It is, but you can still apply compression to it. I agree, and I have found some cases that really require it.

SamTov commented 1 year ago

> Is it possible to test this? Otherwise I guess the tests fail because of something else?

The tests failing in the CI could actually be because of the compression change, but I need to check what is happening. In general it does not break backwards compatibility, though, because the compression algorithm is checked by h5py before loading the data; otherwise one would need to pass the algorithm to the data loader.

I will try to resolve these test failures locally.
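The backwards-compatibility point can be sketched with h5py directly (the file and dataset names here are made up for illustration): the writer picks the filter, and readers never name it.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical dataset of small-magnitude observables.
data = np.array([1.0e-7, 2.5e-9, 3.3e-6])
path = os.path.join(tempfile.mkdtemp(), "example.h5")

# The writer chooses the filter: lossless gzip, built into h5py.
with h5py.File(path, "w") as f:
    f.create_dataset("obs", data=data, compression="gzip")

# The reader passes no compression argument at all: h5py reads the
# filter pipeline stored in the file and decompresses transparently,
# which is why changing the writer's algorithm does not break readers.
with h5py.File(path, "r") as f:
    restored = f["obs"][:]

assert np.array_equal(restored, data)
```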