szcompressor / SZ3

Error-bounded Lossy Data Compressor (for floating-point/integer datasets)
https://szcompressor.org/

[QUESTION] Have you checked backwards compatibility? #50

Open vasole opened 9 months ago

vasole commented 9 months ago

Dear colleagues,

Have you tried reading an HDF5 file generated with the previous version using the newly released version?

We are having trouble reading an old file and would like to know which side we should investigate.

https://github.com/silx-kit/hdf5plugin/pull/289

Thanks!

ayzk commented 9 months ago

Dear Solé,

Thanks for reaching out. We are still improving our algorithm continuously, so it is difficult for us to guarantee backward compatibility at this stage. However, we can store a version number in the compressed format and add a compatibility check before decompression: if the version doesn't match, decompression will stop with an error message naming the correct version to use. Would this temporary solution solve the HDF5 integration problem? Thanks.

Best, Kai

vasole commented 9 months ago

> Would this temporary solution solve the HDF5 integration problem?

At least it would prevent unnoticed errors.

I would expect breakage not to happen too often. Having a one-to-one correspondence between generated files and versions would make using SZ3 as an HDF5 plugin questionable at this stage of its development.

If the error message reports the set of versions compatible with the written file, a workaround could be implemented on the user side by registering and unregistering different versions of the plugin.

t20100 commented 9 months ago

> we can put the version number to the compressed format and add a compatibility check before decompression: if the version doesn't match, the decompression will stop with an error message with the correct version to use. Would this temporary solution solve the HDF5 integration problem?

Yes, that would prevent reading different data without notice.

ayzk commented 9 months ago

@vasole @t20100 Thanks for the reply.

I've added the data version to SZ3.

  1. To allow maximum compatibility, the data version does not map one-to-one to the program version. For example, program versions v3.1.8, v3.1.9, and v3.2.1 will use the same data version if no changes to the data format are made in those program versions.
  2. During decompression, SZ3 checks the data version and throws an exception if it is not supported. The exception message contains the correct program version the user should use.
  3. The change is not in master yet; it is on the newapi branch (https://github.com/szcompressor/SZ3/tree/newapi). It will be released next month.
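
The scheme in points 1 and 2 can be sketched roughly as follows. This is a minimal illustration in Python, not SZ3's actual code: the 4-byte header layout, the `DATA_VERSION_TO_PROGRAMS` table, the `v3.3.0` entry, and all function names are hypothetical.

```python
import struct

# Hypothetical table: one data version can cover several program versions.
# The first entry mirrors the example in point 1; the second is a placeholder.
DATA_VERSION_TO_PROGRAMS = {
    1: ["v3.1.8", "v3.1.9", "v3.2.1"],
    2: ["v3.3.0"],
}
SUPPORTED_DATA_VERSION = 2  # assumed data version this program reads and writes


class UnsupportedDataVersion(Exception):
    """Raised when a stream was written with a data version we cannot read."""


def write_stream(payload: bytes) -> bytes:
    # Prefix the payload with the data version as a little-endian uint32.
    return struct.pack("<I", SUPPORTED_DATA_VERSION) + payload


def read_stream(stream: bytes) -> bytes:
    # Check the data version first, before touching the payload.
    (version,) = struct.unpack_from("<I", stream, 0)
    if version != SUPPORTED_DATA_VERSION:
        programs = DATA_VERSION_TO_PROGRAMS.get(version)
        hint = ", ".join(programs) if programs else "unknown"
        raise UnsupportedDataVersion(
            f"data version {version} is not supported; "
            f"compatible program version(s): {hint}"
        )
    return stream[4:]
```

In this sketch, reading a stream tagged with data version 1 raises `UnsupportedDataVersion`, and the message lists v3.1.8, v3.1.9, and v3.2.1 as the program versions to use.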

t20100 commented 9 months ago

Great, thanks!

Does this also detect previous versions, where the data version was not stored?

ayzk commented 9 months ago

@t20100 The new program can tell users that the data was generated by an old version, but it cannot report the exact old version number, since that was not stored before.
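
One way to get this behaviour is to prefix new streams with a marker that old streams lack. The sketch below assumes a hypothetical 4-byte marker `b"SZ3V"` followed by a little-endian uint32 data version; SZ3's real header may look different, and an old stream could in principle start with the same bytes by coincidence.

```python
import struct

MAGIC = b"SZ3V"  # hypothetical marker, not SZ3's real header


def describe_stream(stream: bytes) -> str:
    # Streams without the marker predate the stored data version,
    # so only "old" can be reported, not the exact version.
    if stream[:4] != MAGIC:
        return "legacy data: written before the data version was stored"
    if len(stream) < 8:
        raise ValueError("truncated header")
    (version,) = struct.unpack_from("<I", stream, 4)
    return f"data version {version}"
```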

t20100 commented 9 months ago

Thanks! Sounds good to me as long as the user is informed that there's an issue with the version.

ayzk commented 9 months ago

@t20100 Great! Thanks.

t20100 commented 4 months ago

Hi,

I'm planning to make a release of hdf5plugin which embeds SZ3. Do you foresee a release of SZ3 that tackles this issue? If so, please let me know; I can wait for a new version of SZ3 to embed in hdf5plugin.

Best,

ayzk commented 2 months ago

@t20100 We released SZ3 v3.2.0 last week. It contains the logic we discussed in this thread (checking the data version before decompression). Moreover, the HDF5 filter has been completely rewritten to support SZ3::config. I think this version is suitable for hdf5plugin; the only caveat is that it has not been extensively tested yet. If you have any issues running this version, please let me know and I will fix them quickly. Thanks.

t20100 commented 2 months ago

Thanks! I will try to embed the latest version in hdf5plugin and let you know how it goes.

t20100 commented 2 months ago

I tried embedding v3.2.0 in hdf5plugin (see https://github.com/silx-kit/hdf5plugin/pull/289).

Unfortunately, there is an issue that occurs before the library's version check is reached: the format of the cd_values has changed. The new version tries to read a Config from them, whereas the previous version of the filter stored a couple of float64 values.
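
For context, HDF5 passes filter parameters as an array of unsigned integers, so a float64 has to be split across two entries. The sketch below shows one possible packing; the word order and the function names are assumptions for illustration, not the layout used by either version of the SZ3 filter. A reader that expects a different layout in the same integers will misinterpret them, so the check on the compressed data is never reached.

```python
import struct


def float64_to_cd_values(x: float) -> tuple:
    # Split the 8 bytes of a float64 into two uint32 words (high word first).
    hi, lo = struct.unpack(">II", struct.pack(">d", x))
    return hi, lo


def cd_values_to_float64(hi: int, lo: int) -> float:
    # Reassemble the float64 from the two uint32 words.
    (x,) = struct.unpack(">d", struct.pack(">II", hi, lo))
    return x
```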

Also, the change of cd_values makes it harder to integrate in Python (see https://github.com/silx-kit/hdf5plugin/pull/289#issuecomment-2314587768 for more details).

And BTW, did you try reading an HDF5 SZ3-compressed dataset on a machine with a different endianness than the machine it was written on?
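
To illustrate why this matters, here is a small Python sketch, unrelated to SZ3's actual serialization: a float64 written with one byte order and read back with the other yields a different value, whereas fixing the byte order explicitly on both sides round-trips. The function names are hypothetical.

```python
import struct


def encode(value: float) -> bytes:
    # Always write little-endian, regardless of the host byte order.
    return struct.pack("<d", value)


def decode_portable(raw: bytes) -> float:
    # Read with the same explicit byte order that was used for writing.
    return struct.unpack("<d", raw)[0]


def decode_wrong_order(raw: bytes) -> float:
    # Simulates a big-endian reader that interprets the bytes natively.
    return struct.unpack(">d", raw)[0]
```

With these helpers, `decode_portable(encode(0.001))` returns the original value, while `decode_wrong_order(encode(0.001))` does not.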