Open ehgus opened 5 days ago
Looks like this was introduced in PR: https://github.com/zarr-developers/numcodecs/pull/519
@normanrz do you have thoughts on this issue?
If I understand correctly, 0.14 adds the checksum
metadata which means that 0.12 cannot read it anymore.
I think it could be acceptable that data compressed with a newer version would not be compatible with an older version. Since, we are still in 0.x releases, minor versions can be breaking.
However, we could also write out the checksum
metadata only if it deviates from the previous default. Are there other cases like this in numcodecs?
So I think there's a couple of issues to disentangle here:
numcodecs
It looks like your error message is about trying to using the numcodecs 0.12 API the same way as the numcodecs 0.14 API. Since we're < version 1.0, we still allow ourselves to make breaking changes in this API every minor version. I think this is what your orignal issue is about?
Decompression backwards compatibility Again, since we're < version 1.0, we don't guarentee that data compressed with a new version of numcodecs will be able to be decompressed with an older version.
Decompression fowards compatibility Again (again 😄), since we're < version 1.0, we don't have to guarentee that data compressed with an old version of numcodecs will be able to be decompressed with a newer version. In practice I think we really should and have to be forwards compatible, because so much data in the wild depends on numcodecs.
So I think that unfortunately your error is not something we can fix or improve (because we cannot go back to modify older releases), and unfortunately a consequence of numcodecs not having a stable release yet.
Note that the points above are my interpretation of our compatibility guarentees, so might not be what others agree on. We should agree and write these down in the docs. And numcodecs really needs a stable release, after which we're careful about breaking changes, since it's so widely used now.
Thanks Norman and David for your feedback! 🙏
Am guessing the focus of the question is on reading data that was previously written consistently.
To that end, think Norman's question above is the one we should focus on here
However, we could also write out the checksum metadata only if it deviates from the previous default. Are there other cases like this in numcodecs?
It wouldn't surprise me if this issue had happened before. Though am not remembering one like this one atm
There have been issues between different libraries/languages before (like zlib/gzip compression: https://github.com/zarr-developers/numcodecs/pull/87 & https://github.com/zarr-developers/numcodecs/issues/169 ). This in turn motivated the following body of work to test reading/writing data between different implementations ( https://github.com/zarr-developers/zarr_implementations ), which in turn led to spec work
That said, think Norman is asking the right question. If we simply leave out the config flag when it is unneeded, it should remain readable with old versions. What are the tradeoffs of doing or not doing this?
I noticed that data compressed with Zstd with numcodecs=0.14.0 cannot decompressed with numcodecs=0.12.1 even though they look compatible except for the
checksum
. I successfully decompress the data by manually erasing thechecksum
argument.The current error message does not tell that the error is due to the use of different version of numcodecs:
I want them to guarantee compatibility for using lower versions or show a more informative error message. My idea is to ignore such additional arguments with a warning and give a clue to the error message when decompression fails. Thank you in advance for your feedback.