Open jakirkham opened 5 years ago
cc @axtimwalde @funkey @constantinpape
Is there anything we still need to do on this one?
Hi @jakirkham, I forgot about this one. Would using LZ4FrameOutputStream in N5 work for zarr? We could introduce this as a parameter, like the one in GzipCompression that switches between Gzip and Zlib, so that there is at least some intersection?
No worries. Me too. Thanks for looking into this. 🙂
I think so. We would have to test it on some data to be sure.
Sure, that could be reasonable. I think we won't be able to reproduce the current Java blocked algorithm in Python, but as long as we have something in common we should be ok. We will probably need some documentation once it is all sorted out.
Hi folks, took a brief look into this, here are the options (I think)...
The current LZ4 codec in numcodecs does the simplest possible thing, which is to add a 4 byte header to store the length of the uncompressed data, then it compresses all the data in a single call to LZ4_compress_fast. So the output is 4 byte header + single block of compressed data.
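To make that layout concrete, here is a minimal sketch of the framing only. It treats the compressed block as opaque bytes and assumes the 4 byte length is stored as a little-endian unsigned integer, which is worth checking against the numcodecs source. The helper names are hypothetical and are not part of numcodecs.

```python
import struct

# Assumption: the 4 byte header is an unsigned little-endian integer
# holding the size of the uncompressed data.
LENGTH_HEADER = struct.Struct("<I")


def add_length_header(uncompressed_size, compressed_block):
    """Prefix an already-compressed LZ4 block with its uncompressed size."""
    return LENGTH_HEADER.pack(uncompressed_size) + bytes(compressed_block)


def split_length_header(buf):
    """Return (uncompressed_size, compressed_block) from a prefixed buffer."""
    buf = bytes(buf)
    if len(buf) < LENGTH_HEADER.size:
        raise ValueError("buffer is shorter than the 4 byte length header")
    (uncompressed_size,) = LENGTH_HEADER.unpack_from(buf, 0)
    return uncompressed_size, buf[LENGTH_HEADER.size:]
```

A reader on the Java side would need to do the equivalent of split_length_header and then hand the remaining bytes to a raw LZ4 block decompressor, passing the size from the header as the expected output length.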
The Java LZ4FrameOutputStream uses the LZ4 frame format, which has a different header + multiple blocks of compressed data + final checksum.
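For anyone debugging a chunk written by one side and read by the other, a rough way to tell the layouts apart is to look at the leading bytes. This is a sketch under stated assumptions: the LZ4 frame format begins with the magic number 0x184D2204 stored little-endian, and, as far as I recall (not stated in this thread), the lz4-java block stream begins with the ASCII bytes "LZ4Block". The function name and the returned labels are hypothetical.

```python
import struct

# LZ4 frame format magic number, stored little-endian at offset 0.
LZ4_FRAME_MAGIC = 0x184D2204
# Assumption: the lz4-java block stream starts with these ASCII bytes.
LZ4_JAVA_BLOCK_MAGIC = b"LZ4Block"


def sniff_lz4_layout(buf):
    """Guess which LZ4 container layout a chunk uses from its first bytes."""
    buf = bytes(buf)
    if buf.startswith(LZ4_JAVA_BLOCK_MAGIC):
        return "lz4-java block stream"
    if len(buf) >= 4 and struct.unpack_from("<I", buf, 0)[0] == LZ4_FRAME_MAGIC:
        return "lz4 frame"
    # No recognizable magic: possibly a bare length header followed by a
    # single compressed block, or something else entirely.
    return "no magic (possibly length-prefixed block)"
```

This is only a heuristic. A length-prefixed chunk has no magic of its own, so a chunk whose uncompressed size happens to equal 0x184D2204 bytes would be misreported as a frame.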
So option 1 would be that n5-java switches to use LZ4FrameOutputStream and we change numcodecs to also use the LZ4 frame format. (In numcodecs that would actually need to be implemented as a new codec, because it is a different format from the current "lz4" codec.)
Option 2 would be that n5-java switches to use the same encoding as the current numcodecs lz4 codec, i.e., 4 byte header plus single block of compressed data.
Both approaches are fine by me, just trying to lay out the options.
Is there still an outstanding issue here? We were discussing this at the OME-Zarr NGFF meeting.
I am pretty sure that is still a problem; lz4 is not supported in the zarr N5Store yet, see https://github.com/zarr-developers/zarr-python/blob/master/zarr/n5.py#L403-L469.
It appears that LZ4 support in N5 differs from Zarr. Have not had a chance to dive deeply into it, but here is the gist.
N5 is using the lz4-java library here to compress chunks. This lz4-java library provides its own custom blocked format.
Zarr's Numcodecs library uses LZ4_compress_fast, which comes from the lz4 C library.
Encountered this issue with N5Store in PR ( https://github.com/zarr-developers/zarr/pull/309 ). So disabled LZ4 support in N5Store for now. Not entirely sure how to bridge the gap between these two, but figured I'd raise this here for awareness and discussion.