lz4 / lz4-java

LZ4 compression for Java
Apache License 2.0

Support for dependent blocks in decompression #190

Open cnuernber opened 2 years ago

cnuernber commented 2 years ago

Reading an Apache Arrow file, we got the following error:

Dependent block stream is unsupported (BLOCK_INDEPENDENCE must be set).

Is there any interest in supporting this feature? Our system decompresses columns in parallel, so block-level parallelism in decompression isn't necessary. My thought is to simply concatenate all blocks and decompress them in one shot.

cnuernber commented 2 years ago

The workaround is to use zstd. Unfortunately, lz4 is the default format for many of these pathways.

cnuernber commented 2 years ago

The Go code manually resizes the dictionary: https://github.com/pierrec/lz4/blob/v4/reader.go#L180

The Java code completely hides the dictionary, which I think makes this impossible to do via simple updates to LZ4FrameInputStream.

@jpountz - Is it viable to make a simple update to the Java bindings to support dependent blocks? Another option would be to call the C library directly via FFI bindings.
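
To illustrate why the decoder needs access to the dictionary: in a dependent-block stream, a match offset in one block may point back into the decoded output of earlier blocks (up to 64 KB). Below is a minimal pure-Java sketch of a raw LZ4 block decoder that writes every block into one shared output buffer, so earlier output acts as the dictionary. This is not lz4-java's API; the class name `DependentBlockSketch` and the method `decodeBlock` are hypothetical, and the sketch assumes the caller has already parsed the frame header and block sizes. It does no checksum handling and no input hardening.

```java
// Hypothetical sketch: not part of lz4-java's API.
final class DependentBlockSketch {

    /**
     * Decodes one raw LZ4 block from src into out, starting at outPos,
     * and returns the new output position. Match offsets may reach back
     * before outPos, into the output of previously decoded blocks.
     */
    static int decodeBlock(byte[] src, byte[] out, int outPos) {
        int s = 0;
        while (s < src.length) {
            int token = src[s++] & 0xFF;

            // Literal run: length in the high nibble, extended by extra bytes when it is 15.
            int litLen = token >>> 4;
            if (litLen == 15) {
                int b;
                do {
                    b = src[s++] & 0xFF;
                    litLen += b;
                } while (b == 255);
            }
            System.arraycopy(src, s, out, outPos, litLen);
            s += litLen;
            outPos += litLen;
            if (s >= src.length) {
                break; // the last sequence of a block carries literals only
            }

            // Match: 2-byte little-endian offset, then length (low nibble + 4).
            int offset = (src[s] & 0xFF) | ((src[s + 1] & 0xFF) << 8);
            s += 2;
            int matchLen = token & 0x0F;
            if (matchLen == 15) {
                int b;
                do {
                    b = src[s++] & 0xFF;
                    matchLen += b;
                } while (b == 255);
            }
            matchLen += 4;

            int from = outPos - offset;
            if (offset == 0 || from < 0) {
                throw new IllegalArgumentException("match offset outside decoded history");
            }
            // Copy byte by byte because source and destination may overlap.
            for (int i = 0; i < matchLen; i++) {
                out[outPos++] = out[from++];
            }
        }
        return outPos;
    }
}
```

For example, decoding the block `80 'a'..'h'` at position 0 and then the block `00 08 00 10 'X'` at position 8 yields `abcdefghabcdX`: the second block's match (offset 8, length 4) copies `abcd` from the first block's output. A decoder that starts each block with an empty history cannot resolve that offset.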

cnuernber commented 2 years ago

I was able to (hopefully temporarily) work around this using FFI bindings to the C library. Unfortunately, this means users need to ensure liblz4 is available on their system.