luben / zstd-jni

JNI binding for Zstd

ZstdInputStreamNoFinalizer.skip performs poorly when skipping full frames #294

Closed UnaiUribarri-TomTom closed 3 months ago

UnaiUribarri-TomTom commented 6 months ago

I have a ByteArrayInputStream that has been carefully crafted to contain two Zstd frames with 512 MiB of data each. The data is highly compressible: 1 GiB compresses to approximately 18 MiB.

Some consumers are only interested in the second frame, so they skip the first frame entirely.

But instead of just skipping the full frame, ZstdInputStreamNoFinalizer.skip decompresses it into a temporary buffer, which takes almost a full second instead of almost no time.

It would be great if ZstdInputStreamNoFinalizer.skip could use some native functionality to skip large chunks of data efficiently.

luben commented 6 months ago

When the user of the library asks for d.skip(X), how should the library know that X is past the end of the first frame? Note: the decompressed size is not a mandatory field in the frame header. And how would it know where the first frame ends in the compressed byte stream?

unaiur commented 5 months ago

Okay... So I need to provide the offset in some metadata and skip the data myself, don't I?
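
A minimal sketch of that approach, assuming the producer records the compressed byte offset of each frame when it writes the stream. The class `FrameOffsetSkip`, the helper `openAtFrame`, and the parameter `frameStart` are hypothetical names used for illustration; none of them are part of zstd-jni.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

final class FrameOffsetSkip {
    /**
     * Returns a stream positioned at frameStart, a compressed byte offset
     * that is assumed to come from out-of-band metadata.
     */
    static InputStream openAtFrame(byte[] compressed, int frameStart) {
        if (frameStart < 0 || frameStart > compressed.length) {
            throw new IllegalArgumentException("offset out of range: " + frameStart);
        }
        // No bytes are read or decompressed here: the stream simply starts
        // at the recorded offset inside the backing array.
        return new ByteArrayInputStream(compressed, frameStart, compressed.length - frameStart);
        // A caller would then wrap the result in the decompressor, e.g.
        // new ZstdInputStreamNoFinalizer(openAtFrame(compressed, frameStart)).
    }
}
```

Because the offset refers to compressed bytes, the skip happens before the decompressing stream is created, so the first frame is never decoded.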

luben commented 5 months ago

I don't think O(1) skipping can be implemented with the Zstd framing. There is a non-standard seekable format, https://github.com/facebook/zstd/blob/dev/contrib/seekable_format/zstd_seekable_compression_format.md, that the upstream library does not implement.
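
Short of O(1) skipping, the end of a frame in the compressed bytes can be located without decompressing it by walking the block headers defined in the Zstandard frame format (RFC 8878); the C library exposes the same idea as `ZSTD_findFrameCompressedSize`. The sketch below is an illustration only, not zstd-jni code: `ZstdFrameScanner` and `findFrameEnd` are hypothetical names, the whole input is assumed to be in a byte array, and skippable frames are not handled. The cost is proportional to the number of blocks in the frame, and the result is a compressed offset, so it does not help with `skip(X)` for an arbitrary decompressed position.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ZstdFrameScanner {
    private static final int ZSTD_MAGIC = 0xFD2FB528;

    /** Returns the index just past the Zstd frame that starts at {@code start}. */
    static int findFrameEnd(byte[] buf, int start) {
        ByteBuffer b = ByteBuffer.wrap(buf).order(ByteOrder.LITTLE_ENDIAN);
        b.position(start);
        if (b.getInt() != ZSTD_MAGIC) {
            throw new IllegalArgumentException("not a Zstd frame at offset " + start);
        }
        int descriptor = b.get() & 0xFF;
        int fcsFlag = descriptor >>> 6;                 // Frame_Content_Size_flag
        boolean singleSegment = (descriptor & 0x20) != 0;
        boolean hasChecksum = (descriptor & 0x04) != 0;
        int didFlag = descriptor & 0x03;                // Dictionary_ID_flag

        int windowBytes = singleSegment ? 0 : 1;
        int dictBytes = new int[] {0, 1, 2, 4}[didFlag];
        int fcsBytes = (fcsFlag == 0) ? (singleSegment ? 1 : 0) : (1 << fcsFlag);
        b.position(b.position() + windowBytes + dictBytes + fcsBytes);

        boolean last;
        do {
            // 3-byte little-endian block header: last flag, type, size.
            int header = (b.get() & 0xFF) | (b.get() & 0xFF) << 8 | (b.get() & 0xFF) << 16;
            last = (header & 1) != 0;
            int type = (header >>> 1) & 3;
            int size = header >>> 3;
            if (type == 3) {
                throw new IllegalArgumentException("reserved block type");
            }
            // An RLE block (type 1) stores a single byte regardless of its size field.
            int payload = (type == 1) ? 1 : size;
            b.position(b.position() + payload);
        } while (!last);

        if (hasChecksum) {
            b.position(b.position() + 4);               // optional 4-byte content checksum
        }
        return b.position();
    }
}
```

Calling `findFrameEnd(buf, 0)` on a buffer holding two frames would give the offset at which the second frame starts, which can then be used to position the input before wrapping it in a decompressing stream.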