UnaiUribarri-TomTom closed this issue 3 months ago.
When a user of the library calls d.skip(X), how should the library know that X lies past the end of the first frame? Note that the decompressed size is an optional field in the frame header. And how would it know where the first frame ends in the compressed byte stream?
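To make the question concrete, here is a rough Java sketch of what locating a frame boundary involves, following the frame layout in RFC 8878. It parses the frame header and then walks the 3-byte block headers without decompressing anything. Two limits are visible in the code. First, frameContentSize returns -1 when the optional Frame_Content_Size field is absent. Second, a compressed block header records only its compressed size, so walking the blocks shows where a frame ends but not how many decompressed bytes it holds. The walk is also linear in the number of blocks, not O(1): a block carries at most 128 KiB of content, so a 512 MiB frame has at least 4096 block headers. ZstdFrameWalker and its methods are hypothetical and are not part of zstd-jni.

```java
/**
 * Hypothetical helper (not part of zstd-jni): walks the Zstandard frame layout
 * from RFC 8878 over an in-memory byte[] without decompressing anything.
 */
public final class ZstdFrameWalker {
    private static final int ZSTD_MAGIC = 0xFD2FB528;
    private static final int SKIPPABLE_MASK = 0xFFFFFFF0;
    private static final int SKIPPABLE_MAGIC = 0x184D2A50;
    private static final int[] DID_SIZES = {0, 1, 2, 4}; // by Dictionary_ID_Flag

    private ZstdFrameWalker() {}

    private static int u8(byte[] b, int i) { return b[i] & 0xFF; }

    /** Reads an n-byte little-endian unsigned value. */
    private static long readLE(byte[] b, int i, int n) {
        long v = 0;
        for (int k = n - 1; k >= 0; k--) v = (v << 8) | u8(b, i + k);
        return v;
    }

    private static int fcsFieldSize(int fcsFlag, boolean singleSegment) {
        switch (fcsFlag) {
            case 0: return singleSegment ? 1 : 0; // flag 0: field present only in single-segment frames
            case 1: return 2;
            case 2: return 4;
            default: return 8;
        }
    }

    private static void requireZstdMagic(byte[] b, int off) {
        if ((int) readLE(b, off, 4) != ZSTD_MAGIC) throw new IllegalArgumentException("not a zstd frame");
    }

    /** Returns Frame_Content_Size, or -1 when the header does not record it. */
    public static long frameContentSize(byte[] b, int off) {
        requireZstdMagic(b, off);
        int fhd = u8(b, off + 4);                    // Frame_Header_Descriptor
        boolean singleSegment = (fhd & 0x20) != 0;
        int fcsSize = fcsFieldSize(fhd >>> 6, singleSegment);
        if (fcsSize == 0) return -1;                 // the field is optional
        int pos = off + 5 + (singleSegment ? 0 : 1) + DID_SIZES[fhd & 0x03];
        long fcs = readLE(b, pos, fcsSize);
        return fcsSize == 2 ? fcs + 256 : fcs;       // the 2-byte form is stored minus 256
    }

    /** Returns how many compressed bytes the frame starting at off occupies. */
    public static long frameCompressedSize(byte[] b, int off) {
        int magic = (int) readLE(b, off, 4);
        if ((magic & SKIPPABLE_MASK) == SKIPPABLE_MAGIC) {
            return 8 + readLE(b, off + 4, 4);        // magic + Frame_Size + user data
        }
        requireZstdMagic(b, off);
        int fhd = u8(b, off + 4);
        boolean singleSegment = (fhd & 0x20) != 0;
        int pos = off + 5
                + (singleSegment ? 0 : 1)            // Window_Descriptor
                + DID_SIZES[fhd & 0x03]              // Dictionary_ID
                + fcsFieldSize(fhd >>> 6, singleSegment);
        boolean last;
        do {
            long header = readLE(b, pos, 3);         // Block_Header
            last = (header & 1) != 0;                // Last_Block
            int type = (int) ((header >>> 1) & 3);   // Block_Type
            int size = (int) (header >>> 3);         // Block_Size
            pos += 3;
            if (type == 0 || type == 2) pos += size; // Raw or Compressed: Block_Size bytes follow
            else if (type == 1) pos += 1;            // RLE: one byte, repeated Block_Size times
            else throw new IllegalArgumentException("reserved block type");
        } while (!last);
        if ((fhd & 0x04) != 0) pos += 4;             // Content_Checksum
        return pos - off;
    }
}
```

With an in-memory ByteArrayInputStream, the returned size could be passed to skip() to jump over a whole frame. With an arbitrary InputStream, every compressed byte would still have to be read.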
Okay... so I need to store the offset in some metadata and skip the compressed data myself, right?
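A minimal sketch of that approach, assuming each frame is already available as its own compressed byte[] (for example, from separate compression calls). The producer records the byte offset where each frame starts while concatenating them. The consumer skips straight to that offset on the raw ByteArrayInputStream, whose skip() only moves an index, and only then wraps the stream for decompression. FrameIndex, concat and openAt are hypothetical names. How the offsets are stored as metadata is left open.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.List;

/** Hypothetical sketch: concatenated compressed frames plus the start offset of each. */
public final class FrameIndex {
    public final byte[] data;    // all frames back to back
    public final long[] offsets; // offsets[i] = byte position where frame i starts

    private FrameIndex(byte[] data, long[] offsets) {
        this.data = data;
        this.offsets = offsets;
    }

    /** Concatenates already-compressed frames, recording where each one begins. */
    public static FrameIndex concat(List<byte[]> frames) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long[] offsets = new long[frames.size()];
        for (int i = 0; i < frames.size(); i++) {
            offsets[i] = out.size();
            byte[] frame = frames.get(i);
            out.write(frame, 0, frame.length);
        }
        return new FrameIndex(out.toByteArray(), offsets);
    }

    /**
     * Returns a stream positioned at the first compressed byte of frame i.
     * Nothing is decompressed here; the caller would then wrap the result in
     * com.github.luben.zstd.ZstdInputStream to read only from frame i onwards.
     */
    public ByteArrayInputStream openAt(int i) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        long skipped = in.skip(offsets[i]);
        if (skipped != offsets[i]) throw new IllegalStateException("short skip");
        return in;
    }
}
```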
I don't think O(1) skipping can be implemented with the Zstd frame format. There is a non-standard seekable format: https://github.com/facebook/zstd/blob/dev/contrib/seekable_format/zstd_seekable_compression_format.md that the upstream library does not implement.
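For illustration, a sketch of the lookup that the seekable format's seek table makes possible. The table lists each frame's compressed and decompressed size, so a decompressed offset maps to a frame start by binary search over prefix sums. Only the remainder inside that one frame then has to be decompressed, not everything before it. The field layout used here reflects my reading of the linked spec: the seek table sits in a trailing skippable frame, with 8-byte entries (12 bytes when Checksum_Flag, bit 7 of Seek_Table_Descriptor, is set) and a 9-byte footer ending in magic 0x8F92EAB1. Please check it against the spec. SeekTable is a hypothetical class, not an API of zstd-jni or libzstd.

```java
import java.util.Arrays;

/** Hypothetical sketch of reading the seekable format's seek table (layout assumed from the spec). */
public final class SeekTable {
    private static final int SEEKABLE_MAGIC = 0x8F92EAB1;
    private static final int FOOTER_SIZE = 9; // Number_Of_Frames + Seek_Table_Descriptor + magic

    private final long[] compressedStart;   // prefix sums of Compressed_Size (file offsets)
    private final long[] decompressedStart; // prefix sums of Decompressed_Size

    private SeekTable(long[] c, long[] d) { compressedStart = c; decompressedStart = d; }

    private static long u32(byte[] b, int i) {
        return (b[i] & 0xFFL) | (b[i + 1] & 0xFFL) << 8
             | (b[i + 2] & 0xFFL) << 16 | (b[i + 3] & 0xFFL) << 24;
    }

    /** Parses the seek table at the end of a whole seekable file held in memory. */
    public static SeekTable parse(byte[] file) {
        int footer = file.length - FOOTER_SIZE;
        if ((int) u32(file, footer + 5) != SEEKABLE_MAGIC) throw new IllegalArgumentException("no seek table");
        int frames = (int) u32(file, footer);                // Number_Of_Frames
        boolean checksums = (file[footer + 4] & 0x80) != 0;  // Checksum_Flag
        int entrySize = checksums ? 12 : 8;
        int pos = footer - frames * entrySize;               // first Seek_Table_Entry
        long[] c = new long[frames + 1];
        long[] d = new long[frames + 1];
        for (int i = 0; i < frames; i++, pos += entrySize) {
            c[i + 1] = c[i] + u32(file, pos);                // Compressed_Size
            d[i + 1] = d[i] + u32(file, pos + 4);            // Decompressed_Size
        }
        return new SeekTable(c, d);
    }

    /**
     * For decompressed offset x, returns {frame index, compressed offset where
     * that frame starts, decompressed bytes to discard inside that frame}.
     */
    public long[] locate(long x) {
        int frames = compressedStart.length - 1;
        if (x < 0 || x >= decompressedStart[frames]) throw new IndexOutOfBoundsException("offset " + x);
        int i = Arrays.binarySearch(decompressedStart, 0, frames, x);
        if (i < 0) i = -i - 2; // not an exact frame start: take the last frame starting before x
        return new long[] {i, compressedStart[i], x - decompressedStart[i]};
    }
}
```

Even with the table, skipping is O(log n) to find the frame plus decompressing at most one frame's worth of data. It is cheap only when frames are reasonably small.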
I have a ByteArrayInputStream that has been carefully crafted to contain two Zstd frames with 512 MiB of data each. The data is highly compressible: the full 1 GiB compresses to approximately 18 MiB.
Some consumers are only interested in the second frame, so they skip the first frame entirely.
However, instead of simply skipping over the whole frame, ZstdInputStreamNoFinalizer.skip decompresses it into a temporary buffer, which takes almost a full second instead of practically no time.
It would be great if ZstdInputStreamNoFinalizer.skip could use some native functionality to skip large chunks of data efficiently.