luben / zstd-jni

JNI binding for Zstd

Add ZstdBufferDecompressingStreamNoFinalizer #244

Closed divijvaidya closed 1 year ago

divijvaidya commented 1 year ago

Motivation

At Apache Kafka, we always have the entire compressed buffer before we send it to Zstd for decompression. Hence, we don't need a stream to feed the compressed data to Zstd. Zstd-JNI currently has no interface that accepts a buffer of compressed data and returns an InputStream of uncompressed data.

Changes

  1. Add a new interface ZstdBufferDecompressingStreamNoFinalizer, which is similar to the existing ZstdDirectBufferDecompressingStreamNoFinalizer except that it works with heap buffers.
  2. Refactor to extract out a common base class from the two classes above.

Testing

Added tests for the new interface.

luben commented 1 year ago

LGTM. Running the test suites now.

Tangential question: what's wrong with:

  inputStream = new ByteArrayInputStream(buff.array());
  decompressedStream = new ZstdInputStream(inputStream);

I think this will be giving the same result.

divijvaidya commented 1 year ago

Tangential question: what's wrong with:

  inputStream = new ByteArrayInputStream(buff.array());
  decompressedStream = new ZstdInputStream(inputStream);

Good question. If we pass the compressed data as an InputStream (this is what we do in Apache Kafka today), we can't reliably answer hasRemaining() without making a JNI call and trying to read more data. This is because, unlike ByteBuffer's hasRemaining() method, InputStream's available() method does not guarantee that the stream is actually finished when it returns 0.

Hence, to detect that there is no more uncompressed data available, we would have to pay the penalty of a JNI call (trying to read another byte and concluding that the stream is empty if the read returns EOF).
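
To make the difference concrete, here is a minimal JDK-only sketch. It does not use zstd-jni at all; the class RemainingCheck, the OpaqueStream wrapper, and the helper methods are hypothetical names invented for illustration. It only shows the contract being discussed: a ByteBuffer can answer hasRemaining() from its own position and limit, while an InputStream may report available() == 0 and still have data, so the only dependable end-of-stream test is an actual read. With a decompressing stream, that probing read is presumably where the JNI call would happen.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.ByteBuffer;

class RemainingCheck {

    /**
     * Hypothetical stream that holds data but does not override available(),
     * so it inherits InputStream's default, which always returns 0.
     */
    static final class OpaqueStream extends InputStream {
        private final InputStream delegate;

        OpaqueStream(InputStream delegate) {
            this.delegate = delegate;
        }

        @Override
        public int read() throws IOException {
            return delegate.read();
        }
    }

    /** Exact and cheap: compares the buffer's position with its limit. */
    static boolean bufferHasMore(ByteBuffer buf) {
        return buf.hasRemaining();
    }

    /**
     * Dependable end-of-stream check for a stream: read one byte and look
     * for -1. The byte is pushed back so the caller does not lose it.
     */
    static boolean streamHasMore(PushbackInputStream in) throws IOException {
        int b = in.read();
        if (b == -1) {
            return false;
        }
        in.unread(b);
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 3};

        ByteBuffer buf = ByteBuffer.wrap(data);
        System.out.println("buffer hasRemaining: " + bufferHasMore(buf));

        InputStream opaque = new OpaqueStream(new ByteArrayInputStream(data));
        // available() says 0 here even though three bytes can still be read.
        System.out.println("stream available:    " + opaque.available());

        PushbackInputStream in = new PushbackInputStream(opaque);
        System.out.println("stream has more:     " + streamHasMore(in));
    }
}
```

The PushbackInputStream is just one way to probe without consuming data; the point is that some read has to be issued before the stream can say it is finished.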

luben commented 1 year ago

Thanks for the contribution!