Add ZstdBufferDecompressingStreamNoFinalizer

divijvaidya commented 1 year ago

Motivation

At Apache Kafka, we always have the entire compressed buffer in memory before we send it to Zstd for decompression. Hence, we don't need a stream to provide the compressed data to Zstd. There is no interface in Zstd-JNI that accepts a buffer containing compressed data and returns an InputStream containing the uncompressed data.

Changes

Add a new interface, ZstdBufferDecompressingStreamNoFinalizer, which is similar to the existing ZstdDirectBufferDecompressingStreamNoFinalizer except that it works with heap buffers.
Refactor to extract a common base class from the two classes above.

Testing

Added tests for the new interface.

luben commented 1 year ago

LGTM. Running the test suites now.

Tangential question: what's wrong with:

  inputStream = new ByteArrayInputStream(buff.array());
  decompressedStream = new ZstdInputStream(inputStream);

I think this will give the same result.

divijvaidya commented 1 year ago

Tangential question: what's wrong with:

  inputStream = new ByteArrayInputStream(buff.array());
  decompressedStream = new ZstdInputStream(inputStream);

Good question. If we pass compressed data as an InputStream (this is what we do in Apache Kafka today), we can't reliably answer hasRemaining() without making a JNI call and trying to read more data. This is because, unlike ByteBuffer's hasRemaining() method, InputStream's available() method does not guarantee that the stream is actually finished: it only estimates how many bytes can be read without blocking, and it may return 0 while data is still pending.

Hence, to detect that there is no more uncompressed data available, we would have to pay the penalty of a JNI call (trying to read another byte and concluding that the stream is empty if the read returns EOF).
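To illustrate the difference, here is a minimal sketch that uses only JDK classes, not Zstd-JNI. The class EndOfDataCheck, the nested OpaqueStream, and both helper methods are hypothetical names made up for this example. OpaqueStream stands in for any stream that inherits the default available() from InputStream, which always returns 0. In the real decompressing stream, the probing read would also cross the JNI boundary, which this sketch does not model.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.ByteBuffer;

public class EndOfDataCheck {

    /**
     * Hypothetical stream that still holds data but inherits the default
     * InputStream.available(), which always returns 0.
     */
    static final class OpaqueStream extends InputStream {
        private final InputStream delegate;

        OpaqueStream(byte[] data) {
            this.delegate = new ByteArrayInputStream(data);
        }

        @Override
        public int read() throws IOException {
            return delegate.read();
        }
    }

    /** Buffer case: the answer comes from position/limit bookkeeping, no read needed. */
    static boolean bufferHasMore(ByteBuffer buf) {
        return buf.hasRemaining();
    }

    /**
     * Stream case: available() cannot be trusted to signal end of stream,
     * so we have to attempt a read and push the byte back if one was found.
     */
    static boolean streamHasMore(PushbackInputStream in) throws IOException {
        int b = in.read();
        if (b == -1) {
            return false; // EOF is only discovered by reading
        }
        in.unread(b);
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 3};

        ByteBuffer buf = ByteBuffer.wrap(data);
        System.out.println("buffer has more: " + bufferHasMore(buf));

        InputStream opaque = new OpaqueStream(data);
        System.out.println("stream available(): " + opaque.available());

        PushbackInputStream probed = new PushbackInputStream(opaque);
        System.out.println("stream has more (after probing read): " + streamHasMore(probed));
    }
}
```

With a buffer as the source, the end-of-data check is plain field arithmetic on the Java side, which is the property the new buffer-based class relies on.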

luben commented 1 year ago

Thanks for the contribution!

luben / zstd-jni

Add ZstdBufferDecompressingStreamNoFinalizer #244