luben / zstd-jni

JNI binding for Zstd

Tradeoffs with "closeOnFlush"... #296

Open javafanboy opened 6 months ago

javafanboy commented 6 months ago

Just wanted to check if I have understood the tradeoffs with setting closeOnFlush to true or false for a compression stream.

When setting it to true, I need to use a bigger buffer in the stream that feeds data to the compression stream; otherwise the compression ratio will be worse, as each flush creates a separate frame. I can compress an arbitrarily large dataset as long as there is room in the final byte array or file stream that the compression stream delivers data to. I can reuse my streams by calling flush on the topmost stream and "reset" on the underlying byte array stream.

By setting it to false, I can use a smaller buffer, as each flush will not trigger a separate compression frame but will keep adding to the same one until I call close, which closes it. Without a large buffer I will still get a better compression ratio, but on the flip side I can't reuse my streams (as I must close the compression stream to close the frame) and instead need to re-create them for each dataset I want to compress.

Is this right? Are there any other considerations?

luben commented 6 months ago

Yes, with closeOnFlush turned on it will close the current frame and open a new one on each flush. This results in a worse compression ratio, as the compression algorithm looks back only inside the same frame.
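The effect of per-frame look-back can be illustrated without zstd-jni on the classpath. The sketch below is only an analogy: it uses the JDK's `java.util.zip.GZIPOutputStream` as a stand-in, treating each independent gzip member as if it were one zstd frame closed by a flush. The class name, method names, and sample data are hypothetical, and the absolute sizes will differ from what zstd produces; only the trend is the point.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class FrameLookbackDemo {

    // Builds repetitive, log-like sample data (hypothetical content).
    public static byte[] sampleData() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 2000; i++) {
            sb.append("id=").append(i).append(" level=INFO msg=request handled\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Compresses data as a sequence of independent gzip members, each
    // covering at most chunkSize input bytes, and returns the total
    // compressed size. Each member stands in for one frame per flush.
    public static int compressedSize(byte[] data, int chunkSize) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            // Closing the gzip stream ends the member. Closing the
            // underlying ByteArrayOutputStream is a no-op, so the sink
            // keeps accumulating members across iterations.
            try (GZIPOutputStream gz = new GZIPOutputStream(sink)) {
                gz.write(data, off, len);
            }
        }
        return sink.size();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = sampleData();
        System.out.println("input bytes:      " + data.length);
        System.out.println("one frame:        " + compressedSize(data, data.length));
        System.out.println("frame per 4096 B: " + compressedSize(data, 4096));
        System.out.println("frame per 64 B:   " + compressedSize(data, 64));
    }
}
```

On this kind of repetitive input the total compressed size grows as the frames get smaller, because every new frame starts with an empty history and pays its own header and trailer overhead.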

I am not sure about your reasoning about buffers: zstd internally maintains look-back buffers whose size depends on the compression level. So if you use closeOnFlush, it's better to call flush less often to achieve a better compression ratio.
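Combining this advice with the reuse pattern from the question, one option is to flush exactly once per dataset and reset the underlying byte array stream between datasets. The sketch below models that pattern with JDK classes only. `FramePerFlushStream` is a hypothetical stand-in written for this example, not a zstd-jni class: it buffers writes and emits one complete gzip member per `flush()`, which is roughly how a compression stream in closeOnFlush mode is described in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class ReusableFrameStreamDemo {

    // Hypothetical stand-in for a compression stream in closeOnFlush mode:
    // every flush() emits one complete, independently decodable frame.
    public static class FramePerFlushStream extends OutputStream {
        private final OutputStream sink;
        private final ByteArrayOutputStream pending = new ByteArrayOutputStream();

        public FramePerFlushStream(OutputStream sink) {
            this.sink = sink;
        }

        @Override
        public void write(int b) {
            pending.write(b);
        }

        @Override
        public void write(byte[] b, int off, int len) {
            pending.write(b, off, len);
        }

        @Override
        public void flush() throws IOException {
            if (pending.size() == 0) {
                return; // nothing buffered, so no empty frame is written
            }
            // Compress into a temporary buffer so the gzip stream can be
            // closed properly without closing the caller's sink.
            ByteArrayOutputStream frame = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(frame)) {
                pending.writeTo(gz);
            }
            frame.writeTo(sink);
            sink.flush();
            pending.reset();
        }
    }

    // Compresses one dataset through the reused streams: reset the sink,
    // write everything, then flush once so the dataset becomes one frame.
    public static byte[] compressOne(FramePerFlushStream out, ByteArrayOutputStream sink,
                                     byte[] dataset) throws IOException {
        sink.reset();
        out.write(dataset);
        out.flush();
        return sink.toByteArray();
    }

    // Decompresses a single frame produced by compressOne.
    public static byte[] decompress(byte[] frame) throws IOException {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(frame))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                result.write(buf, 0, n);
            }
        }
        return result.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        FramePerFlushStream out = new FramePerFlushStream(sink);
        String[] datasets = {"first dataset, first dataset", "second dataset, second dataset"};
        for (String d : datasets) {
            byte[] frame = compressOne(out, sink, d.getBytes(StandardCharsets.UTF_8));
            String back = new String(decompress(frame), StandardCharsets.UTF_8);
            System.out.println(frame.length + " compressed bytes, round trip ok: " + back.equals(d));
        }
    }
}
```

Because each dataset is flushed only once, it ends up in a single frame and the compressor sees the whole dataset as history, while both stream objects are created once and reused. The trade-off in this model is that a dataset is held in memory until the flush.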