brubbel opened this issue 5 years ago
Various compressor types have flush() methods. Search README.rst for flush and you should find the relevant documentation.
My understanding is that zstd.FLUSH_BLOCK (which corresponds to ZSTD_e_flush) ensures that any data written to the compressor so far is decodable by a decompressor. I even remember speaking with the zstd maintainers to confirm this behavior. If you are not seeing it, then either there is a bug in python-zstandard or buffering on the input/output streams outside of python-zstandard is at fault.
python-zstandard does not call ZSTD_flushStream() directly. As zstd.h says, that function is equivalent to ZSTD_compressStream2(zcs, output, &emptyInput, ZSTD_e_flush), which is what the flush() methods that support a block-flush mode should be calling.
If you want to audit the source code, I recommend reading cffi.py, as its Python code is a bit easier to follow than the C code. The CFFI and C bindings should be functionally equivalent.
My understanding is that zstd.FLUSH_BLOCK (which corresponds to ZSTD_e_flush) will ensure any data written to the compressor so far will be decodable on a decompressor.
That is correct: decompression can start. However, my tests suggest that the compressor does not flush all of the data written so far. Flushing with FLUSH_FRAME does make all data decodable, but it also ends the frame and therefore resets the current dictionary (the compression history).
This is in contrast to zlib.Z_SYNC_FLUSH, which flushes all data but lets compression continue with the same dictionary.
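For reference, the zlib behavior being compared against can be shown with the standard library alone. The sketch below uses arbitrary example data: after each Z_SYNC_FLUSH the decompressor has everything written so far, and because the history is kept, a repeat of the first chunk compresses to far fewer bytes.

```python
import zlib

# Example data (an assumption): 256 distinct bytes, no internal repeats.
chunk = bytes(range(256))

comp = zlib.compressobj()
decomp = zlib.decompressobj()

# Z_SYNC_FLUSH makes everything written so far decodable...
first = comp.compress(chunk) + comp.flush(zlib.Z_SYNC_FLUSH)
out1 = decomp.decompress(first)

# ...while keeping the history, so a repeat of the same bytes
# can be encoded as a short back-reference into the first chunk.
second = comp.compress(chunk) + comp.flush(zlib.Z_SYNC_FLUSH)
out2 = decomp.decompress(second)

print(out1 == chunk, out2 == chunk, len(first), len(second))
```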
Would it be possible for you to articulate your request in terms of zstd C API calls and/or python-zstandard functions? I'm not sure I fully understand what you are trying to do. We seem to be talking about the streaming APIs, but dictionaries are also involved, and there are enough combinations that I'm not sure exactly what the request is.
Is there currently a binding to ZSTD_flushStream()?
It seems that zstd.FLUSH_BLOCK does allow the decompressor to decode valid data, but not all of the data most recently written to the compressor.