Closed ryanvade closed 4 years ago
AFAIK zlib-based streams (including gzip) don't support seeking or resumption in the middle of the stream. I don't think that this is possible to implement in this library
Really? I've been able to do this with the Zlib package for Python. I'm trying to decompress chunks from the start of the object in order, not random seeking.
https://docs.python.org/3/library/zlib.html#zlib.decompressobj
import zlib

def stream_zlib_decompress(stream):
    # 32 + MAX_WBITS: auto-detect a zlib or gzip header
    dec = zlib.decompressobj(32 + zlib.MAX_WBITS)
    for chunk in stream:
        rv = dec.decompress(chunk)
        if rv:
            yield rv

stream = s3_object.get(PartNumber=1)
stream = stream.get("Body")
part = b""
for data in stream_zlib_decompress(stream):
    part = part + data
    if len(part) >= 1024:
        break
That's supported through the Decompress object, but the window bits are not currently exported as a parameter. If you need that, it should be easy enough to add a constructor for it!
It doesn't seem I can set the window bits to +47 according to https://github.com/alexcrichton/flate2-rs/blob/5ef87027cf9a9a6c876886279f74215c7965a902/src/mem.rs#L349
Ah true! AFAIK no one's really tinkered with that historically. If the underlying C implementation supports other values of window_bits, then we probably just need to update the assertion.
According to the Python zlib docs, here are the supported values:
+8 to +15: The base-two logarithm of the window size. The input must include a zlib header and trailer.
0: Automatically determine the window size from the zlib header. Only supported since zlib 1.2.3.5.
−8 to −15: Uses the absolute value of wbits as the window size logarithm. The input must be a raw stream with no header or trailer.
+24 to +31 = 16 + (8 to 15): Uses the low 4 bits of the value as the window size logarithm. The input must include a gzip header and trailer.
+40 to +47 = 32 + (8 to 15): Uses the low 4 bits of the value as the window size logarithm, and automatically accepts either the zlib or gzip format.
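For what it's worth, here is a minimal sketch of how those wbits values behave in Python's zlib module. The helper names compress_as and can_decode and the sample payload are made up for illustration; only zlib.compressobj, zlib.decompress and zlib.error come from the standard library.

```python
import zlib

def compress_as(data: bytes, wbits: int) -> bytes:
    """Compress data with the framing selected by wbits (15 zlib, 31 gzip)."""
    comp = zlib.compressobj(wbits=wbits)
    return comp.compress(data) + comp.flush()

def can_decode(blob: bytes, wbits: int) -> bool:
    """Return True if a one-shot decompress with this wbits succeeds."""
    try:
        zlib.decompress(blob, wbits)
        return True
    except zlib.error:
        return False

payload = b"hello window bits " * 64   # made-up sample data
zlib_blob = compress_as(payload, 15)   # zlib header and trailer
gzip_blob = compress_as(payload, 31)   # gzip header and trailer

# Try each framing against the zlib-only, gzip-only and auto-detect settings.
for name, blob in (("zlib", zlib_blob), ("gzip", gzip_blob)):
    for wbits in (15, 31, 47):
        print(name, wbits, can_decode(blob, wbits))
```

With 15 only the zlib-framed data should decode, with 31 only the gzip-framed data, and with 47 both, which matches the list above.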
As a side note, I have been trying this out with the following:
const b:&[u8] = b"\x1f\x8b\x08\0\0\0\0\0\0\0\xec\xbd\xd9r\xe4X\x926v\r=....";
let mut decompress = Decompress::new_with_window_bits(true, 15); // prefer 47
let mut buf = Vec::new();
let resp = decompress.decompress(&b, &mut buf, FlushDecompress::None);
// Check for errors and such here
Does this test make sense?
Also, on this line https://github.com/alexcrichton/flate2-rs/blob/master/src/ffi/rust.rs#L53 new_boxed is being used instead of new_boxed_with_window_bits. Basically, the window_bits are unused, not to mention that in new_boxed_with_window_bits the window_bits type is i32 but in Inflate::make it's u8.
Edit: I noticed this is for the Rust backend, not the C backend.
For the Rust backend that's expected because that's translated from miniz which doesn't support different values of window bits. Only the zlib C backends support different values of window bits, which is why the public constructor is also gated behind that feature
Indeed, in that case according to https://github.com/madler/zlib/blob/cacf7f1d4e3d44d871b605da3b647f07d718623f/zlib.h#L832-L882 the C backend should support window bits from -15 to 47. Perhaps that will fix my issues.
Allowing window bits of 47 removes the deflate decompression error I was receiving, but I end up with an empty output buffer.
In practice, the only effect of decompressing with a smaller window size is that the decompressor will error out if the data was compressed with a larger window size and has matches that fall outside the window the decompressor used. On the compression side it does matter, since it limits how far back the compressor will look for matches. zlib is old, and in a very memory-starved environment it made sense to offer a smaller window that uses less memory if a 32k buffer was too much, but other implementations like miniz didn't bother implementing that. Adding extra window_bits values won't help you decode partial streams; it's just another way of telling zlib what headers to look for.
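A quick way to see the window size mismatch, sketched with Python's zlib module. The sample data and the inflate_error helper are invented for this example. Note the assumption: this uses zlib framing, where the header records the compressor's window size, so the decompressor rejects the stream up front rather than failing on an individual match.

```python
import zlib

data = b"sample text for the window size check " * 200  # made-up data

# Compress in zlib format with the full 32 KiB window (wbits=15).
comp = zlib.compressobj(wbits=15)
blob = comp.compress(data) + comp.flush()

def inflate_error(blob: bytes, wbits: int):
    """Return the zlib error message for this wbits, or None on success."""
    try:
        zlib.decompressobj(wbits=wbits).decompress(blob)
    except zlib.error as exc:
        return str(exc)
    return None

same_window = inflate_error(blob, 15)   # decompressor window matches
small_window = inflate_error(blob, 9)   # decompressor window is smaller
print("wbits=15:", same_window)
print("wbits=9: ", small_window)
```

The first call should succeed and the second should report an error, since a 512-byte window cannot serve a stream that declares a 32 KiB one.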
What you want to do for partial decompression is to add some parameter to skip CRC or zlib validation when ending decompression. I think read_to_string will try to read until failure, so maybe you can do it with the current library by doing read() calls manually instead (similar to what your Python impl does).
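To illustrate the difference between reading until the end and feeding chunks manually, here is a rough Python sketch. It builds a gzip stream in memory instead of pulling from S3, and the names decompress_prefix, one_shot_error and the sample log lines are hypothetical. It assumes the bytes you have are a prefix starting at byte 0 of the object.

```python
import zlib

AUTO = 32 + zlib.MAX_WBITS  # accept either a zlib or a gzip header

def decompress_prefix(chunks, wbits=AUTO):
    """Yield whatever output is available for each input chunk, in order."""
    dec = zlib.decompressobj(wbits)
    for chunk in chunks:
        out = dec.decompress(chunk)
        if out:
            yield out

def one_shot_error(blob):
    """Return the error from a one-shot decompress, or None on success."""
    try:
        zlib.decompress(blob, AUTO)
    except zlib.error as exc:
        return str(exc)
    return None

# Made-up sample data, gzip-framed (wbits=31).
original = b"".join(b"line %06d of a log file\n" % i for i in range(20000))
comp = zlib.compressobj(wbits=31)
gz = comp.compress(original) + comp.flush()

# Keep only the first quarter of the compressed bytes, split into chunks.
prefix = gz[: len(gz) // 4]
chunks = [prefix[i:i + 4096] for i in range(0, len(prefix), 4096)]

partial = b"".join(decompress_prefix(chunks))
print(len(partial), "bytes recovered from", len(prefix), "compressed bytes")
print("one-shot on the prefix:", one_shot_error(prefix))
```

The chunked loop never reaches the gzip trailer, so it never gets to the CRC check and simply returns the data decoded so far, while the one-shot call insists on a complete stream and fails.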
I'll try that solution. Setting the window bits above 15 is really only useful as 15 + 16 to force gzip only, while 15 + 32 is nice for automatic header detection.
I was able to solve this with read calls on the Decompress struct.
I'm working on an application that pulls parts of compressed files out of S3. I don't want to pull the entire file out of S3 due to file sizes. I should be able to pass part of the file to GzDecoder and decompress just the chunk.
However, I get a corrupt deflate stream error. Is it not possible to pass only part of a gzip-compressed file to the GzDecoder?