madler / zlib

A massively spiffy yet delicately unobtrusive compression library.
http://zlib.net/

zlib inflate returning Z_STREAM_END early #984

Closed FvK91 closed 3 weeks ago

FvK91 commented 3 weeks ago

Hi, I'm having a problem when decompressing a file using zlib (v1.3.1): only the first line of the gz archive is decompressed. After decompressing one line, inflate returns Z_STREAM_END immediately.

Decompressing the archive using a tool like 7-zip works just fine.

The problem also occurs when using the zlib module in Python: only a single line is decompressed. When using the gzip Python module, everything works fine.

I have added 2 scripts and a dummy.gz file to reproduce the problem. problem_Z_STREAM_END.zip

Since I am able to successfully decompress the file with other tools/libraries I wonder if this is a bug in zlib or if it is expected behavior.

madler commented 3 weeks ago

This is as expected for a gzip file with multiple members, which that one is. Here is the structure of that file as shown by pigz -ltv:

method    check    timestamp    compressed   original reduced  name
gzip 8  d293a4bf  ------ -----         427        960   55.5%  dummy
gzip 8  728ddacc  ------ -----         212       5876   96.4%  <...>
gzip 8  4f4ecb61  ------ -----         299      15712   98.1%  <...>
        2ea6e3a6                       938      22548   95.8%  (total)

There are three members. You simply need to keep decompressing with a new instance of zlib.decompressobj(), or in C, using inflateReset() for each member. From the documentation in zlib.h (always a good idea to read the documentation):

Unlike the gunzip utility and gzread() (see below), inflate() will not automatically decode concatenated gzip members. inflate() will return Z_STREAM_END at the end of the gzip member. The state would need to be reset to continue decoding a subsequent gzip member. This must be done if there is more data after a gzip member, in order for the decompression to be compliant with the gzip standard (RFC 1952).
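For the Python side, the "new instance of zlib.decompressobj() per member" approach could look roughly like the sketch below. It is not taken from the attached scripts: the function name `decompress_all_members` and the three-part sample data are made up for illustration, and it assumes the whole file fits in memory. It relies on `unused_data`, which holds whatever bytes follow the end of the member just decoded.

```python
import gzip
import zlib


def decompress_all_members(data: bytes) -> bytes:
    """Decompress every gzip member in `data` and return the joined output.

    Hypothetical helper for illustration; assumes `data` fits in memory.
    """
    out = []
    while data:
        # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        d = zlib.decompressobj(16 + zlib.MAX_WBITS)
        out.append(d.decompress(data))
        out.append(d.flush())
        if not d.eof:
            # The input ended before this member's trailer was reached.
            raise ValueError("truncated gzip member")
        # Bytes after the end of this member belong to the next member.
        data = d.unused_data
    return b"".join(out)


# Build a three-member gzip stream by concatenating independent gzip files.
sample = b"".join(
    gzip.compress(part) for part in (b"first\n", b"second\n", b"third\n")
)
result = decompress_all_members(sample)
```

A single `zlib.decompressobj()` fed the same `sample` stops after the first member, which matches the behavior reported at the top of this issue. In C, the equivalent loop would call inflateReset() after each Z_STREAM_END while input remains.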

FvK91 commented 3 weeks ago

Thanks Mark for the clear explanation. Much appreciated! Good to know I can use pigz to analyze gzip files in the future.