projectdiscovery / httpx

httpx is a fast and multi-purpose HTTP toolkit that allows running multiple probes using the retryablehttp library.
https://docs.projectdiscovery.io/tools/httpx
MIT License
7.75k stars 843 forks

GZipDecoder cannot process multi-member gzip data #1874

Closed: lizeyan closed this issue 3 months ago

lizeyan commented 3 months ago

httpx version:

0.27.0

Current Behavior:

GZipDecoder only decodes the first member in gzip data.

Expected Behavior:

GZipDecoder should decompress multi-member gzip data (multiple gzip members concatenated together), just as gzip.decompress does.

Steps To Reproduce:

In [1]: import httpx

In [2]: raw_bytes = (
   ...:     b'\x1f\x8b\x08\x00\x00\tn\x88\x00\xff\x00\x15\x00\xea\xff{"status": "success",\x03\x00\xeb\xdb\xa3\xb0\x15\x00\x00\x00'
   ...:     b'\x1f\x8b\x08\x00\x00\tn\x88\x00\xff\x00\x08\x00\xf7\xff"data": \x03\x00\x1d\xb4\xe6\xc8\x08\x00\x00\x00'
   ...:     b'\x1f\x8b\x08\x00\x00\tn\x88\x00\xff\x00#\x00\xdc\xff{"resultType":"matrix","result":[]}\x03\x00\x12\xb7\x95\x1b#\x00\x00\x00'
   ...:     b'\x1f\x8b\x08\x00\x00\tn\x88\x00\xff\x00\x01\x00\xfe\xff}\x03\x00\x0c\xe2\xb6\xfc\x01\x00\x00\x00'
   ...: )

In [3]: from httpx._decoders import GZipDecoder

In [4]: GZipDecoder().decode(raw_bytes)
Out[4]: b'{"status": "success",'

In [5]: import gzip

In [6]: gzip.decompress(raw_bytes)
Out[6]: b'{"status": "success","data": {"resultType":"matrix","result":[]}}'


(The raw_bytes are from a Prometheus query request)
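
The same behavior can be reproduced without the captured bytes. The sketch below uses only the standard library and assumes that the decoder wraps a single zlib.decompressobj(zlib.MAX_WBITS | 16), as the constructor call in the proposed fix suggests; the sample payloads are made up for illustration.

```python
import gzip
import zlib

# Each gzip.compress() call yields one complete gzip member; joining them
# produces a valid multi-member stream (RFC 1952 allows a series of members).
parts = [b'{"status": "success",', b'"data": ', b"[]}"]
multi_member = b"".join(gzip.compress(part) for part in parts)

# gzip.decompress() keeps reading until every member has been consumed.
full = gzip.decompress(multi_member)

# A single decompressobj stops at the end of the first member and leaves
# the remaining bytes in unused_data.
decompressor = zlib.decompressobj(zlib.MAX_WBITS | 16)
first_member = decompressor.decompress(multi_member)
leftover = decompressor.unused_data

print(full)
print(first_member)
print(decompressor.eof, len(leftover))
```

Here leftover begins with the gzip magic bytes of the second member, which is why a decoder that ignores unused_data silently drops everything after the first member.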

Anything else:

A possible implementation:

class GZipDecoder(ContentDecoder):
    ...

    def decode(self, data: bytes) -> bytes:
        decompressed_data = bytearray()
        try:
            while data:
                decompressed_data += self.decompressor.decompress(data)
                # Bytes left over after the end of the current member
                # belong to the next member.
                data = self.decompressor.unused_data
                if data:
                    # Start a fresh decompressor for the next gzip member.
                    self.decompressor = zlib.decompressobj(zlib.MAX_WBITS | 16)
            return bytes(decompressed_data)
        except zlib.error as exc:
            raise DecodingError(str(exc)) from exc

I have tested it, and it handles this case.
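
To check the idea outside httpx, here is a minimal stand-alone sketch of the same loop. MultiMemberGzipDecoder is a hypothetical name, not an httpx class, and the sample data is invented. It resets the decompressor whenever eof is set, so a member that ends exactly on a chunk boundary is also handled when the body arrives in pieces.

```python
import gzip
import zlib


class MultiMemberGzipDecoder:
    """Hypothetical stand-alone decoder for illustration; not part of httpx."""

    def __init__(self) -> None:
        self.decompressor = zlib.decompressobj(zlib.MAX_WBITS | 16)

    def decode(self, data: bytes) -> bytes:
        out = bytearray()
        while data:
            out += self.decompressor.decompress(data)
            # Anything left after the end of a member starts the next one.
            data = self.decompressor.unused_data
            if self.decompressor.eof:
                # The current member is complete; prepare for the next.
                self.decompressor = zlib.decompressobj(zlib.MAX_WBITS | 16)
        return bytes(out)

    def flush(self) -> bytes:
        return self.decompressor.flush()


# Build a three-member stream and feed it in small, arbitrary chunks.
parts = [b"alpha,", b"beta,", b"gamma"]
stream = b"".join(gzip.compress(part) for part in parts)

decoder = MultiMemberGzipDecoder()
chunks = [stream[i:i + 7] for i in range(0, len(stream), 7)]
streamed = b"".join(decoder.decode(chunk) for chunk in chunks) + decoder.flush()

print(streamed == gzip.decompress(stream))
```

The chunk size of 7 is arbitrary; the result should not depend on where the chunk boundaries fall.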