mitmproxy / mitmproxy

An interactive TLS-capable intercepting HTTP proxy for penetration testers and software developers.
https://mitmproxy.org

Extra error handling in savehar.py to reduce impact from "EOFError: Compressed file ended before the end-of-stream marker was reached" #7170

Open raccoonix opened 1 week ago

raccoonix commented 1 week ago

Problem Description

I am running mitmdump with the hardump option set. These are 2-hour-long web sessions, and at the very end of a run, when the HAR file is exported (after pressing CTRL+C), I sporadically get an error. This error (EOFError when decoding b'\x1f\x8b with 'gzip') prevents any HAR data from being written to disk. You can see the full error message at the bottom of this issue.

My suspicion is that the error is caused by a server connection that is still open at the moment I press CTRL+C in the mitmdump terminal, so the gzip stream is never properly terminated.
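The decoding failure itself can be reproduced outside mitmproxy. A minimal sketch (standard library only, purely for illustration) showing that reading a truncated gzip stream raises exactly this EOFError:

    import gzip
    import io

    # Simulate a gzip-encoded response body that was cut off mid-stream,
    # e.g. because the server connection was still open when mitmdump exited.
    payload = gzip.compress(b"example response body " * 1000)
    truncated = payload[: len(payload) // 2]

    try:
        gzip.GzipFile(fileobj=io.BytesIO(truncated)).read()
    except EOFError as e:
        print(e)  # Compressed file ended before the end-of-stream marker was reached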

Proposal

I recommend adding an extra error-handling try/except block around the per-flow work in the make_har function of the core savehar.py module. This would prevent the HAR export from failing entirely; only the one problematic URL in the session would fail to be saved. I also recommend that the error call out the URL that caused the gzip issue, so users have a chance to change things and troubleshoot intelligently.

Alternatives

I'm on Windows and not confident modifying mitmproxy itself, so I rolled my own version of savehar.py as an addon that I load with the "-s" option of mitmdump. It solves the issue. I import the "traceback" module and add an extra try/except block to "make_har", like this:

    import traceback  # added near the top of the addon module

    # (method of the savehar addon; only the try/except handling is new)
    def make_har(self, flows: Sequence[flow.Flow]) -> dict:
        entries = []
        skipped = 0
        # A set of servers seen so far is maintained so we can avoid
        # using 'connect' time for entries that reuse an existing connection.
        servers_seen: set[Server] = set()

        for f in flows:
            if isinstance(f, http.HTTPFlow):
                try:
                    entries.append(self.flow_entry(f, servers_seen))
                except Exception:
                    # Report the offending URL and the traceback, then keep going
                    # instead of letting the whole HAR export fail.
                    print(f"Error saving entry. The problem URL is: {f.request.pretty_url}")
                    print(traceback.format_exc())
                    skipped += 1
            else:
                skipped += 1

        # ... the rest of make_har is unchanged from the upstream addon.
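
For completeness, this is roughly how such a setup is invoked; the addon file name and output path below are placeholders, and hardump is the existing option mentioned in the Problem Description:

    mitmdump -s my_savehar.py --set hardump=./session.har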

Additional context

--- start of error message ---
[16:32:44.523] Addon error: EOFError when decoding b'\x1f\x8b with 'gzip': EOFError('Compressed file ended before the end-of-stream marker was reached')
Traceback (most recent call last):
  File "C:\Users\Jacob\AppData\Roaming\Python\Python310\site-packages\mitmproxy\net\encoding.py", line 62, in decode
    decoded = custom_decode[encoding](encoded)
  File "C:\Users\Jacob\AppData\Roaming\Python\Python310\site-packages\mitmproxy\net\encoding.py", line 151, in decode_gzip
    return gfile.read()
  File "C:\Program Files\Python310\lib\gzip.py", line 301, in read
    return self._buffer.read(size)
  File "C:\Program Files\Python310\lib\_compression.py", line 118, in readall
    while data := self.read(sys.maxsize):
  File "C:\Program Files\Python310\lib\gzip.py", line 507, in read
    raise EOFError("Compressed file ended before the "
EOFError: Compressed file ended before the end-of-stream marker was reached

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Jacob\AppData\Roaming\Python\Python310\site-packages\mitmproxy\addons\savehar.py", line 128, in done
    self.export_har(self.flows, ctx.options.hardump)
  File "C:\Users\Jacob\AppData\Roaming\Python\Python310\site-packages\mitmproxy\command.py", line 322, in wrapper
    return function(*args, **kwargs)
  File "C:\Users\Jacob\AppData\Roaming\Python\Python310\site-packages\mitmproxy\addons\savehar.py", line 39, in export_har
    har = json.dumps(self.make_har(flows), indent=4).encode()
  File "C:\Users\Jacob\AppData\Roaming\Python\Python310\site-packages\mitmproxy\addons\savehar.py", line 58, in make_har
    entries.append(self.flow_entry(f, servers_seen))
  File "C:\Users\Jacob\AppData\Roaming\Python\Python310\site-packages\mitmproxy\addons\savehar.py", line 185, in flow_entry
    len(flow.response.content) if flow.response.content else 0
  File "C:\Users\Jacob\AppData\Roaming\Python\Python310\site-packages\mitmproxy\http.py", line 337, in content
    return self.get_content()
  File "C:\Users\Jacob\AppData\Roaming\Python\Python310\site-packages\mitmproxy\http.py", line 393, in get_content
    content = encoding.decode(self.raw_content, ce)
  File "C:\Users\Jacob\AppData\Roaming\Python\Python310\site-packages\mitmproxy\net\encoding.py", line 71, in decode
    raise ValueError(
ValueError: EOFError when decoding b'\x1f\x8b with 'gzip': EOFError('Compressed file ended before the end-of-stream marker was reached')
--- end of error message ---

Here is an example URL where this error occurs: https://voila.ca/products?sortBy=nameAscending&sublocationId=d54cf92c-52f4-4321-8fff-66285455932d

mhils commented 1 week ago

Thank you for the detailed report, @raccoonix! 🍰

The root cause here is that we unconditionally access .content in the hardump addon, but .content may raise if servers send us malformed stuff. So we should

  1. Add a test with an invalid gzipped response body and verify that it fails (a rough sketch is below).
  2. Fix the savehar addon to handle these cases gracefully. No particular opinion on how other than that we shouldn't just discard these flows.
  3. Make sure the test passes now. :)
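A rough sketch of what the test in step 1 could look like, assuming the addon class is SaveHar and using mitmproxy's tflow test helper (names and placement are guesses, not the final test):

    import pytest
    from mitmproxy.addons.savehar import SaveHar
    from mitmproxy.test import tflow


    def test_make_har_with_truncated_gzip_body():
        s = SaveHar()
        f = tflow.tflow(resp=True)
        # A gzip header with no end-of-stream marker, mimicking a response
        # that was cut off while the server connection was still open.
        f.response.headers["content-encoding"] = "gzip"
        f.response.raw_content = b"\x1f\x8b\x08\x00"
        # Today this raises; once savehar handles decode errors gracefully,
        # the assertion should change to check that a HAR dict is still produced.
        with pytest.raises(ValueError):
            s.make_har([f])

For step 2, one option might be to fall back to response.get_content(strict=False), which returns the raw body when decoding fails, or to record the entry without a body, rather than dropping the flow entirely.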