I am running mitmdump with the hardump option set. These are 2-hour-long web sessions, and at the very end of a run, when exporting the HAR file (by typing CTRL+C), I sporadically get an error. This error (EOFError when decoding b'\x1f\x8b with 'gzip') prevents any HAR data from being written to disk. You can see the full error message at the bottom of this issue.
My suspicion is that a server connection is still open at the moment I press CTRL+C in the terminal running mitmdump, which is why the gzip stream is never "wrapped up" properly.
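For what it's worth, a truncated gzip stream reproduces exactly this error with nothing but the standard library (b'\x1f\x8b' in the message below is the gzip magic number):

```python
import gzip
import io

# Compress a body, then cut it off mid-stream to simulate a response
# whose connection was closed before the server finished sending.
body = gzip.compress(b"hello world" * 100)
truncated = body[: len(body) // 2]

try:
    gzip.GzipFile(fileobj=io.BytesIO(truncated)).read()
except EOFError as e:
    print(e)  # Compressed file ended before the end-of-stream marker was reached
```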
Proposal
I recommend adding an extra error-handling try/except block in the savehar addon (e.g. in make_har in the core savehar.py module). This would prevent the HAR export from failing entirely; only the one problematic URL in the session would fail to be saved. I also recommend that the error message call out the URL that caused the gzip issue, so users have a chance to change things and troubleshoot intelligently.
Alternatives
I'm on Windows and not confident modifying mitmproxy itself, so I rolled my own version of savehar.py as an addon that I include with the "-s" option to mitmdump. It solves the issue. I import the "traceback" module and add an extra try/except block to "make_har", like this:
def make_har(self, flows: Sequence[flow.Flow]) -> dict:
    entries = []
    skipped = 0
    # A list of servers seen till now is maintained so we can avoid
    # using 'connect' time for entries that use an existing connection.
    servers_seen: set[Server] = set()
    for f in flows:
        if isinstance(f, http.HTTPFlow):
            try:
                entries.append(self.flow_entry(f, servers_seen))
            except Exception as e:
                # Note: add_note() requires Python 3.11+; on 3.10,
                # include the URL in the log message directly instead.
                e.add_note(
                    "Error saving compressed stream. The problem url is: {}".format(
                        f.request.pretty_url
                    )
                )
                print(traceback.format_exc())
        else:
            skipped += 1
Additional context
--- start of error message ---
[16:32:44.523] Addon error: EOFError when decoding b'\x1f\x8b with 'gzip': EOFError('Compressed file ended before the end-of-stream marker was reached')
Traceback (most recent call last):
File "C:\Users\Jacob\AppData\Roaming\Python\Python310\site-packages\mitmproxy\net\encoding.py", line 62, in decode
decoded = custom_decode[encoding](encoded)
File "C:\Users\Jacob\AppData\Roaming\Python\Python310\site-packages\mitmproxy\net\encoding.py", line 151, in decode_gzip
return gfile.read()
File "C:\Program Files\Python310\lib\gzip.py", line 301, in read
return self._buffer.read(size)
File "C:\Program Files\Python310\lib\_compression.py", line 118, in readall
while data := self.read(sys.maxsize):
File "C:\Program Files\Python310\lib\gzip.py", line 507, in read
raise EOFError("Compressed file ended before the "
EOFError: Compressed file ended before the end-of-stream marker was reached
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\Jacob\AppData\Roaming\Python\Python310\site-packages\mitmproxy\addons\savehar.py", line 128, in done
self.export_har(self.flows, ctx.options.hardump)
File "C:\Users\Jacob\AppData\Roaming\Python\Python310\site-packages\mitmproxy\command.py", line 322, in wrapper
return function(*args, **kwargs)
File "C:\Users\Jacob\AppData\Roaming\Python\Python310\site-packages\mitmproxy\addons\savehar.py", line 39, in export_har
har = json.dumps(self.make_har(flows), indent=4).encode()
File "C:\Users\Jacob\AppData\Roaming\Python\Python310\site-packages\mitmproxy\addons\savehar.py", line 58, in make_har
entries.append(self.flow_entry(f, servers_seen))
File "C:\Users\Jacob\AppData\Roaming\Python\Python310\site-packages\mitmproxy\addons\savehar.py", line 185, in flow_entry
len(flow.response.content) if flow.response.content else 0
File "C:\Users\Jacob\AppData\Roaming\Python\Python310\site-packages\mitmproxy\http.py", line 337, in content
return self.get_content()
File "C:\Users\Jacob\AppData\Roaming\Python\Python310\site-packages\mitmproxy\http.py", line 393, in get_content
content = encoding.decode(self.raw_content, ce)
File "C:\Users\Jacob\AppData\Roaming\Python\Python310\site-packages\mitmproxy\net\encoding.py", line 71, in decode
raise ValueError(
ValueError: EOFError when decoding b'\x1f\x8b with 'gzip': EOFError('Compressed file ended before the end-of-stream marker was reached')
--- end of error message ---
The root cause here is that we unconditionally access .content in the hardump addon, but .content may raise if servers send us malformed stuff. So we should:
- Add a test with an invalid gzipped response body and verify that it fails.
- Fix the savehar addon to handle these cases gracefully. No particular opinion on how, other than that we shouldn't just discard these flows.
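One graceful option, sketched below with only the standard library (the `decode_or_raw` helper is hypothetical, not actual savehar code): keep the flow in the HAR but fall back to the raw, still-compressed body when decoding fails, similar in spirit to calling mitmproxy's `get_content(strict=False)` instead of accessing `.content`:

```python
import gzip
import io
import zlib

def decode_or_raw(raw: bytes, content_encoding: str) -> bytes:
    """Decode a response body if possible; on a truncated or otherwise
    malformed stream, fall back to the raw bytes so the flow is kept
    in the HAR instead of aborting the whole export."""
    try:
        if content_encoding == "gzip":
            return gzip.GzipFile(fileobj=io.BytesIO(raw)).read()
        return raw
    except (EOFError, OSError, zlib.error):
        return raw

# An intact body decodes normally; a truncated one is kept as-is.
body = gzip.compress(b"hello world" * 50)
assert decode_or_raw(body, "gzip") == b"hello world" * 50
truncated = body[: len(body) // 2]
assert decode_or_raw(truncated, "gzip") == truncated  # flow kept, body raw
```

This also gives the suggested test a natural shape: feed an invalid gzipped response body through the export path and assert that the flow survives.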
Additional context
Here is an example URL where this error occurs: https://voila.ca/products?sortBy=nameAscending&sublocationId=d54cf92c-52f4-4321-8fff-66285455932d