Sometimes, due to an HTTP server problem, http.client raises an
IncompleteRead exception and fails while reading data from a target URL.
Example: http://www.alumni.weber.edu/
curl and browsers work correctly with the same URL.
$ curl -X GET http://www.alumni.weber.edu/
<html><head><title>Object moved</title></head><body>
<h2>Object moved to <a href="https://www.alumni.weber.edu/">here</a>.</h2>
</body></html>
curl: (18) transfer closed with outstanding read data remaining
Notice that curl reports a warning, but the page is downloaded correctly.
The warcprox exception when trying to download http://www.alumni.weber.edu/ is:
2019-04-13 16:01:05,271 19278 ERROR MitmProxyHandler(tid=5761,started=2019-04-13T16:01:05.024898,client=127.0.0.1:46234) warcprox.warcprox.WarcProxyHandler.do_COMMAND(mitmproxy.py:407) error from remote server(?) 'GET http://www.alumni.weber.edu/ HTTP/1.1': IncompleteRead(146 bytes read)
Traceback (most recent call last):
  File "/home/vbanos/.pyenv/versions/3.5.2/lib/python3.5/http/client.py", line 541, in _get_chunk_left
    chunk_left = self._read_next_chunk_size()
  File "/home/vbanos/.pyenv/versions/3.5.2/lib/python3.5/http/client.py", line 508, in _read_next_chunk_size
    return int(line, 16)
ValueError: invalid literal for int() with base 16: b''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/vbanos/.pyenv/versions/3.5.2/lib/python3.5/http/client.py", line 573, in _readinto_chunked
    chunk_left = self._get_chunk_left()
  File "/home/vbanos/.pyenv/versions/3.5.2/lib/python3.5/http/client.py", line 543, in _get_chunk_left
    raise IncompleteRead(b'')
http.client.IncompleteRead: IncompleteRead(0 bytes read)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/spn/lib/python3.5/site-packages/warcprox-2.4.3-py3.5.egg/warcprox/mitmproxy.py", line 397, in do_COMMAND
    return self._proxy_request()
  File "/opt/spn/lib/python3.5/site-packages/warcprox-2.4.3-py3.5.egg/warcprox/warcproxy.py", line 211, in _proxy_request
    self, extra_response_headers=extra_response_headers)
  File "/opt/spn/lib/python3.5/site-packages/warcprox-2.4.3-py3.5.egg/warcprox/mitmproxy.py", line 437, in _proxy_request
    return self._inner_proxy_request(extra_response_headers)
  File "/opt/spn/lib/python3.5/site-packages/warcprox-2.4.3-py3.5.egg/warcprox/mitmproxy.py", line 496, in _inner_proxy_request
    buf = prox_rec_res.read(65536)
  File "/opt/spn/lib/python3.5/site-packages/warcprox-2.4.3-py3.5.egg/warcprox/mitmproxy.py", line 198, in read
    buf = http_client.HTTPResponse.read(self, amt)
  File "/home/vbanos/.pyenv/versions/3.5.2/lib/python3.5/http/client.py", line 448, in read
    n = self.readinto(b)
  File "/home/vbanos/.pyenv/versions/3.5.2/lib/python3.5/http/client.py", line 478, in readinto
    return self._readinto_chunked(b)
  File "/home/vbanos/.pyenv/versions/3.5.2/lib/python3.5/http/client.py", line 589, in _readinto_chunked
    raise IncompleteRead(bytes(b[0:total_bytes]))
http.client.IncompleteRead: IncompleteRead(146 bytes read)
In this PR, we add exception handling for http.client.IncompleteRead
so that the request can continue when it happens.
curl now behaves exactly the same with or without warcprox:
export http_proxy=http://localhost:8888/; curl -X GET http://www.alumni.weber.edu/
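
For reviewers who want to reproduce the failure without the remote site, here is a minimal, self-contained sketch. It is not the actual warcprox change. It assumes the problem is a chunked response that ends before the terminating zero-length chunk, which is what the traceback above suggests. The names serve_once, read_tolerantly and fetch_from_truncating_server are hypothetical helpers made up for this illustration. Only http.client.IncompleteRead and its partial attribute come from the standard library.

```python
import http.client
import socket
import threading

# A chunked body that delivers one 5-byte chunk and then stops: the
# terminating zero-length chunk ("0\r\n\r\n") is never sent.
TRUNCATED_RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Transfer-Encoding: chunked\r\n"
    b"\r\n"
    b"5\r\nhello\r\n"
)


def serve_once(listener):
    """Accept one connection, send the truncated response, and hang up."""
    conn, _ = listener.accept()
    with conn:
        request = b""
        # Drain the request headers before answering.
        while b"\r\n\r\n" not in request:
            part = conn.recv(65536)
            if not part:
                break
            request += part
        conn.sendall(TRUNCATED_RESPONSE)


def read_tolerantly(response, bufsize=65536):
    """Read a response body, keeping the partial data on IncompleteRead."""
    chunks = []
    while True:
        try:
            buf = response.read(bufsize)
        except http.client.IncompleteRead as e:
            # e.partial holds the bytes received before the stream broke.
            chunks.append(e.partial)
            break
        if not buf:
            break
        chunks.append(buf)
    return b"".join(chunks)


def fetch_from_truncating_server():
    """Start the one-shot local server and fetch from it tolerantly."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve_once, args=(listener,))
    server.start()
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", "/")
        return read_tolerantly(conn.getresponse())
    finally:
        conn.close()
        server.join()
        listener.close()


if __name__ == "__main__":
    print(fetch_from_truncating_server())
```

Without the except clause, the read loop would raise IncompleteRead and the bytes already received would be lost to the caller. Treating the exception as end-of-body is one possible policy, and it mirrors what curl does when it reports error 18 but still prints the page.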