urllib3 / urllib3

urllib3 is a user-friendly HTTP client library for Python
https://urllib3.readthedocs.io
MIT License
3.79k stars 1.15k forks source link

ProtocolError spans connection, read errors #470

Open kevinburke opened 10 years ago

kevinburke commented 10 years ago

For retry purposes it's grouped as a "read" in _is_read_error but most types of errors are connection errors.

ProtocolError encompasses SocketErrors (DNS lookup failures, EADDRINUSE, ECONNRESET, ECONNREFUSED, etc), as well as the following exceptions in httplib (taken from CPython 2.7)

class HTTPException(Exception):
class NotConnected(HTTPException):
class InvalidURL(HTTPException):
class UnknownProtocol(HTTPException):
class UnknownTransferEncoding(HTTPException):
class UnimplementedFileMode(HTTPException):
class IncompleteRead(HTTPException):
class ImproperConnectionState(HTTPException):
class BadStatusLine(HTTPException):
class LineTooLong(HTTPException):

Some of these are raised before the connection is established and some aren't. I'm not sure catching them all as read errors is the best approach. It is "simpler" for some degree of simpler.

If you are looking for a recommendation I would recommend moving the httplib exceptions that happen after data has been written to the socket (ImproperConnectionState, BadStatusLine, LineTooLong, IncompleteRead) to be instances of RequestError, and then move ProtocolError to _is_connection_error in retry.py.

kevinburke commented 10 years ago

FWIW, requests also catches a ProtocolError and reraises it as a ConnectionError.

shazow commented 10 years ago

Last I looked into it[*], I remember that a bunch of them (if not all) were very ambiguous and could happen either before or after the server has received the request, so grouping them under ProtocolError was the safest thing to do (assume the server received it just in case). :/

[*] That is actually read the httplib code and trying to decipher what it's doing.

kevinburke commented 10 years ago

Previously (at least by the naming standard, maybe not by behaviour) the error was called ConnectionError and now it covers both connection errors and post-processing errors and this feels like a step backwards.

shazow commented 10 years ago

Yea, except what we had before was incorrect as far as I could tell. :)

kevinburke commented 10 years ago

Agreed! I would love to find a way forward where urllib3 has some kind of error which we can guarantee was raised before the request has been sent or partially sent.

shazow commented 10 years ago

This will involve replacing httplib.

Lukasa commented 10 years ago

This is now the 100th time we've said that. =( (Probably)

shazow commented 10 years ago

Closing as dupe of #58, feel free to reopen if you have better ideas. :)

kevinburke commented 10 years ago

So... I'm not sure we actually have to replace httplib to accomplish this. The socket connect, read, and write calls are done separately. We'd just have to catch more exceptions individually and raise them appropriately.

I wrote a little bit about what my ideal exception hierarchy would look like, please take a look:

https://gist.github.com/kevinburke/b98e053a4bf9835c67bb

kevinburke commented 10 years ago

Also, apparently I can't reopen this issue :)