Open 418b8850-85af-4077-bd06-cae17fd9e37f opened 14 years ago
Since Python 2.6, httplib has offered a timeout parameter for fetches. As the documentation explains, if this parameter is not provided, it uses the global default.
What the document doesn't explain is httplib builds on top of the socket library. The socket library has a default timeout of None (i.e. forever). This may be an appropriate default for general sockets, but it is a poor default for httplib; typical http clients would use a timeout in the 2-10 second range.
This problem is propagated up to urllib2, which sits on httplib, and further obscures that the default might be unsuitable.
From an inspection of the manuals, Python 3.0.1 suffers from the same problem except, the names have changed. urllib.response sits on http.client.
I, for one, made a brutal mistake of assuming that the "global default" would be some reasonable default for fetching web pages; I didn't have any specific timeout in mind, and was happy for the library to take care of it. Several million successful http downloads later, my server application thread froze waiting forever when talking to a recalcitrant web-server. I imagine others have fallen for the same trap.
While an ideal solution would be for httplib and http.client to use a more generally acceptable default, I can see it might be far too late to make such a change without breaking existing applications. Failing that, I would recommend that the documentation for httplib, urllib, urllib2, http.client and urllib.request (+ any other similar libraries sitting on socket? FTP, SMTP?) be changed to highlight that the default global timeout, sans deliberate override, is to wait a surprisingly long time.
I am not sure, there can be a default timeout value for client libraries like httplib and urllib2.
Socket connection do have timeout and as you may have figured out already, the option in httplib and urllib methods is to set/override the socket._GLOBAL_DEFALT_TIMEOUT which is None by default (Wait indefinitely).
Since client libraries are using a global, setting it at once place (say at httplib) has same timeout applicable for other modules within the same process.
I see docs can highlight it more or perhaps link to sockets timeout information.
@orsenthil:
Consider the definition of httplib.HTTPConnection.__init__(), in Python 2.6.
def __init__(self, host, port=None, strict=None,
timeout=socket._GLOBAL_DEFAULT_TIMEOUT):
This could be replaced with:
def __init__(self, host, port=None, strict=None,
timeout=10):
or, perhaps better,
def __init__(self, host, port=None, strict=None,
timeout=httplib._HTTP_DEFAULT_TIMEOUT):
This timeout value is passed to the call in socket.create_connection, so I believe if it is overriden, it only applies to the relevant sockets and not to all sockets globally.
Note: I am not arguing here that this SHOULD be done - it would break existing applications, especially those that were written before Python 2.6 - merely that it COULD be done.
On Sun, May 02, 2010 at 03:45:09AM +0000, Julian wrote:
Note: I am not arguing here that this SHOULD be done - it would break existing applications, especially those that were written before Python 2.6 - merely that it COULD be done.
I get your point, Julian. What I was worried about is, is it the "correct thing" to do? Which I am not sure and I believe it is not as httplib and urllib are not client themselves but are libraries to build clients. httplib and urllib can be considered as convenient interfaces over underlying sockets.
Breaking of existing apps is the next question, which should definitely be avoided. And an explanation in the docs certainly seems to be the way to go.
I think you could preserve backward compatibility by doing something like the following (in httplib):
_sentinel = object()
__HTTP_DEFAULT_TIMEOUT = _sentinel
In httplib.HTTPConnection.__init__(), in Python 2.6.
def __init__(self, host, port=None, strict=None,
timeout=None):
if timeout is None:
if _HTTP_DEFAULT_TIMEOUT is _sentinel:
timeout = socket._GLOBAL_DEFAULT_TIMEOUT
else:
timeout = _HTTP_DEFAULT_TIMEOUT
That way, if _HTTP_DEFAULT_TIMEOUT is never set, it will use the the socket timeout. Admittedly I'd rather see all uses of module globals go away, but I think this would be a good compromise.
That way, if _HTTP_DEFAULT_TIMEOUT is never set, it will use the the socket timeout. Admittedly I'd rather see all uses of module globals go away, but I think this would be a good compromise.
Why not provide {httplib,urllib}.{set,get}defaulttimeout() instead?
On 8/19/2010 9:14 AM, Antoine Pitrou wrote:
Why not provide {httplib,urllib}.{set,get}defaulttimeout() instead?
Yes, I'm assuming that's how _HTTP_DEFAULT_TIMEOUT would be set and queried.
I've put up the doc update PR: https://github.com/python/cpython/pull/27087
I did not add a module level default timeouts to these modules because it seems to me most of the issue is users being unaware of the current global default, which is fixed by docs update; then it's easy to set the timeout when calling respective functions.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields: ```python assignee = 'https://github.com/orsenthil' closed_at = None created_at =
labels = ['type-bug', 'library', '3.10', 'docs']
title = 'Explain the default timeout in http-client-related libraries'
updated_at =
user = 'https://bugs.python.org/oddthinking'
```
bugs.python.org fields:
```python
activity =
actor = 'andrei.avk'
assignee = 'orsenthil'
closed = False
closed_date = None
closer = None
components = ['Documentation', 'Library (Lib)']
creation =
creator = 'oddthinking'
dependencies = []
files = []
hgrepos = []
issue_num = 8595
keywords = ['patch']
message_count = 8.0
messages = ['104763', '104765', '104767', '104769', '109052', '114361', '114362', '397263']
nosy_count = 7.0
nosy_names = ['orsenthil', 'pitrou', 'eric.smith', 'docs@python', 'oddthinking', 'asandvig', 'andrei.avk']
pr_nums = ['27087']
priority = 'normal'
resolution = None
stage = 'patch review'
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue8595'
versions = ['Python 3.10']
```