python / cpython

The Python programming language
https://www.python.org
Other
63.46k stars 30.39k forks source link

Explain the default timeout in http-client-related libraries #52841

Open 418b8850-85af-4077-bd06-cae17fd9e37f opened 14 years ago

418b8850-85af-4077-bd06-cae17fd9e37f commented 14 years ago
BPO 8595
Nosy @orsenthil, @pitrou, @ericvsmith, @akulakov
PRs
  • python/cpython#27087
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = 'https://github.com/orsenthil' closed_at = None created_at = labels = ['type-bug', 'library', '3.10', 'docs'] title = 'Explain the default timeout in http-client-related libraries' updated_at = user = 'https://bugs.python.org/oddthinking' ``` bugs.python.org fields: ```python activity = actor = 'andrei.avk' assignee = 'orsenthil' closed = False closed_date = None closer = None components = ['Documentation', 'Library (Lib)'] creation = creator = 'oddthinking' dependencies = [] files = [] hgrepos = [] issue_num = 8595 keywords = ['patch'] message_count = 8.0 messages = ['104763', '104765', '104767', '104769', '109052', '114361', '114362', '397263'] nosy_count = 7.0 nosy_names = ['orsenthil', 'pitrou', 'eric.smith', 'docs@python', 'oddthinking', 'asandvig', 'andrei.avk'] pr_nums = ['27087'] priority = 'normal' resolution = None stage = 'patch review' status = 'open' superseder = None type = 'behavior' url = 'https://bugs.python.org/issue8595' versions = ['Python 3.10'] ```

    418b8850-85af-4077-bd06-cae17fd9e37f commented 14 years ago

    Since Python 2.6, httplib has offered a timeout parameter for fetches. As the documentation explains, if this parameter is not provided, it uses the global default.

    What the document doesn't explain is httplib builds on top of the socket library. The socket library has a default timeout of None (i.e. forever). This may be an appropriate default for general sockets, but it is a poor default for httplib; typical http clients would use a timeout in the 2-10 second range.

    This problem is propagated up to urllib2, which sits on httplib, and further obscures that the default might be unsuitable.

    From an inspection of the manuals, Python 3.0.1 suffers from the same problem except, the names have changed. urllib.response sits on http.client.

    I, for one, made a brutal mistake of assuming that the "global default" would be some reasonable default for fetching web pages; I didn't have any specific timeout in mind, and was happy for the library to take care of it. Several million successful http downloads later, my server application thread froze waiting forever when talking to a recalcitrant web-server. I imagine others have fallen for the same trap.

    While an ideal solution would be for httplib and http.client to use a more generally acceptable default, I can see it might be far too late to make such a change without breaking existing applications. Failing that, I would recommend that the documentation for httplib, urllib, urllib2, http.client and urllib.request (+ any other similar libraries sitting on socket? FTP, SMTP?) be changed to highlight that the default global timeout, sans deliberate override, is to wait a surprisingly long time.

    orsenthil commented 14 years ago

    I am not sure, there can be a default timeout value for client libraries like httplib and urllib2.

    Socket connection do have timeout and as you may have figured out already, the option in httplib and urllib methods is to set/override the socket._GLOBAL_DEFALT_TIMEOUT which is None by default (Wait indefinitely).

    Since client libraries are using a global, setting it at once place (say at httplib) has same timeout applicable for other modules within the same process.

    I see docs can highlight it more or perhaps link to sockets timeout information.

    418b8850-85af-4077-bd06-cae17fd9e37f commented 14 years ago

    @orsenthil:

    Consider the definition of httplib.HTTPConnection.__init__(), in Python 2.6.

       def __init__(self, host, port=None, strict=None,
                    timeout=socket._GLOBAL_DEFAULT_TIMEOUT):

    This could be replaced with:

       def __init__(self, host, port=None, strict=None,
                    timeout=10):

    or, perhaps better,

       def __init__(self, host, port=None, strict=None,
                    timeout=httplib._HTTP_DEFAULT_TIMEOUT):

    This timeout value is passed to the call in socket.create_connection, so I believe if it is overriden, it only applies to the relevant sockets and not to all sockets globally.

    Note: I am not arguing here that this SHOULD be done - it would break existing applications, especially those that were written before Python 2.6 - merely that it COULD be done.

    orsenthil commented 14 years ago

    On Sun, May 02, 2010 at 03:45:09AM +0000, Julian wrote:

    Note: I am not arguing here that this SHOULD be done - it would break existing applications, especially those that were written before Python 2.6 - merely that it COULD be done.

    I get your point, Julian. What I was worried about is, is it the "correct thing" to do? Which I am not sure and I believe it is not as httplib and urllib are not client themselves but are libraries to build clients. httplib and urllib can be considered as convenient interfaces over underlying sockets.

    Breaking of existing apps is the next question, which should definitely be avoided. And an explanation in the docs certainly seems to be the way to go.

    ericvsmith commented 14 years ago

    I think you could preserve backward compatibility by doing something like the following (in httplib):

    _sentinel = object()
    __HTTP_DEFAULT_TIMEOUT = _sentinel

    In httplib.HTTPConnection.__init__(), in Python 2.6.

       def __init__(self, host, port=None, strict=None,
                    timeout=None):
          if timeout is None:
             if _HTTP_DEFAULT_TIMEOUT is _sentinel:
                timeout = socket._GLOBAL_DEFAULT_TIMEOUT
             else:
                timeout = _HTTP_DEFAULT_TIMEOUT

    That way, if _HTTP_DEFAULT_TIMEOUT is never set, it will use the the socket timeout. Admittedly I'd rather see all uses of module globals go away, but I think this would be a good compromise.

    pitrou commented 14 years ago

    That way, if _HTTP_DEFAULT_TIMEOUT is never set, it will use the the socket timeout. Admittedly I'd rather see all uses of module globals go away, but I think this would be a good compromise.

    Why not provide {httplib,urllib}.{set,get}defaulttimeout() instead?

    ericvsmith commented 14 years ago

    On 8/19/2010 9:14 AM, Antoine Pitrou wrote:

    Why not provide {httplib,urllib}.{set,get}defaulttimeout() instead?

    Yes, I'm assuming that's how _HTTP_DEFAULT_TIMEOUT would be set and queried.

    akulakov commented 3 years ago

    I've put up the doc update PR: https://github.com/python/cpython/pull/27087

    I did not add a module level default timeouts to these modules because it seems to me most of the issue is users being unaware of the current global default, which is fixed by docs update; then it's easy to set the timeout when calling respective functions.