python / cpython

The Python programming language
https://www.python.org

Cannot override 'connection: close' in urllib2 headers #57058

Open 6749df30-51de-4153-a869-24ef2f2a26e7 opened 13 years ago

6749df30-51de-4153-a869-24ef2f2a26e7 commented 13 years ago
BPO 12849
Nosy @jcea, @orsenthil, @bitdancer, @vadmium

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

```python
assignee = None
closed_at = None
created_at =
labels = ['invalid', 'type-bug', '3.7', 'expert-IO']
title = "Cannot override 'connection: close' in urllib2 headers"
updated_at =
user = 'https://bugs.python.org/shubhojeetghosh'
```

bugs.python.org fields:

```python
activity =
actor = 'henrik242'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['IO']
creation =
creator = 'shubhojeet.ghosh'
dependencies = []
files = []
hgrepos = []
issue_num = 12849
keywords = []
message_count = 8.0
messages = ['143120', '170476', '211387', '221006', '243879', '250285', '363544', '363697']
nosy_count = 8.0
nosy_names = ['jcea', 'orsenthil', 'r.david.murray', 'martin.panter', 'shubhojeet.ghosh', 'sanxiago', 's7v7nislands@gmail.com', 'henrik242']
pr_nums = []
priority = 'normal'
resolution = 'not a bug'
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue12849'
versions = ['Python 2.7', 'Python 3.2', 'Python 3.3', 'Python 3.7']
```

6749df30-51de-4153-a869-24ef2f2a26e7 commented 13 years ago

There seems to be an issue with urllib2: the headers set on the request do not match what is actually sent on the wire (as captured with Wireshark). Other headers, such as User-Agent and Cookie, work fine. Here is an example of the failure:

Python code:

    import urllib2

    url = "http://www.python.org"

    req = urllib2.Request(url)
    req.add_header('Connection', "keep-alive")
    u = urllib2.urlopen(req)

Wireshark:

    GET / HTTP/1.1
    Accept-Encoding: identity
    Connection: close
    Host: www.python.org
    User-Agent: Python-urllib/2.6

bitdancer commented 12 years ago

I've closed bpo-15943 as a duplicate of this one. As I said there, I'm not sure that we (can?) support keep-alive in urllib, though we do in httplib (which is the http package in Python 3).

vadmium commented 10 years ago

I suggest using setdefault() in urllib.request.AbstractHTTPHandler.do_open():

    headers.setdefault("Connection", "close")

I am trying to work around a server that truncates its response when this header is sent, and this change would allow me to specify headers={"Connection": "Keep-Alive"} to get the same effect as dropping the Connection header. This is also consistent with the way the other headers (Accept-Encoding, User-Agent, Host) may be overridden.
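The difference between the two behaviors can be sketched with plain dicts. This is a simplified model only, not the actual body of do_open(); the function names `headers_forced` and `headers_with_default` are made up for illustration.

```python
def headers_forced(user_headers):
    """Model of the behavior reported in this issue: the value is overwritten."""
    headers = dict(user_headers)
    headers["Connection"] = "close"  # unconditional assignment wins
    return headers


def headers_with_default(user_headers):
    """Model of the proposed change: fill in a value only if none was given."""
    headers = dict(user_headers)
    headers.setdefault("Connection", "close")  # keeps an existing value
    return headers


user = {"Connection": "Keep-Alive"}
print(headers_forced(user))        # {'Connection': 'close'}
print(headers_with_default(user))  # {'Connection': 'Keep-Alive'}
print(headers_with_default({}))    # {'Connection': 'close'}
```

Note that dict lookups are case-sensitive, so the sketch assumes the key is spelled exactly "Connection".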

bb8bd63d-cf82-41f3-a63e-9703d695cb16 commented 10 years ago

The problem here as far as I can tell is that the underlying file object (addinfourl) blocks while waiting for a full response from the server. As detailed in section 8.1 of RFC 2616, requests and responses can be pipelined, meaning requests can be sent while waiting for full responses from a server.

The suggested change of overriding headers is only a partial solution as it doesn't allow for non-blocking pipelining.

@Martin Panter: My suggestion for you would simply be to use http.client (httplib) as R. David Murray suggests, which doesn't auto-inject the Connection header. Also, a server truncating responses when "Connection: close" is sent sounds like a server-side bug to me. Unless you're a server maintainer (or have access to the developers), have you tried reaching out to them to request a fix?
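For reference, a minimal Python 3 sketch of the http.client route mentioned above. The helper name `get_with_connection_header` and its parameters are hypothetical; the point is only that the Connection value passed in `headers` is sent as given.

```python
import http.client


def get_with_connection_header(host, port, path="/", connection="keep-alive"):
    """Send one GET with an explicit Connection header; return (status, body)."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        # http.client adds Host and Accept-Encoding on its own, but it does
        # not add or rewrite a Connection header, so this value goes out as-is.
        conn.request("GET", path, headers={"Connection": connection})
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
```

A call might look like `get_with_connection_header("www.python.org", 80)`. The helper closes the connection after one request to stay short; reusing the same HTTPConnection object for several requests is what would actually take advantage of keep-alive.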

vadmium commented 9 years ago

So far the only reasons that have been given to override this header (mine and the one in bpo-15943) seem to be to work around buggy servers. It is already documented that HTTP 1.1 and “Connection: close” are used, so if this issue is only about working around buggy servers, the best thing might be to close this as being “not a Python bug”. The user can always still use the low-level HTTP client, or make a custom urllib.request handler class (which is what I did).
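One possible shape for such a custom handler, as a hedged Python 3 sketch (this is not the code referred to in the comment above, and the names `HeaderOverrideConnection`, `KeepAliveHTTPHandler`, and `build_keep_alive_opener` are invented here). It relies on do_open() accepting a connection class and on the connection sending each header through putheader().

```python
import http.client
import urllib.request


class HeaderOverrideConnection(http.client.HTTPConnection):
    """HTTPConnection that rewrites the Connection header before sending."""

    connection_value = "keep-alive"  # assumption: the value the caller wants

    def putheader(self, header, *values):
        # do_open() passes "Connection: close"; swap in our own value here.
        if header.lower() == "connection":
            values = (self.connection_value,)
        super().putheader(header, *values)


class KeepAliveHTTPHandler(urllib.request.HTTPHandler):
    """HTTPHandler that opens plain-HTTP requests with the class above."""

    def http_open(self, req):
        return self.do_open(HeaderOverrideConnection, req)


def build_keep_alive_opener():
    # build_opener() drops the default HTTPHandler when given a subclass.
    return urllib.request.build_opener(KeepAliveHTTPHandler)
```

Usage would be along the lines of `build_keep_alive_opener().open("http://www.python.org")`. This only changes the header on the wire: as far as I can tell, do_open() still closes the socket after reading the response, so it does not give persistent connections. HTTPS would need an equivalent pair based on HTTPSConnection and HTTPSHandler.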

Shubhojeet: What was the reason you wanted to set a keep-alive header?

If this is about proper keep-alive (a.k.a. persistent) connection support in urllib.request, perhaps have a look at bpo-9740.

vadmium commented 9 years ago

Just closed bpo-25037 about a server that omits the chunk length headers when “Connection: close” is used.

I wonder if it would be such a bad idea to just remove the “Connection: close” flag. It was added in 2004 in revision 5e7455fb8db6, but I do not agree with the reason given in the commit message and comment. Adding the flag is only really a courtesy to the server, saying it can drop the connection once it sends the response. Removing it in theory shouldn’t change anything about how the client parses the HTTP response, but in practice it seems it may improve compatibility with buggy servers.

e8c83bbc-3416-47d2-b3b4-8a3d6e3b373e commented 4 years ago

That mandatory "Connection: close" makes it impossible to POST a data request to Solr, as described in https://bugs.python.org/issue39875

It would be very helpful if it could be made optional.

e8c83bbc-3416-47d2-b3b4-8a3d6e3b373e commented 4 years ago

Correction: My problem in bpo-39875 was not related to Connection: Close, but with weird POST handling in Solr.