pwsm / httplib2

Automatically exported from code.google.com/p/httplib2

python3 httplib2 clobbers multiple headers of same key #229

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Using httplib2 v 0.7.6.

What steps will reproduce the problem?
1. Have a server produce a response with multiple headers of the same name, for 
example:

    Cache-control: max-age=3000
    Cache-control: no-transform

2. Request the url that produces that response with httplib2 and python3

What is the expected output? What do you see instead?

Expected output is response['cache-control'] == "max-age=3000, no-transform"

Output received is response['cache-control'] == "no-transform"

This happens because http.client (python3) parses headers differently from 
httplib (python2). httplib does its own header parsing and appends the values 
of duplicate headers (method addheader in class HTTPMessage, httplib.py line 
220 in python 2.7.2). Thus the handling in httplib2's Response class when 
calling info.getheaders() works out okay.

In python3 info.getheaders() will return multiples of the same key, and in 
httplib2 the last one will win.
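
A minimal sketch of the clobbering, assuming the headers are copied into a plain dict one pair at a time. It uses email.message_from_string as a stand-in for http.client's parser (http.client.HTTPMessage subclasses email.message.Message); the names RAW and clobbered are made up for illustration and are not httplib2 code:

```python
import email

# Two headers with the same name, as in the reproduction steps.
RAW = "Cache-control: max-age=3000\nCache-control: no-transform\n\n"

msg = email.message_from_string(RAW)
pairs = msg.items()  # duplicates are kept as separate (name, value) pairs

# Naive copy into a dict: each assignment overwrites the previous value.
clobbered = {}
for key, value in pairs:
    clobbered[key.lower()] = value

print(pairs)
print(clobbered)
```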

Presumably the fix to ameliorate this problem is to check the Response dict for 
the key already being there, and append the new data, as done in httplib.
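
Roughly, the check-and-append idea could look like the sketch below. merge_headers is a hypothetical helper, not the actual httplib2 patch, and it assumes the headers arrive as an iterable of (name, value) pairs:

```python
def merge_headers(pairs):
    """Fold repeated header names into one comma-separated value."""
    merged = {}
    for key, value in pairs:
        key = key.lower()
        if key in merged:
            # Key already seen: append instead of overwriting.
            merged[key] = merged[key] + ", " + value
        else:
            merged[key] = value
    return merged


headers = merge_headers([
    ("Cache-control", "max-age=3000"),
    ("Cache-control", "no-transform"),
    ("Content-Type", "text/plain"),
])
print(headers["cache-control"])
```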

I expect I can cook up a reasonable patch for such things, but I first wanted 
to confirm that this is considered a bug in httplib2 and not in http.client and 
to farm for this important piece of information:

httplib blithely assumes that every header may have duplicate values appended 
after ', ', but I'm relatively certain that only a subset of response headers 
allow that, notably Set-Cookie and Cache-Control. Is it best to guard for just 
those that are allowed? If so, which ones are? Building in a whitelist or 
blacklist seems fragile for the future.
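
For comparison, the guarded variant might look something like this sketch. LIST_VALUED and merge_listed_headers are hypothetical names, the two entries in the set are only examples (both are defined as comma-separated lists), and headers outside the set keep the current last-one-wins behavior:

```python
# Hypothetical allowlist of headers whose values may be comma-joined.
# Illustrative only; deciding what belongs here is the open question.
LIST_VALUED = frozenset({"cache-control", "link"})


def merge_listed_headers(pairs, list_valued=LIST_VALUED):
    """Append duplicates only for allowlisted names; otherwise last wins."""
    merged = {}
    for key, value in pairs:
        key = key.lower()
        if key in merged and key in list_valued:
            merged[key] = merged[key] + ", " + value
        else:
            merged[key] = value
    return merged


guarded = merge_listed_headers([
    ("Cache-control", "max-age=3000"),
    ("Cache-control", "no-transform"),
    ("X-Example", "a"),
    ("X-Example", "b"),
])
print(guarded)
```

The cost is visible right away: every header not in the set silently falls back to the old behavior, which is the fragility worried about above.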

Opinions?

Original issue reported on code.google.com by chris.d...@gmail.com on 3 Oct 2012 at 4:45

GoogleCodeExporter commented 8 years ago
This problem isn't showing up via test/duplicate-headers/multilink.asis 
because, as far as I can tell, nginx is concatenating the Link headers before 
sending a response. Fetching that file with a raw telnet to port 80 returns:

    Link: <http://bitworking.org>; rel="home"; title="BitWorking", <http://bitworking.org/index.rss>; rel="feed"; title="BitWorking"

So an additional test will be required, which I'm poking at now.

Original comment by chris.d...@gmail.com on 8 Oct 2012 at 4:01

GoogleCodeExporter commented 8 years ago
Meh, tests need something like wsgi-intercept, see 
http://code.google.com/p/httplib2/issues/detail?id=84

In any case a diff with a simple fix without tests is attached. It is based on 
the code found in the python2.7 httplib.

Original comment by chris.d...@gmail.com on 8 Oct 2012 at 7:55

GoogleCodeExporter commented 8 years ago
Fixed in 
http://code.google.com/p/httplib2/source/detail?r=4dd0d6cc00c16caa00dbc5af29d1600aaf523c94

Original comment by joe.gregorio@gmail.com on 12 Nov 2012 at 6:56