orientchen / httplib2

Automatically exported from code.google.com/p/httplib2

Retrieving Yahoo OpenID page results in incomplete body entity #18

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
>>> from httplib2 import Http
>>> http = Http()
>>> http.request('https://me.yahoo.com/christophermlenz')
({'status': '200', 'content-location': 'https://me.yahoo.com/christophermlenz',
  'transfer-encoding': 'chunked', 'connection': 'close',
  'date': 'Thu, 28 Feb 2008 10:19:53 GMT',
  'p3p': 'policyref="http://p3p.yahoo.com/w3c/p3p.xml", CP="CAO DSP COR CUR ADM DEV TAI PSA PSD IVAi IVDi CONi TELo OTPi OUR DELi SAMi OTRi UNRi PUBi IND PHY ONL UNI PUR FIN COM NAV INT DEM CNT STA POL HEA PRE GOV"',
  'content-type': 'text/html; charset=utf-8'},
 '<!-- oid03.member.re3.yahoo.com uncompressed/chunked Thu Feb 28 02:19:53 PST 2008 -->\n')

Note that the body contains only the trailing comment of the page, although the
page at that URL has quite a bit of HTML markup before that comment. Maybe it
has something to do with the declared chunked transfer encoding?

Tried with both the 0.4.0 release and trunk, and both Python 2.4 and 2.5.

Original issue reported on code.google.com by cmlenz on 28 Feb 2008 at 10:23

GoogleCodeExporter commented 8 years ago
Wow, this one took me a while to track down, and I'm relieved to tell you it's
not a problem with httplib2. The issue appears to be that the Yahoo server is
confused by the Host: header that httplib2 sends. httplib2 sends the port along
with the host, which is allowed by RFC 2616, but the Yahoo site only sends back
the full HTML response if the port is not supplied in the request. You can see
this in action by overriding the Host header and getting the full response:

import httplib2

http = httplib2.Http()
# Override the Host header so that it carries no port.
resp, content = http.request('http://me.yahoo.com/christophermlenz',
                             headers={'host': 'me.yahoo.com'})
print(content)

Original comment by joe.gregorio@gmail.com on 5 Sep 2008 at 2:34
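To illustrate why both spellings of the Host header should be equivalent, here is a minimal Python 3 sketch. It assumes the header httplib2 sent looked like `me.yahoo.com:80`; the exact value is not shown in this thread. `normalize_host` and `DEFAULT_PORTS` are hypothetical names made up for this example and are not part of httplib2.

```python
# Default ports per URI scheme; a Host value without a port implies these.
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_host(host_value, scheme="http"):
    """Return (hostname, port) for a Host header value, filling in the default port."""
    value = host_value.strip()
    host, sep, port = value.rpartition(":")
    if sep and port.isdigit():
        # Explicit port, e.g. "me.yahoo.com:80".
        return host.lower(), int(port)
    # No explicit port: the whole value is the host name.
    return value.lower(), DEFAULT_PORTS[scheme]


# Both spellings name the same endpoint, so a server should treat them alike.
with_port = normalize_host("me.yahoo.com:80")
without_port = normalize_host("me.yahoo.com")
print(with_port == without_port)
```

Under that reading, a server that answers the two forms differently is the one deviating, which matches the diagnosis above.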

GoogleCodeExporter commented 8 years ago
Thanks for tracking this down! I wonder whether it'd be a good idea to omit the
port by default if it's not explicitly specified in the request URI.

Original comment by cml...@gmx.de on 5 Sep 2008 at 10:57
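The suggestion above could be sketched roughly as follows, in Python 3. This is only an illustration of the idea, not how httplib2 builds the header; `host_header` is a hypothetical helper introduced here.

```python
from urllib.parse import urlsplit


def host_header(uri):
    """Build a Host header value that carries a port only if the URI names one."""
    parts = urlsplit(uri)
    host = parts.hostname
    if ":" in host:
        # IPv6 literals must be bracketed inside a Host header.
        host = "[%s]" % host
    if parts.port is None:
        # No explicit port in the request URI: send the bare host name.
        return host
    return "%s:%d" % (host, parts.port)


print(host_header("http://me.yahoo.com/christophermlenz"))
print(host_header("http://me.yahoo.com:8080/christophermlenz"))
```

With this rule the Yahoo URL from the report would produce a bare `me.yahoo.com` header, while a URI that spells out a port would still pass it through.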