Blizz8975 closed this issue 9 years ago.
Hmm, I can't test this because I keep getting 504s. Are you sure the server sent a complete response?
I'm not exactly sure what you mean by a complete response. Can you tell me how I can verify this? :)
Yeah, that's a bit tricky to verify. It would help to see the response headers if you can print them out. That way I can check whether this is prone to truncated responses, at the very least.
Does this help?

    import urllib3
    http = urllib3.PoolManager()
    r = http.request('GET', 'http://example.com/')
    r.headers['server']  # ==> 'ECS (mdw/1275)'
My site gives this:

    import urllib3
    http = urllib3.PoolManager()
    r = http.request('GET', 'http://holytrinityhs.echalk.com/site_res_view_photoalbum.aspx?resourceId=78224c68-7155-4b2e-999c-cc9abf549f2b')
    r.status             # 200
    r.headers['server']  # 'Microsoft-IIS/6.0'
Sorry, I'd like to see all the headers.
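To print every header rather than a single one, a minimal sketch along these lines could work. It assumes urllib3 is installed; format_headers and fetch_headers are hypothetical helper names for illustration, not part of urllib3's API. Simply printing r.headers also shows the whole HTTPHeaderDict in one go.

```python
def format_headers(headers) -> str:
    """Render every header as 'Name: value', one per line, in the mapping's iteration order."""
    return "\n".join(f"{name}: {value}" for name, value in headers.items())


def fetch_headers(url: str):
    """GET url with urllib3 and return the response headers (needs network access)."""
    import urllib3  # imported lazily so format_headers() works without urllib3 installed

    http = urllib3.PoolManager()
    response = http.request("GET", url)
    return response.headers
```

Usage: print(format_headers(fetch_headers("http://example.com/"))).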
How about this? (from urllib3)

    HTTPHeaderDict({'Server': 'Microsoft-IIS/6.0', 'X-Powered-By': 'ASP.NET', 'Date': 'Wed, 03 Jun 2015 16:10:14 GMT', 'X-AspNet-Version': '4.0.30319', 'PICS-Label': '(PICS-1.1 "http://www.rsac.org/ratingsv01.html" l by "support@echalk.com" on "2005.04.14T14:34-0400" exp "2008.04.18T12:00-0400" r (n 0 s 0 v 0 l 0 oa 0 ob 0 oc 0 od 0 oe 0 of 0 og 0 oh 0 c 0)), (PICS-1.1 "http://www.rsac.org/ratingsv01.html" l by "support@echalk.com" on "2005.04.14T14:34-0400" exp "2008.04.18T12:00-0400" r (n 0 s 0 v 0 l 0 oa 0 ob 0 oc 0 od 0 oe 0 of 0 og 0 oh 0 c 0))(PICS-1.0 "http://www.rsac.org/ratingsv01.html" l by "support@echalk.com" on "2005.04.14T14:34-0400" exp "2008.04.18T12:00-0400" r (v 0 s 0 n 0 l 0)), (PICS-1.1 "http://www.rsac.org/ratingsv01.html" l by "support@echalk.com" on "2005.04.14T14:34-0400" exp "2008.04.18T12:00-0400" r (n 0 s 0 v 0 l 0 oa 0 ob 0 oc 0 od 0 oe 0 of 0 og 0 oh 0 c 0))(PICS-1.0 "http://www.rsac.org/ratingsv01.html" l by "support@echalk.com" on "2005.04.14T14:34-0400" exp "2008.04.18T12:00-0400" r (v 0 s 0 n 0 l 0))(PICS-1.1 "http://www.rsac.org/ratingsv01.html" l by "support@echalk.com" on "2005.04.14T14:34-0400" exp "2008.04.18T12:00-0400" r (l 0 s 0 v 0 o 0))', 'Cache-Control': 'private', 'Content-Type': 'text/html; charset=Windows-1252', 'Content-Length': '304', 'Set-Cookie': 'WebHostServer=W09ECNJ; path=/'})
These are the headers I get using requests:

    {'cache-control': 'private', 'x-aspnet-version': '4.0.30319', 'set-cookie': 'WebHostServer=W07ECNJ; path=/', 'date': 'Wed, 03 Jun 2015 16:16:16 GMT', 'x-powered-by': 'ASP.NET', 'content-type': 'text/html; charset=Windows-1252', 'content-encoding': 'gzip', 'content-length': '370', 'server': 'Microsoft-IIS/6.0', 'pics-label': '(PICS-1.1 "http://www.rsac.org/ratingsv01.html" l by "support@echalk.com" on "2005.04.14T14:34-0400" exp "2008.04.18T12:00-0400" r (n 0 s 0 v 0 l 0 oa 0 ob 0 oc 0 od 0 oe 0 of 0 og 0 oh 0 c 0)), (PICS-1.1 "http://www.rsac.org/ratingsv01.html" l by "support@echalk.com" on "2005.04.14T14:34-0400" exp "2008.04.18T12:00-0400" r (n 0 s 0 v 0 l 0 oa 0 ob 0 oc 0 od 0 oe 0 of 0 og 0 oh 0 c 0))(PICS-1.0 "http://www.rsac.org/ratingsv01.html" l by "support@echalk.com" on "2005.04.14T14:34-0400" exp "2008.04.18T12:00-0400" r (v 0 s 0 n 0 l 0)), (PICS-1.1 "http://www.rsac.org/ratingsv01.html" l by "support@echalk.com" on "2005.04.14T14:34-0400" exp "2008.04.18T12:00-0400" r (n 0 s 0 v 0 l 0 oa 0 ob 0 oc 0 od 0 oe 0 of 0 og 0 oh 0 c 0))(PICS-1.0 "http://www.rsac.org/ratingsv01.html" l by "support@echalk.com" on "2005.04.14T14:34-0400" exp "2008.04.18T12:00-0400" r (v 0 s 0 n 0 l 0))(PICS-1.1 "http://www.rsac.org/ratingsv01.html" l by "support@echalk.com" on "2005.04.14T14:34-0400" exp "2008.04.18T12:00-0400" r (l 0 s 0 v 0 o 0))'}
The cleaned-up version of the urllib3 headers:
{'Cache-Control': 'private',
'Content-Length': '304',
'Content-Type': 'text/html; charset=Windows-1252',
'Date': 'Wed, 03 Jun 2015 16:10:14 GMT',
'PICS-Label': '(PICS-1.1 "http://www.rsac.org/ratingsv01.html" l by "support@echalk.com" on "2005.04.14T14:34-0400" exp "2008.04.18T12:00-0400" r (n 0 s 0 v 0 l 0 oa 0 ob 0 oc 0 od 0 oe 0 of 0 og 0 oh 0 c 0)), (PICS-1.1 "http://www.rsac.org/ratingsv01.html" l by "support@echalk.com" on "2005.04.14T14:34-0400" exp "2008.04.18T12:00-0400" r (n 0 s 0 v 0 l 0 oa 0 ob 0 oc 0 od 0 oe 0 of 0 og 0 oh 0 c 0))(PICS-1.0 "http://www.rsac.org/ratingsv01.html" l by "support@echalk.com" on "2005.04.14T14:34-0400" exp "2008.04.18T12:00-0400" r (v 0 s 0 n 0 l 0)), (PICS-1.1 "http://www.rsac.org/ratingsv01.html" l by "support@echalk.com" on "2005.04.14T14:34-0400" exp "2008.04.18T12:00-0400" r (n 0 s 0 v 0 l 0 oa 0 ob 0 oc 0 od 0 oe 0 of 0 og 0 oh 0 c 0))(PICS-1.0 "http://www.rsac.org/ratingsv01.html" l by "support@echalk.com" on "2005.04.14T14:34-0400" exp "2008.04.18T12:00-0400" r (v 0 s 0 n 0 l 0))(PICS-1.1 "http://www.rsac.org/ratingsv01.html" l by "support@echalk.com" on "2005.04.14T14:34-0400" exp "2008.04.18T12:00-0400" r (l 0 s 0 v 0 o 0))',
'Server': 'Microsoft-IIS/6.0',
'Set-Cookie': 'WebHostServer=W09ECNJ; path=/',
'X-AspNet-Version': '4.0.30319',
'X-Powered-By': 'ASP.NET'}
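The urllib3 and requests header sets posted in this thread can be compared mechanically. A minimal sketch, assuming header names should be compared case-insensitively (requests printed them lowercased, urllib3 did not); diff_headers is a hypothetical helper, and the values are copied from the two responses with PICS-Label left out for brevity.

```python
def diff_headers(a, b):
    """Return {lowercased name: (value_in_a, value_in_b)} for headers that differ.

    Names are compared case-insensitively; a header missing on one side
    appears as None on that side.
    """
    norm_a = {name.lower(): value for name, value in a.items()}
    norm_b = {name.lower(): value for name, value in b.items()}
    diffs = {}
    for name in sorted(norm_a.keys() | norm_b.keys()):
        value_a, value_b = norm_a.get(name), norm_b.get(name)
        if value_a != value_b:
            diffs[name] = (value_a, value_b)
    return diffs


# Values copied from the two responses in this thread (PICS-Label left out).
urllib3_headers = {
    "Cache-Control": "private",
    "Content-Length": "304",
    "Content-Type": "text/html; charset=Windows-1252",
    "Date": "Wed, 03 Jun 2015 16:10:14 GMT",
    "Server": "Microsoft-IIS/6.0",
    "Set-Cookie": "WebHostServer=W09ECNJ; path=/",
    "X-AspNet-Version": "4.0.30319",
    "X-Powered-By": "ASP.NET",
}
requests_headers = {
    "cache-control": "private",
    "x-aspnet-version": "4.0.30319",
    "set-cookie": "WebHostServer=W07ECNJ; path=/",
    "date": "Wed, 03 Jun 2015 16:16:16 GMT",
    "x-powered-by": "ASP.NET",
    "content-type": "text/html; charset=Windows-1252",
    "content-encoding": "gzip",
    "content-length": "370",
    "server": "Microsoft-IIS/6.0",
}

differences = diff_headers(urllib3_headers, requests_headers)
for name, (from_urllib3, from_requests) in differences.items():
    print(f"{name}: urllib3={from_urllib3!r} requests={from_requests!r}")
```

Only content-encoding, content-length, date and set-cookie differ. Because the requests response is gzip-encoded, its Content-Length of 370 counts compressed bytes and is not directly comparable with urllib3's 304. The WebHostServer cookie value also differs between the two requests.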
So the Content-Length header in the urllib3 response says 304 bytes. That seems about right, so we haven't missed any HTML. It suggests that you're not making quite the same request your browser is. Do you know how to use your browser's developer tools?
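One way to answer the earlier question of how to verify a complete response is to compare the declared Content-Length with the number of body bytes actually received. A minimal sketch, assuming urllib3 is installed; content_length_matches and fetch_raw are hypothetical helpers. The body is read with decode_content=False because Content-Length counts the encoded (for example gzipped) bytes, not the decompressed ones.

```python
def content_length_matches(headers, raw_body: bytes):
    """Compare the declared Content-Length with the number of body bytes received.

    Returns True/False, or None when there is no Content-Length header
    (e.g. a chunked response), in which case this check cannot tell.
    """
    for name, value in headers.items():
        if name.lower() == "content-length":
            return int(value) == len(raw_body)
    return None


def fetch_raw(url: str):
    """GET url with urllib3 and return (headers, body bytes before gzip/deflate decoding)."""
    import urllib3  # imported lazily so content_length_matches() works without urllib3

    http = urllib3.PoolManager()
    response = http.request("GET", url, preload_content=False)
    # decode_content=False leaves a gzip/deflate body compressed, matching Content-Length.
    raw_body = response.read(decode_content=False)
    response.release_conn()
    return response.headers, raw_body
```

Usage: headers, body = fetch_raw(url), then content_length_matches(headers, body). A False result points at a truncated response; True means the server sent everything it said it would.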
I think so. The entire HTML should give this:
<!--[if lt IE 8]>
<![endif]-->
Sorry, what I want you to do is use your developer tools to see what web request your browser is making. I suspect you need some cookies you don't have.
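If the developer tools show the browser sending a Cookie header, one way to replay that request from Python is to copy the header value and pass it to requests as a cookie dict. A minimal sketch, assuming requests is installed; parse_cookie_header and fetch_with_cookies are hypothetical helpers, and the cookie string in the usage example is illustrative (WebHostServer appears in this thread, ASP.NET_SessionId is an assumed name). Which cookies this site actually needs is not established here.

```python
def parse_cookie_header(cookie_header: str) -> dict:
    """Turn a browser 'Cookie:' request header value ('a=1; b=2') into a dict."""
    cookies = {}
    for part in cookie_header.split(";"):
        # partition splits on the first '=' only, so values containing '=' survive
        name, sep, value = part.strip().partition("=")
        if sep and name:  # skip empty or malformed fragments
            cookies[name] = value
    return cookies


def fetch_with_cookies(url: str, cookie_header: str) -> str:
    """Re-send the GET with cookies copied from the browser (needs network access)."""
    import requests  # imported lazily so parse_cookie_header() works without requests

    response = requests.get(url, cookies=parse_cookie_header(cookie_header))
    response.raise_for_status()
    return response.text
```

Usage (hypothetical cookie values): fetch_with_cookies(url, "WebHostServer=W09ECNJ; ASP.NET_SessionId=abc123").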
My bad :) Do you mean the information under Cookies in the Resources tab?
Never mind, I found the problem.
Thanks for everything!
Hey guys! I'm currently working on a content migration and I can't seem to pull the entire HTML source code using Python.
Here is one of the pages I'm working on: http://holytrinityhs.echalk.com/site_res_view_photoalbum.aspx?resourceId=0b744865-ad8a-4e76-8d42-15966cd7c4e2
Using html = requests.get("http://holytrinityhs.echalk.com/site_res_view_photoalbum.aspx?resourceId=0b744865-ad8a-4e76-8d42-15966cd7c4e2") and then calling html.text gives me
which is not the full HTML source code.
Any help would be very much appreciated!
Thanks!