@arunchandramouli have you considered setting a timeout?
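(A minimal sketch of that suggestion, assuming a placeholder URL: without a timeout, a stalled server can make the call block forever; with one, the hang becomes an exception.)

```python
import requests

try:
    response = requests.get("https://example.com/", timeout=5)  # placeholder URL
except requests.exceptions.Timeout:
    print("No response within 5 seconds")
```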
Beyond that, I'd advise you to not ask questions on a defect (issue) tracker. Instead, ask questions on StackOverflow.
Yes, I did set a timeout, and it returns 403/404 but not 200. I did look through SO, but didn't find a similar question. I would like to know why a few sites don't return at all, while requests.get() keeps running infinitely.
@arunchandramouli There are two possibilities. Either 1) the network has hung, in which case a timeout will help because it'll throw an exception; or 2) the response is infinite in size. If the response is infinite in size and you don't use the stream=True parameter, Requests will attempt to read the entire response body. That obviously won't work. Eventually, such a use of Requests will cause you to run out of memory.
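(A minimal sketch of both mitigations, assuming a placeholder URL: timeout bounds how long Requests waits for data, and stream=True defers reading the body so it can be read incrementally and capped.)

```python
import requests

url = "https://example.com/big-response"  # placeholder URL

try:
    # timeout bounds the connect and read waits; stream=True means the
    # body is not read eagerly when get() returns.
    response = requests.get(url, timeout=10, stream=True)

    # Read the body in chunks and cap it at 1 MB, so an unbounded
    # response cannot exhaust memory.
    body = b""
    for chunk in response.iter_content(chunk_size=8192):
        body += chunk
        if len(body) > 1024 * 1024:
            break
    response.close()
except requests.exceptions.Timeout:
    print("Request timed out")
```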
@Lukasa - Great! But when I telnet to the proxy and then issue GET urlname, it actually returns the content; it works that way. But when I tried requests.get(), urllib.urlopen(), and httplib, they all fail. Sounds a bit interesting.
@arunchandramouli That sounds very much like the proxy is misconfigured. Can you show me the complete response you got from the proxy, including the headers? The telnet request you sent would also be good.
@Lukasa The proxy / normal request doesn't return anything at all; it keeps hanging unless I set a timeout, whereas the telnet GET returns the URL content. I can't share the data due to privacy and client protocol restrictions.
@arunchandramouli If you can't share the data then I'm afraid we're essentially unable to help you. The best I can tell you is that the response is almost certainly ill-formed, which is why none of the Python HTTP libraries you've tried can parse it.
I get it, that might be the case. But I can see the responses in FF and Chrome.
That means very little. Servers and proxies can mutate responses based on various properties of the request, and additionally browsers have very lenient parsers.
But again, we cannot help you if we cannot get more information about the response. Listing everything that can parse the response is not helpful, because we cannot turn that into a matrix of things we would need to change. Either we need to be able to reproduce this, or you're going to need to investigate without our assistance.
I will try to share. But there is no response to share at all: all I get is a timeout message if I set a timeout, otherwise it keeps hanging.
@arunchandramouli Can you provide a stacktrace for your timeout?
@Lukasa requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='***', port=443): Read timed out.
api.py -> sessions.py -> adapters.py (the normal flow for an HTTP/HTTPS connection). I couldn't copy the trace from the server.
I need the complete traceback. I need to know where we are. If I can't get function names and line numbers then I can't help you.
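(A minimal sketch of capturing the complete traceback so it can be pasted verbatim, assuming a placeholder URL.)

```python
import traceback
import requests

try:
    requests.get("https://example.com/", timeout=10)  # placeholder URL
except requests.exceptions.RequestException:
    # Print the full traceback, with file names and line numbers.
    print(traceback.format_exc())
```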
Let me tell you the flow: api.py (line 68) --> api.py (line 50) --> sessions.py (line 464) --> sessions.py (line 576) --> adapters.py (line 433).
There, in the send function, ReadTimeout is raised.
What version of Requests are you using?
2.6.0
Unfortunately, that traceback on that version is just not informative enough. Have you tried upgrading your Requests version?
(The main problem here is that the traceback just points to where we threw our wrapper exception, and so without more detail (e.g. the complete text of the exception message) I can't work out where the inner exception came from.)
I can't upgrade due to client permission restrictions. We use Anaconda to run the code too; I tried via the interpreter and via Anaconda, and it still fails at the same point.
Refer to https://github.com/kennethreitz/requests/blob/master/requests/adapters.py, line 498, class HTTPAdapter, function send:

```python
elif isinstance(e, ReadTimeoutError):
    raise ReadTimeout(e, request=request)
```
Yeah, so as discussed, without being able to see the full text of the exception I can't be more helpful. It is a wrapper exception, so it contains an exception within it that may be a bit clearer.
I get something very similar to this; the exception is the same, but a ReadTimeout:
```
Traceback (most recent call last):
  File "
```
@arunchandramouli I care mostly about the bit at the end, where we have all this:
```
requests.exceptions.ConnectionError: HTTPConnectionPool(host='www.somebadurl.com', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x0000000003E2E1D0>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
```
I need to know what you see in that section.
In the sample case that I tried, I shared the full stack trace with you. But on the real client server I get:
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='*', port=443): Read timed out. (read timeout=10)
So we can conclude that in real time I don't get a connection error but a timeout (if I specified one as a parameter); otherwise requests.get() keeps polling infinitely.
Yup, so the question is going to be: where is that read timeout coming from? It can come in two places. One real possibility is that the response is simply not arriving on time. This can happen, particularly with proxies involved. In that case, the simplest thing to do is to just try again.
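(A minimal retry sketch, assuming a placeholder URL and that a plain retry loop is acceptable; mounting urllib3's Retry on a Session is shown further down.)

```python
import requests

url = "https://example.com/"  # placeholder URL

last_error = None
for attempt in range(3):
    try:
        response = requests.get(url, timeout=10)
        break  # success: stop retrying
    except requests.exceptions.ReadTimeout as e:
        last_error = e  # response didn't arrive in time; try again
else:
    # every attempt timed out
    raise last_error
```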
Proxies didn't help either. I tried a simple requests.get(URL, timeout=10); now I am trying with a timeout of 500 seconds, though the requests module is normally so good that in real time it returns in milliseconds.
A timeout of 500 also failed.
I tried using urllib and urllib2; they hang at this call: req = urllib2.Request('*') creates the urllib2.Request instance fine, but it then fails inside addinfourl on the call response = urllib2.urlopen(req).
Refer to https://github.com/python-git/python/blob/master/Lib/urllib.py, line 965:
```python
class addinfourl(addbase):
    """class to add info() and geturl() methods to an open file."""

    def __init__(self, fp, headers, url, code=None):
        addbase.__init__(self, fp)
        self.headers = headers
        self.url = url
        self.code = code

    def info(self):
        return self.headers

    def getcode(self):
        return self.code

    def geturl(self):
        return self.url
```
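(For what it's worth, a minimal sketch assuming Python 2 and a placeholder URL: urllib2.urlopen accepts a timeout argument since Python 2.6, which at least turns the hang into an exception.)

```python
import urllib2  # Python 2

req = urllib2.Request("http://example.com/")  # placeholder URL
try:
    # Without a timeout, a stalled server makes urlopen() hang forever;
    # with one, the hang becomes an exception after 10 seconds.
    response = urllib2.urlopen(req, timeout=10)
    print(response.info())  # the headers, via addinfourl.info()
except Exception as e:
    print("urlopen failed: %r" % e)
```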
Unfortunately, none of this detail really helps to narrow anything down. We need to see what's coming down the wire.
What more info do you want? Can you please specify?
I need to see where the read timeout is occurring: whether it's occurring in the read of the body or of the headers. Can you set stream=True and tell me if you get the timeout in the call to requests.get or when you access the body content?
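(A minimal sketch of that diagnostic, assuming a placeholder URL; the exact wrapper exception raised while reading the body varies across Requests versions, so the body read is caught broadly.)

```python
import requests

url = "https://example.com/"  # placeholder URL

try:
    # With stream=True, get() returns as soon as the headers arrive,
    # so a timeout raised here means the headers never fully arrived.
    response = requests.get(url, stream=True, timeout=20)
except requests.exceptions.ReadTimeout:
    print("Timed out while reading the response headers")
else:
    try:
        # Accessing .content forces the body to be read; a failure here
        # means the headers arrived but the body did not.
        body = response.content
    except requests.exceptions.RequestException:
        print("Timed out (or failed) while reading the response body")
```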
When I set stream=True, timeout=20, it still gives me a ReadTimeout. .get() still fails.
So if get fails, that means that the response headers are not being completely received. Either no data is being received, or the data is malformed and Requests is still expecting more data. Either way, without being able to see any better into the data, your best option is simply to retry.
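(A sketch of automating those retries, assuming a placeholder URL and a Requests version whose vendored urllib3 provides Retry; on newer installs the import would be from urllib3.util.retry directly.)

```python
import requests
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry

# Mount automatic retries on a Session. GET is in the default retry
# whitelist, so read timeouts are retried while retries remain.
session = requests.Session()
retries = Retry(total=3, backoff_factor=0.5)
adapter = HTTPAdapter(max_retries=retries)
session.mount("http://", adapter)
session.mount("https://", adapter)

response = session.get("https://example.com/", timeout=10)  # placeholder URL
```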
requests.head(URL) also runs infinitely
The response and request headers plus contents look fine in Firefox and Chrome.
As I've mentioned before, the fact that Firefox and Chrome can see it is not helpful. Either the content is malformed, or it is not appearing, and the server may well be serving different data to Requests than to FF or Chrome.
Yes, true, but what more info do you need from my side? Please suggest.
This conversation is now going in circles. To prevent further noise (and little signal) being delivered to roughly 1000 people, I'm going to lock this conversation. @arunchandramouli we've given you sufficient amounts of detail that we require and you consistently ignore it. If you want further help, I suggest you seek out a forum where that is appropriate (not a defect tracker).
I need to see what data has been sent/received on the connection, which you have previously said you cannot give me.
99% of the time when I use .get() I get responses in milliseconds. But there are certain URLs I work with, both HTTP and HTTPS, which are available globally too (but can't be shared for some reason). What I found is that for some of these, .get() never actually returns a value; the call never ends, it keeps running infinitely. What could be the reason? Is there an alternative, such as using a proxy or anything like that? Please suggest.
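(Since the question asks about proxies: a minimal sketch of routing a request through one, with placeholder proxy address and URL.)

```python
import requests

# Placeholder proxy address and URL.
proxies = {
    "http": "http://proxy.example.com:8080",
    "https": "http://proxy.example.com:8080",
}
response = requests.get("https://example.com/", proxies=proxies, timeout=10)
```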