psf / requests

A simple, yet elegant, HTTP library.
https://requests.readthedocs.io/en/latest/
Apache License 2.0
52.14k stars · 9.33k forks

requests get(URL) never returns, even with Proxy! #3748

Closed arunchandramouli closed 7 years ago

arunchandramouli commented 7 years ago

99% of the time I use .get() I get responses in milliseconds, but there are certain URLs I work with, both HTTP and HTTPS, that are available globally (but can't be shared for some reason). What I found is that for some of these, .get() never actually returns a value; the call never ends, it keeps running indefinitely. What could be the reason? Is there any alternative, such as using a proxy or anything similar? Please suggest.

sigmavirus24 commented 7 years ago

@arunchandramouli have you considered setting a timeout?

Beyond that, I'd advise you to not ask questions on a defect (issue) tracker. Instead, ask questions on StackOverflow.
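The timeout suggestion above can be sketched as follows. This is a minimal illustration, not code from the thread; the URL is a placeholder:

```python
import requests

# A timeout can be a single float or a (connect, read) tuple, in seconds.
# Without one, requests.get() can block indefinitely waiting on the server.
def fetch(url):
    try:
        resp = requests.get(url, timeout=(3.05, 10))
        return resp.status_code
    except requests.exceptions.Timeout:
        return "timed out"
    except requests.exceptions.RequestException as exc:
        return "failed: " + exc.__class__.__name__
```

Note that the timeout bounds individual socket operations, not the total duration of the request.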

arunchandramouli commented 7 years ago

Yes, I did set a timeout, and it returns 403/404 but never 200. I did search StackOverflow but didn't find a similar question. I would like to know why a few sites don't return at all, while requests.get() keeps running indefinitely.

Lukasa commented 7 years ago

@arunchandramouli There are two possibilities. Either 1) the network has hung, in which case a timeout will help because it'll throw an exception; or 2) the response is infinite in size. If the response is infinite in size and you don't use the stream=True parameter, Requests will attempt to read the entire response body. That obviously won't work. Eventually, such a use of Requests will cause you to run out of memory.
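The stream=True behaviour described above can be sketched like this. A hedged illustration with a placeholder URL and an arbitrary 1 MiB cap, not code from the thread:

```python
import requests

# With stream=True, requests returns as soon as the response headers are
# parsed, and the body is only read on demand. Reading in chunks with a
# cap keeps an unbounded (or endless) body from exhausting memory.
MAX_BYTES = 1024 * 1024  # read at most 1 MiB of body

def bounded_get(url):
    with requests.get(url, stream=True, timeout=10) as resp:
        body = b""
        for chunk in resp.iter_content(chunk_size=8192):
            body += chunk
            if len(body) >= MAX_BYTES:
                break  # stop instead of reading an endless body
        return resp.status_code, body[:MAX_BYTES]
```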

arunchandramouli commented 7 years ago

@Lukasa - Great! But when I telnet to the proxy and then issue a GET for the URL, it actually returns the content; it works that way. When I tried requests.get(), urllib.urlopen(), and httplib, they all fail. Sounds a bit interesting.

Lukasa commented 7 years ago

@arunchandramouli That sounds very much like the proxy is misconfigured. Can you show me the complete response you got from the proxy, including the headers? The telnet request you sent would also be good.

arunchandramouli commented 7 years ago

@Lukasa The request, with or without the proxy, doesn't return anything at all; it keeps hanging unless I set a timeout, whereas the telnet GET returns the URL content. I can't share the details due to privacy and client protocol restrictions.

Lukasa commented 7 years ago

@arunchandramouli If you can't share the data then I'm afraid we're essentially unable to help you. The best I can tell you is that the response is almost certainly ill-formed, which is why none of the Python HTTP libraries you've tried can parse it.

arunchandramouli commented 7 years ago

I get it; that might be the case. But I can see the responses in Firefox and Chrome.

Lukasa commented 7 years ago

That means very little. Servers and proxies can mutate responses based on various properties of the request, and additionally browsers have very lenient parsers.

But again, we cannot help you if we cannot get more information about the response. Listing everything that can parse the response is not helpful, because we cannot turn that into a matrix of things we would need to change. Either we need to be able to reproduce this, or you're going to need to investigate without our assistance.

arunchandramouli commented 7 years ago

Will try to share, but there is no response to share at all. All I get is a timeout message if I set a timeout; otherwise it keeps hanging.

Lukasa commented 7 years ago

@arunchandramouli Can you provide a stacktrace for your timeout?

arunchandramouli commented 7 years ago

@Lukasa

```
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='***', port=443): Read timed out.
```

The flow is api.py -> sessions.py -> adapters.py (the normal flow for an HTTP/HTTPS connection). I couldn't copy the trace from the server.

Lukasa commented 7 years ago

I need the complete traceback. I need to know where we are. If I can't get function names and line numbers then I can't help you.
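One way to capture the complete traceback as text, so it can be copied out of a restricted environment, is with the standard traceback module. A sketch with a placeholder URL:

```python
import traceback
import requests

# Catch the failure and render the full traceback, including function
# names and line numbers, as a plain string that can be copied out.
def fetch_with_trace(url):
    try:
        return requests.get(url, timeout=5)
    except Exception:
        return traceback.format_exc()

trace = fetch_with_trace("http://nonexistent.invalid/")
print(trace)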

arunchandramouli commented 7 years ago

Let me tell you the flow: api.py (line 68) --> api.py (line 50) --> sessions.py (line 464) --> sessions.py (line 576) --> adapters.py (line 433).

There, in the send function, ReadTimeout is raised.

Lukasa commented 7 years ago

What version of Requests are you using?

arunchandramouli commented 7 years ago

2.6.0

Lukasa commented 7 years ago

Unfortunately, that traceback on that version is just not informative enough. Have you tried upgrading your Requests version?

(The main problem here is that the traceback just points to where we threw our wrapper exception, and so without more detail (e.g. the complete text of the exception message) I can't work out where the inner exception came from.)

arunchandramouli commented 7 years ago

I can't upgrade due to client permission restrictions, but we also use Anaconda to run the code. I tried via the interpreter and via Anaconda; it still fails at the same point.

Refer to https://github.com/kennethreitz/requests/blob/master/requests/adapters.py, line 498, class HTTPAdapter, function send:

```python
elif isinstance(e, ReadTimeoutError):
    raise ReadTimeout(e, request=request)
```

Lukasa commented 7 years ago

Yeah, so as discussed, without being able to see the full text of the exception I can't be more helpful. It is a wrapper exception, so it contains an exception within it that may be a bit clearer.

arunchandramouli commented 7 years ago

Something very similar to this; I get the same kind of exception, but a ReadTimeout:

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\requests\api.py", line 70, in get
    return request('get', url, params=params, **kwargs)
  File "C:\Python27\lib\site-packages\requests\api.py", line 56, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Python27\lib\site-packages\requests\sessions.py", line 488, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Python27\lib\site-packages\requests\sessions.py", line 609, in send
    r = adapter.send(request, **kwargs)
  File "C:\Python27\lib\site-packages\requests\adapters.py", line 487, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='www.somebadurl.com', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x0000000003E2E1D0>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
```

Lukasa commented 7 years ago

@arunchandramouli I care mostly about the bit at the end, where we have all this:

```
requests.exceptions.ConnectionError: HTTPConnectionPool(host='www.somebadurl.com', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x0000000003E2E1D0>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
```

I need to know what you see in that section.

arunchandramouli commented 7 years ago

In the sample case that I tried, I shared the full stack trace with you. But on the real client server I get:

```
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='*', port=443): Read timed out. (read timeout=10)
```

So we can conclude that in the real case I don't get a connection error but a timeout (if I specified one as a parameter); otherwise requests.get() keeps polling indefinitely.

Lukasa commented 7 years ago

Yup, so the question is going to be: where is that read timeout coming from? It can come in two places. One real possibility is that the response is simply not arriving on time. This can happen, particularly with proxies involved. In that case, the simplest thing to do is to just try again.
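The "just try again" advice can be automated with urllib3's Retry policy mounted on a Session. A hedged sketch; the counts and backoff values are illustrative, and on old releases with vendored urllib3 (such as the 2.6.0 mentioned in this thread) the import path would be requests.packages.urllib3.util.retry instead:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry transient failures a few times with exponential backoff,
# instead of hand-rolling a retry loop around requests.get().
session = requests.Session()
retries = Retry(total=3, backoff_factor=1,
                status_forcelist=[502, 503, 504])
adapter = HTTPAdapter(max_retries=retries)
session.mount("http://", adapter)
session.mount("https://", adapter)

# session.get(url, timeout=10) now retries automatically on failure.
```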

arunchandramouli commented 7 years ago

Proxies didn't help either. I tried a simple requests.get(URL, timeout=10), and now I am trying with a timeout of 500 seconds. Normally the requests module is so good that it returns in milliseconds.

arunchandramouli commented 7 years ago

A timeout of 500 also failed.

arunchandramouli commented 7 years ago

I tried using urllib and urllib2; they hang at this call: req = urllib2.Request('*') creates a urllib2.Request instance, but the call response = urllib2.urlopen(req) fails on the calls made to addinfourl.

Refer to https://github.com/python-git/python/blob/master/Lib/urllib.py, line 965:

```python
class addinfourl(addbase):
    """class to add info() and geturl() methods to an open file."""

    def __init__(self, fp, headers, url, code=None):
        addbase.__init__(self, fp)
        self.headers = headers
        self.url = url
        self.code = code

    def info(self):
        return self.headers

    def getcode(self):
        return self.code

    def geturl(self):
        return self.url
```

Lukasa commented 7 years ago

Unfortunately, none of this detail really helps to narrow anything down. We need to see what's coming down the wire.
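One way to see exactly what is coming down the wire is to bypass every HTTP library and read the raw bytes off a socket, much like the telnet experiment earlier in the thread. A hedged sketch; host and path are placeholders, and for a proxy you would connect to the proxy instead and put the full URL in the request line:

```python
import socket

# Send a minimal HTTP/1.1 GET by hand and capture whatever bytes the
# server (or proxy) sends back, malformed or not, up to a cap.
def raw_get(host, path="/", port=80, max_bytes=4096, timeout=10):
    with socket.create_connection((host, port), timeout=timeout) as sock:
        request = (
            "GET {} HTTP/1.1\r\n"
            "Host: {}\r\n"
            "Connection: close\r\n"
            "\r\n"
        ).format(path, host)
        sock.sendall(request.encode("ascii"))
        data = b""
        while len(data) < max_bytes:
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
        return data  # the exact bytes, suitable for inspection
```

Pasting the (redacted) output of such a capture would show whether the status line and headers are well-formed.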

arunchandramouli commented 7 years ago

What more info you want? Can you please specify?

Lukasa commented 7 years ago

I need to see where the read timeout is occurring: whether it's occurring in the read of the body or of the headers. Can you set stream=True and tell me if you get the timeout in the call to requests.get or when you access the body content?
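The diagnostic described above can be sketched like this. A hedged illustration, not code from the thread; the URL would be the problematic one:

```python
import requests

# With stream=True, requests.get() returns once the response headers
# have been parsed; the body is only read when .content is accessed.
# That separates "timed out reading headers" from "timed out reading
# the body".
def locate_timeout(url):
    try:
        resp = requests.get(url, stream=True, timeout=20)
    except requests.exceptions.ReadTimeout:
        return "timed out before the headers arrived"
    except requests.exceptions.RequestException as exc:
        return "failed before the headers arrived: " + exc.__class__.__name__
    try:
        resp.content  # forces the body to be read
    except requests.exceptions.RequestException as exc:
        return "headers arrived; body read failed: " + exc.__class__.__name__
    finally:
        resp.close()
    return "completed"
```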

arunchandramouli commented 7 years ago

When I set stream=True and timeout=20, it still gives me a ReadTimeout.

.get() still fails.

Lukasa commented 7 years ago

So if get fails, that means that the response headers are not being completely received. Either no data is being received, or the data is malformed and Requests is still expecting more data. Either way, without being able to see any better into the data, your best option is simply to retry.

arunchandramouli commented 7 years ago

requests.head(URL) also runs indefinitely.

arunchandramouli commented 7 years ago

The response and request headers plus contents look fine in Firefox and Chrome.

Lukasa commented 7 years ago

As I've mentioned before, the fact that Firefox and Chrome can see it is not helpful. Either the content is malformed, or it is not appearing, and the server may well be serving different data to Requests than to FF or Chrome.

arunchandramouli commented 7 years ago

Yes, true. But what more info do you need from my side? Please suggest.

sigmavirus24 commented 7 years ago

This conversation is now going in circles. To prevent further noise (and little signal) being delivered to roughly 1000 people, I'm going to lock this conversation. @arunchandramouli we've given you sufficient amounts of detail that we require and you consistently ignore it. If you want further help, I suggest you seek out a forum where that is appropriate (not a defect tracker).

Lukasa commented 7 years ago

I need to see what data has been sent/received on the connection, which you have previously said you cannot give me.