Closed: 5j9 closed this issue 2 years ago
Hi @5j9, Requests uses urllib3 under the hood, so this issue appears to be specific to how the service handles calls carrying the Requests user-agent. If you look through the closed issues you'll find it is very common for web servers to restrict access from clients identifying as Requests because of abusive scraper behavior. This isn't something we provide support for, but it is widely answered on platforms such as StackOverflow.
Hi @nateprewitt,
I don't believe the user-agent is the key factor here. I retested my urllib3 script above with an additional headers={'User-Agent': 'python-requests/2.27.1'} parameter, and it was still able to communicate properly:
import urllib3
from time import sleep

print('urllib3')
http = urllib3.PoolManager()
url = 'http://tsetmc.com/Loader.aspx?ParTree=15'
# First request, sent with the same user-agent string that requests 2.27.1 uses.
resp = http.request('GET', url, headers={'User-Agent': 'python-requests/2.27.1'})
print(resp.status)
# Wait long enough for the server to drop the idle keep-alive connection.
sleep(200)
# Second request reuses the pool; urllib3 still succeeds.
resp = http.request('GET', url, headers={'User-Agent': 'python-requests/2.27.1'})
print(resp.status)
# will print
# 200
# 200
Also, this does not seem to be a case of the server restricting access to Requests. If it were, why would the first request succeed and only the second one fail with a timeout? That does not make sense to me: if the server wanted to block Requests, it could have done so on the initial attempt.
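For contrast, here is the same two-call pattern written with requests. This is only a sketch of what I observe: the URL, the 200-second pause, and the timeout value mirror the urllib3 script above, and the exact exception depends on the server.
import requests
from time import sleep

session = requests.Session()
url = 'http://tsetmc.com/Loader.aspx?ParTree=15'

print(session.get(url, timeout=30).status_code)  # first call succeeds: 200
sleep(200)  # long enough for the server to drop the idle pooled connection
# Second call reuses the now-dead connection; for me this is the call that
# fails (timeout / connection error) instead of returning 200.
print(session.get(url, timeout=30).status_code)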
I might be wrong, but I think I've found the culprit: https://github.com/psf/requests/blob/95f456733656ed93645ff0250bfa54f6d256f6fe/requests/adapters.py#L117
As you can see, requests sets DEFAULT_RETRIES to 0. I guess the other libraries retry when they hit a failed connection from the connection pool:
https://github.com/urllib3/urllib3/blob/f0dffb4e2437cb2da2ba0a6bbea6211f6fd0fa4b/src/urllib3/util/retry.py#L526 https://github.com/encode/httpcore/blob/54567ac1df3761c14f50f2cf55769921f60cd8b3/httpcore/_sync/connection_pool.py#L238
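A quick way to see the difference in defaults (a sketch; HTTPAdapter.max_retries and Retry.DEFAULT are the names used in the sources linked above, printing them like this is just my illustration):
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# requests wraps its default of 0 in a Retry object: no retry on a dropped pooled connection.
print(HTTPAdapter().max_retries)
# urllib3's own default allows up to 3 retries, so it re-sends the request on a closed connection.
print(Retry.DEFAULT)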
Mounting an HTTPAdapter with a retry value other than 0 fixed the issue for me. All I had to do was:
from requests import Session
from requests.adapters import HTTPAdapter, Retry

session = Session()
# Allow a single retry so a request sent on a connection the server has
# already closed is re-sent instead of failing immediately.
retries = Retry(total=1)
session.mount('http://', HTTPAdapter(max_retries=retries))
...
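Putting that together with the two-call pattern from above (again only a sketch, with the same assumed URL and 200-second pause):
from time import sleep

from requests import Session
from requests.adapters import HTTPAdapter, Retry

session = Session()
session.mount('http://', HTTPAdapter(max_retries=Retry(total=1)))

url = 'http://tsetmc.com/Loader.aspx?ParTree=15'
print(session.get(url).status_code)  # 200
sleep(200)  # let the server drop the idle keep-alive connection
print(session.get(url).status_code)  # 200 again, because the dropped connection is retried once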
In HTTP 1.1, all connections are considered persistent unless declared otherwise. However, many HTTP servers apply an idle timeout to connections. Since a client has no way of knowing that the server has dropped a connection in such cases, it seems only logical to me for the client to retry a request on an apparently closed connection from the connection pool instead of raising an error. Thus, I think requests should change the default value of DEFAULT_RETRIES from 0, or implement some other way to retry on closed connections.
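As an aside, some servers advertise that idle timeout explicitly in their response headers. Whether this particular server does is an assumption on my part, but it is easy to check (sketch):
import requests

r = requests.get('http://tsetmc.com/Loader.aspx?ParTree=15')
# Standard keep-alive related headers; servers are not required to send them.
print(r.headers.get('Connection'))
print(r.headers.get('Keep-Alive'))  # e.g. "timeout=5, max=100" on Apache-style servers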
Consider the following script:
The above script fails with:
I believe there is some issue with how requests retries connections from the connection pool.
Apparently, a similar script works fine when using other libraries. I've tried the following:
Expected Result
requests should be able to handle the underlying situation like other libraries.
System Information