I can't reproduce this
>>> import requests
>>> token = 'ghp_mytotallyrealtokenwithonlyreposcope'
>>> headers = {'Authorization': f'token {token}', "User-Agent": "testing-requests-5801", "Accept": "application/vnd.github.v3+json"}
>>> url = "https://api.github.com/repos/sigmavirus24/github3.py/traffic/clones"
>>> requests.get(url, headers=headers)
<Response [200]>
>>> r = _
>>> r.request
<PreparedRequest [GET]>
>>> r.request.headers
{'User-Agent': 'testing-requests-5801', 'Accept-Encoding': 'gzip, deflate', 'Accept': 'application/vnd.github.v3+json', 'Connection': 'keep-alive', 'Authorization': 'token ghp_mytotallyrealtokenwithonlyreposcope'}
>>> r.json()
{'count': 1656, 'uniques': 1532, 'clones': [{'timestamp': '2021-04-12T00:00:00Z', 'count': 174, 'uniques': 168}, {'timestamp': '2021-04-13T00:00:00Z', 'count': 248, 'uniques': 228}, {'timestamp': '2021-04-14T00:00:00Z', 'count': 209, 'uniques': 206}, {'timestamp': '2021-04-15T00:00:00Z', 'count': 153, 'uniques': 146}, {'timestamp': '2021-04-16T00:00:00Z', 'count': 137, 'uniques': 131}, {'timestamp': '2021-04-17T00:00:00Z', 'count': 13, 'uniques': 13}, {'timestamp': '2021-04-18T00:00:00Z', 'count': 9, 'uniques': 9}, {'timestamp': '2021-04-19T00:00:00Z', 'count': 96, 'uniques': 92}, {'timestamp': '2021-04-20T00:00:00Z', 'count': 172, 'uniques': 139}, {'timestamp': '2021-04-21T00:00:00Z', 'count': 129, 'uniques': 124}, {'timestamp': '2021-04-22T00:00:00Z', 'count': 137, 'uniques': 132}, {'timestamp': '2021-04-23T00:00:00Z', 'count': 154, 'uniques': 148}, {'timestamp': '2021-04-24T00:00:00Z', 'count': 11, 'uniques': 11}, {'timestamp': '2021-04-25T00:00:00Z', 'count': 14, 'uniques': 14}]}
>>>
Yeah, I was expecting that might be the problem with reporting the issue. Is there anything else I can do to diagnose this further?
You could look at what's present in your response.request object. I printed the headers as there shouldn't be anything else differing. You may also look into whether you have any proxies or other intermediaries that Requests might be detecting and using for your traffic - which httpx might not be using.
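Something along these lines (just a rough sketch, with a placeholder token) would show both the headers as they will actually go on the wire and whatever proxy/verify settings requests would pull in from the environment:

import requests

url = "https://api.github.com/repos/sigmavirus24/github3.py/traffic/clones"
headers = {
    "Authorization": "token <redacted>",  # placeholder token
    "Accept": "application/vnd.github.v3+json",
}

session = requests.Session()
# headers exactly as they will be sent, after session defaults and env merging
prepared = session.prepare_request(requests.Request("GET", url, headers=headers))
print(prepared.headers)
# proxies requests would pick up from HTTP_PROXY / HTTPS_PROXY / NO_PROXY
print(requests.utils.get_environ_proxies(url))
# proxies/verify/cert after the session folds in the environment
print(session.merge_environment_settings(url, {}, None, None, None))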
I don't want to hijack this issue, and will happily file my own if my issue is unrelated, but I'm having a problem with making an OPTIONS request. If I run the following code in python 2.7 with requests 2.7.0 it works fine and I get a 204 back. If I run it with python 3.8.5 I get a 403:
import requests
from requests_toolbelt.utils import dump
def print_raw_http(response):
    data = dump.dump_all(response, request_prefix=b'', response_prefix=b'')
    print('\n' * 2 + data.decode('utf-8'))

headers = {
    'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 11_1_2 like Mac OS X) AppleWebKit/604.3.5 (KHTML, like Gecko) Mobile/15B202 NETGEAR/v1 (iOS Vuezone)',
}
session = requests.Session()
r = session.options('https://ocapi-app.arlo.com/api/auth', headers=headers)
print_raw_http(r)
r.raise_for_status()
$ python -V
Python 2.7.16
$ python3 -V
Python 3.8.5
$ pip freeze | grep requests
requests==2.7.0
$ pip3 freeze | grep requests
requests==2.7.0
python 2.7:
$ python blah.py
OPTIONS /api/auth HTTP/1.1
Host: ocapi-app.arlo.com
Connection: keep-alive
Accept-Encoding: gzip, deflate
Accept: */*
User-Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 11_1_2 like Mac OS X) AppleWebKit/604.3.5 (KHTML, like Gecko) Mobile/15B202 NETGEAR/v1 (iOS Vuezone)
Content-Length: 0
HTTP/1.1 204 No Content
python 3.8.5:
$ python3 blah.py
OPTIONS /api/auth HTTP/1.1
Host: ocapi-app.arlo.com
User-Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 11_1_2 like Mac OS X) AppleWebKit/604.3.5 (KHTML, like Gecko) Mobile/15B202 NETGEAR/v1 (iOS Vuezone)
Accept-Encoding: gzip, deflate
Accept: */*
Connection: keep-alive
Content-Length: 0
HTTP/1.1 403 Forbidden
Works fine with cURL too:
curl -vvvv -X OPTIONS "https://ocapi-app.arlo.com/api/auth" --output --http1.1 --no-alpn --no-npn -H "Host: ocapi-app.arlo.com" -H "Connection: keep-alive" -H "Accept-Encoding: gzip, deflate" -H "Accept: */*" -H "User-Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 11_1_2 like Mac OS X) AppleWebKit/604.3.5 (KHTML, like Gecko) Mobile/15B202 NETGEAR/v1 (iOS Vuezone)" -H "Content-length: 0"
> OPTIONS /api/auth HTTP/1.1
> Host: ocapi-app.arlo.com
> Connection: keep-alive
> Accept-Encoding: gzip, deflate
> Accept: */*
> User-Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 11_1_2 like Mac OS X) AppleWebKit/604.3.5 (KHTML, like Gecko) Mobile/15B202 NETGEAR/v1 (iOS Vuezone)
> Content-length: 0
>
< HTTP/1.1 204 No Content
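The only visible difference between the 2.7 and 3.x dumps above is the header ordering, so one experiment (just a guess, not a known fix) is to clear the session's default headers and re-add them in the same order the working Python 2.7 request used; on Python 3 requests preserves the insertion order when building the request:

import requests

session = requests.Session()
session.headers.clear()  # drop the defaults so we fully control the order
session.headers.update({
    'Connection': 'keep-alive',
    'Accept-Encoding': 'gzip, deflate',
    'Accept': '*/*',
    'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 11_1_2 like Mac OS X) AppleWebKit/604.3.5 (KHTML, like Gecko) Mobile/15B202 NETGEAR/v1 (iOS Vuezone)',
})
r = session.options('https://ocapi-app.arlo.com/api/auth')
print(r.status_code)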
Will see if I can find any similarities, or switch to python 2.7 and see if that makes any difference, when I debug this further - probably this weekend
Any luck?
Can't tell if an immediate switch to python 2.7 does anything -- can't test with httpx since that requires 3.6. But running on 2.7 I get the same error as on 3.9.4
Modified the script to work on both versions:
#!/usr/bin/env python3
from __future__ import print_function

import sys

import requests

if sys.version_info.major == 3:
    import httpx


def extract_status(obj):
    if hasattr(obj, "status"):
        return obj.status
    if hasattr(obj, "status_code"):
        return obj.status_code
    raise TypeError("unsupported request object")


def make_request(using_verb, url, headers):
    resp = using_verb(url, headers=headers)
    status = extract_status(resp)
    print(str(resp.json()))
    if status == 200:
        print("succeeded")
    else:
        print("failed")
        resp.raise_for_status()


def main():
    # see https://github.com/seanbreckenridge/pygithub_requests_error for token scopes
    auth_token = "token here"
    headers = {
        "Authorization": "token {}".format(auth_token),
        "User-Agent": "requests_error",
        "Accept": "application/vnd.github.v3+json",
    }
    # replace this with a URL you have access to
    url = "https://api.github.com/repos/seanbreckenridge/albums/traffic/clones"
    if sys.version_info.major == 3:
        make_request(httpx.get, url, headers)
    make_request(requests.get, url, headers)


if __name__ == "__main__":
    main()
On 2.7, installed using the AUR requests package:
python2.7 -m requests.help
{
"chardet": {
"version": "4.0.0"
},
"cryptography": {
"version": ""
},
"idna": {
"version": "2.10"
},
"implementation": {
"name": "CPython",
"version": "2.7.18"
},
"platform": {
"release": "5.12.1-arch1-1",
"system": "Linux"
},
"pyOpenSSL": {
"openssl_version": "",
"version": null
},
"requests": {
"version": "2.25.1"
},
"system_ssl": {
"version": "101010bf"
},
"urllib3": {
"version": "1.26.4"
},
"using_pyopenssl": false
}
$ python2.7 main.py
{u'documentation_url': u'https://docs.github.com/rest/reference/repos#get-repository-clones', u'message': u'Must have push access to repository'}
failed
Traceback (most recent call last):
File "main.py", line 56, in <module>
main()
File "main.py", line 52, in main
make_request(requests.get, url, headers)
File "main.py", line 31, in make_request
resp.raise_for_status()
File "/usr/lib/python2.7/site-packages/requests/models.py", line 943, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://api.github.com/repos/seanbreckenridge/albums/traffic/clones
On 3.9:
$ python3 main.py
{'count': 2, 'uniques': 2, 'clones': [{'timestamp': '2021-05-06T00:00:00Z', 'count': 1, 'uniques': 1}, {'timestamp': '2021-05-07T00:00:00Z', 'count': 1, 'uniques': 1}]}
succeeded
{'message': 'Must have push access to repository', 'documentation_url': 'https://docs.github.com/rest/reference/repos#get-repository-clones'}
failed
Traceback (most recent call last):
File "/home/sean/Repos/requests_test/main.py", line 56, in <module>
main()
File "/home/sean/Repos/requests_test/main.py", line 52, in main
make_request(requests.get, url, headers)
File "/home/sean/Repos/requests_test/main.py", line 31, in make_request
resp.raise_for_status()
File "/usr/lib/python3.9/site-packages/requests/models.py", line 943, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://api.github.com/repos/seanbreckenridge/albums/traffic/clones
So I'd guess your error is something different?
Will try and inspect the request info to see if there's anything different there...
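Something like this (a sketch, token redacted) prints what each library actually sent plus any redirects requests followed, which should make any difference easier to spot:

import requests
import httpx

url = "https://api.github.com/repos/seanbreckenridge/albums/traffic/clones"
headers = {
    "Authorization": "token <redacted>",  # placeholder token
    "User-Agent": "requests_error",
    "Accept": "application/vnd.github.v3+json",
}

r = requests.get(url, headers=headers)
print(dict(r.request.headers))   # headers requests actually sent, after session/env merging
print(r.history)                 # any redirects followed before the final response

hr = httpx.get(url, headers=headers)
print(dict(hr.request.headers))  # headers httpx actually sent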
@seanbreckenridge try downgrading your urllib3 to 1.24. I was able to work around my 403 issue using Python 3.x that way.
[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"
[packages]
httpx = "*"
requests = "*"
urllib3 = "==1.24"
[dev-packages]
[requires]
python_version = "3.9"
httpx==0.18.1
- certifi [required: Any, installed: 2021.5.30]
- httpcore [required: >=0.13.0,<0.14.0, installed: 0.13.3]
- h11 [required: >=0.11,<0.13, installed: 0.12.0]
- sniffio [required: ==1.*, installed: 1.2.0]
- rfc3986 [required: >=1.3,<2, installed: 1.5.0]
- sniffio [required: Any, installed: 1.2.0]
requests==2.25.1
- certifi [required: >=2017.4.17, installed: 2021.5.30]
- chardet [required: >=3.0.2,<5, installed: 4.0.0]
- idna [required: >=2.5,<3, installed: 2.10]
- urllib3 [required: >=1.21.1,<1.27, installed: 1.24]
Created a pipenv with urllib3 1.24 and ran the same script above; it doesn't seem to fix my issue, so the issues we had were probably separate.
$ pipenv run python3 main.py
{'count': 12, 'uniques': 12, 'clones': [{'timestamp': '2021-05-21T00:00:00Z', 'count': 7, 'uniques': 7}, {'timestamp': '2021-05-23T00:00:00Z', 'count': 1, 'uniques': 1}, {'timestamp': '2021-05-30T00:00:00Z', 'count': 1, 'uniques': 1}, {'timestamp': '2021-06-01T00:00:00Z', 'count': 1, 'uniques': 1}, {'timestamp': '2021-06-02T00:00:00Z', 'count': 2, 'uniques': 2}]}
succeeded
{'message': 'Must have push access to repository', 'documentation_url': 'https://docs.github.com/rest/reference/repos#get-repository-clones'}
failed
Traceback (most recent call last):
File "/home/sean/Repos/requests_test/main.py", line 56, in <module>
main()
File "/home/sean/Repos/requests_test/main.py", line 52, in main
make_request(requests.get, url, headers)
File "/home/sean/Repos/requests_test/main.py", line 31, in make_request
resp.raise_for_status()
File "/home/sean/.local/share/virtualenvs/requests_test-YPmFob8P-python/lib/python3.9/site-packages/requests/models.py", line 943, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://api.github.com/repos/seanbreckenridge/albums/traffic/clones
I had the same issue in the gpodder app, which uses requests. In this case a podcast CDN was issuing 403 for some HTTP requests, which are normally CDN redirects. I am not sure why, but as a workaround I found that using the underlying PreparedRequest object and session.send() works.
I'm on version: 2.24.0-lp152.3.3.1
To reproduce (at least for a week or two while this podcast URL is valid):
Python 3.6.12 (default, Dec 02 2020, 09:44:23) [GCC] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests
>>> url = 'https://open.live.bbc.co.uk/mediaselector/6/redir/version/2.0/mediaset/audio-nondrm-download-low/proto/http/vpid/p09s18sm.mp3'
>>> s = requests.Session()
>>> r = s.get(url)
>>> r
<Response [403]>
>>> r.request.headers
{'User-Agent': 'python-requests/2.24.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive', 'Authorization': 'Basic YW5vbnltb3VzOmRlYXZlbkBkZWF2ZW4ubmV0'}
>>> req = requests.Request('GET', url)
>>> prep_req = req.prepare()
>>> resp = s.send(prep_req)
>>> resp
<Response [200]>
>>> resp.request.headers
{'Authorization': 'Basic YW5vbnltb3VzOmRlYXZlbkBkZWF2ZW4ubmV0'}
>>> s.headers
{'User-Agent': 'python-requests/2.24.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
Python 3.6.12 (default, Dec 24 2020, 11:04:11)
[GCC 10.2.1 20201125 (Red Hat 10.2.1-9)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests
>>> url = 'https://open.live.bbc.co.uk/mediaselector/6/redir/version/2.0/mediaset/audio-nondrm-download-low/proto/http/vpid/p09s18sm.mp3'
>>> s = requests.Session()
>>> r = s.get(url)
>>> r
<Response [200]>
>>> r.request.headers
{'User-Agent': 'python-requests/2.26.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
>>> s.headers
{'User-Agent': 'python-requests/2.26.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
>>> req = requests.Request('GET', url)
>>> prep_req = req.prepare()
>>> resp = s.send(prep_req)
>>> resp
<Response [200]>
>>> resp.request.headers
{}
>>> s.headers
{'User-Agent': 'python-requests/2.26.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
>>>
Once again, this is not reproducible for me.
@davedeaven you did make me wonder though, what happens if you do s.trust_env = False first? Also do you have a .netrc that's supplying your authorization headers here, or did you leave out code? Finally, when you see the 403, what does r.history show you?
@sigmavirus24, I tried setting s.trust_env = False, and it works.
>>> import requests
>>> url = 'https://open.live.bbc.co.uk/mediaselector/6/redir/version/2.0/mediaset/audio-nondrm-download-low/proto/http/vpid/p09s18sm.mp3'
>>> s = requests.Session()
>>> s.trust_env = False
>>> r = s.get(url)
>>> r
<Response [200]>
>>> r.history
[<Response [302]>, <Response [302]>]
>>> s.trust_env = True
>>> r = s.get(url)
>>> r
<Response [403]>
>>> r.history
[]
And you are correct, I do have a ~/.netrc of the form
default login anonymous password deaven@deaven.net
If I remove this, then the request is successful even without the s.trust_env setting, so that is the cause. This solves my issue because I do not need the ~/.netrc file; it was left over from an earlier era. Although it does seem to be somewhat unexpected behavior that the prepared request works differently, and I will note that in the same environment, tools like wget (and browsers) work fine to retrieve this URL.
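As far as I can tell, that difference comes from where the .netrc lookup happens: Session.get() goes through Session.prepare_request(), which merges in session headers and, when trust_env is true, netrc credentials, while calling Request.prepare() directly does neither - which is why the manually prepared request went out without the Authorization header. A quick way to confirm the lookup (a sketch, using the same URL):

import requests
from requests.utils import get_netrc_auth

url = 'https://open.live.bbc.co.uk/mediaselector/6/redir/version/2.0/mediaset/audio-nondrm-download-low/proto/http/vpid/p09s18sm.mp3'

# credentials requests would take from ~/.netrc for this host (None if no entry applies)
print(get_netrc_auth(url))

s = requests.Session()
s.trust_env = False  # ignore .netrc, *_PROXY, REQUESTS_CA_BUNDLE, etc. for this session
print(s.get(url).status_code)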
Really appreciate the fast response on this, thank you!
Seems to be the same issue I had - I also had a .netrc file which had information to log in to GitHub, like
machine api.github.com
login seanbreckenridge
password something_here
Removing that file or setting trust_env to False fixes my issue.
Yeah, I'll be completely transparent, I hate that we use as much of the environment as we do (.netrc, REQUESTS_CA_BUNDLE, CURL_CA_BUNDLE, etc.)
Summary
TL;DR: requests raises a 403 while requesting an authenticated GitHub API route, which otherwise succeeds when using curl / another python library like httpx.
Was initially discovered in the 'ghexport' project; I did a reasonable amount of debugging and created this repo before submitting this issue to PyGithub, but that's a lot to look through, just leaving it here as context.
It's been hard to reproduce; the creator of ghexport (where this was initially discovered) didn't have the same issue, so I'm unsure of the exact reason.
Expected Result
requests succeeds for the authenticated request.
Actual Result
Request fails, with:
Reproduction Steps
Apologies if this is a bit too specific, but otherwise requests works great on my system and I can't find any other way to reproduce this -- it is a bit long as it requires an auth token.
Go here and create a token with scopes like:
I've compared this to httpx, where it doesn't fail:
That outputs:
Another thing that may be useful as context is the pdb trace I did here, which was me stepping into where the request was made in PyGithub, and making all the requests manually using the computed url/headers. Fails when I use requests.get but httpx.get works fine:
System Information