psf / requests

A simple, yet elegant, HTTP library.
https://requests.readthedocs.io/en/latest/
Apache License 2.0
52.15k stars 9.33k forks

403 error while using github API #5801

Closed: purarue closed this issue 3 years ago

purarue commented 3 years ago

Summary

TL;DR: requests gets a 403 when calling an authenticated GitHub API route, while the same request succeeds with curl or another Python library such as httpx.

This was initially discovered in the 'ghexport' project. I did a reasonable amount of debugging and created this repo before submitting this issue to PyGithub, but that's a lot to look through, so I'm just leaving it here as context.

It's been hard to reproduce: the creator of ghexport (where this was initially discovered) didn't have the same issue, so I'm unsure of the exact cause.

Expected Result

requests succeeds for the authenticated request

Actual Result

Request fails, with:

{'message': 'Must have push access to repository', 'documentation_url': 'https://docs.github.com/rest/reference/repos#get-repository-clones'}
failed
Traceback (most recent call last):
  File "/home/sean/Repos/pygithub_requests_error/minimal.py", line 47, in <module>
    main()
  File "/home/sean/Repos/pygithub_requests_error/minimal.py", line 44, in main
    make_request(requests.get, url, headers)
  File "/home/sean/Repos/pygithub_requests_error/minimal.py", line 31, in make_request
    resp.raise_for_status()
  File "/home/sean/.local/lib/python3.9/site-packages/requests/models.py", line 941, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://api.github.com/repos/seanbreckenridge/albums/traffic/clones

Reproduction Steps

Apologies if this is a bit too specific, but requests otherwise works great on my system and I can't find any other way to reproduce this. It's a bit long, as it requires an auth token.

Go here and create a token with scopes like:

I've compared this to httpx, where it doesn't fail:

#!/usr/bin/env python3

from typing import Callable, Any

import requests
import httpx

# extract status/status_code from the requests/httpx item
def extract_status(obj: Any) -> int:
    if hasattr(obj, "status"):
        return obj.status
    if hasattr(obj, "status_code"):
        return obj.status_code
    raise TypeError("unsupported request object")

def make_request(using_verb: Callable[..., Any], url: str, headers: Any) -> None:

    print("using", using_verb.__module__, using_verb.__qualname__, url)

    resp = using_verb(url, headers=headers)
    status = extract_status(resp)

    print(str(resp.json()))

    if status == 200:
        print("succeeded")
    else:
        print("failed")
        resp.raise_for_status()

def main():
    # see https://github.com/seanbreckenridge/pygithub_requests_error for token scopes
    auth_token = "put your auth token here"

    headers = {
        "Authorization": "token {}".format(auth_token),
        "User-Agent": "requests_error",
        "Accept": "application/vnd.github.v3+json",
    }

    # replace this with a URL you have access to
    url = "https://api.github.com/repos/seanbreckenridge/albums/traffic/clones"

    make_request(httpx.get, url, headers)
    make_request(requests.get, url, headers)

if __name__ == "__main__":
    main()

That outputs:

using httpx get https://api.github.com/repos/seanbreckenridge/albums/traffic/clones
{'count': 15, 'uniques': 10, 'clones': [{'timestamp': '2021-04-12T00:00:00Z', 'count': 1, 'uniques': 1}, {'timestamp': '2021-04-14T00:00:00Z', 'count': 1, 'uniques': 1}, {'timestamp': '2021-04-17T00:00:00Z', 'count': 1, 'uniques': 1}, {'timestamp': '2021-04-18T00:00:00Z', 'count': 2, 'uniques': 2}, {'timestamp': '2021-04-23T00:00:00Z', 'count': 9, 'uniques': 5}, {'timestamp': '2021-04-25T00:00:00Z', 'count': 1, 'uniques': 1}]}
succeeded
using requests.api get https://api.github.com/repos/seanbreckenridge/albums/traffic/clones
{'message': 'Must have push access to repository', 'documentation_url': 'https://docs.github.com/rest/reference/repos#get-repository-clones'}
failed
Traceback (most recent call last):
  File "/home/sean/Repos/pygithub_requests_error/minimal.py", line 48, in <module>
    main()
  File "/home/sean/Repos/pygithub_requests_error/minimal.py", line 45, in main
    make_request(requests.get, url, headers)
  File "/home/sean/Repos/pygithub_requests_error/minimal.py", line 32, in make_request
    resp.raise_for_status()
  File "/home/sean/.local/lib/python3.9/site-packages/requests/models.py", line 943, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://api.github.com/repos/seanbreckenridge/albums/traffic/clones

Another thing that may be useful as context is the pdb trace I did here: I stepped into where PyGithub makes the request and issued the requests manually using the computed URL and headers. It fails with requests.get, but httpx.get works fine:

> /home/sean/.local/lib/python3.8/site-packages/github/Requester.py(484)__requestEncode()
-> self.NEW_DEBUG_FRAME(requestHeaders)
(Pdb) n
> /home/sean/.local/lib/python3.8/site-packages/github/Requester.py(486)__requestEncode()
-> status, responseHeaders, output = self.__requestRaw(
(Pdb) w
  /home/sean/Repos/ghexport/export.py(109)<module>()
-> main()
  /home/sean/Repos/ghexport/export.py(84)main()
-> j = get_json(**params)
  /home/sean/Repos/ghexport/export.py(74)get_json()
-> return Exporter(**params).export_json()
  /home/sean/Repos/ghexport/export.py(60)export_json()
-> repo._requester.requestJsonAndCheck('GET', repo.url + '/traffic/' + f)
  /home/sean/.local/lib/python3.8/site-packages/github/Requester.py(318)requestJsonAndCheck()
-> *self.requestJson(
  /home/sean/.local/lib/python3.8/site-packages/github/Requester.py(410)requestJson()
-> return self.__requestEncode(cnx, verb, url, parameters, headers, input, encode)
> /home/sean/.local/lib/python3.8/site-packages/github/Requester.py(486)__requestEncode()
-> status, responseHeaders, output = self.__requestRaw(
(Pdb) url
'/repos/seanbreckenridge/advent-of-code-2019/traffic/views'
(Pdb) requestHeaders
{'Authorization': 'token <MY TOKEN HERE>', 'User-Agent': 'PyGithub/Python'}
(Pdb) import requests
(Pdb) requests.get("https://api.github.com" + url, headers=requestHeaders).json()
{'message': 'Must have push access to repository', 'documentation_url': 'https://docs.github.com/rest/reference/repos#get-page-views'}
(Pdb) httpx.get("https://api.github.com" + url, headers=requestHeaders).json()
{'count': 0, 'uniques': 0, 'views': []}
(Pdb) httpx succeeded??

System Information

$ python -m requests.help
{
  "chardet": {
    "version": "3.0.4"
  },
  "cryptography": {
    "version": "3.4.7"
  },
  "idna": {
    "version": "2.10"
  },
  "implementation": {
    "name": "CPython",
    "version": "3.9.3"
  },
  "platform": {
    "release": "5.11.16-arch1-1",
    "system": "Linux"
  },
  "pyOpenSSL": {
    "openssl_version": "101010bf",
    "version": "20.0.1"
  },
  "requests": {
    "version": "2.25.1"
  },
  "system_ssl": {
    "version": "101010bf"
  },
  "urllib3": {
    "version": "1.25.9"
  },
  "using_pyopenssl": true
}
$ pip freeze | grep -E 'requests|httpx'
httpx==0.16.1
requests==2.25.1

sigmavirus24 commented 3 years ago

I can't reproduce this

>>> import requests
>>> token = 'ghp_mytotallyrealtokenwithonlyreposcope'
>>> headers = {'Authorization': f'token {token}', "User-Agent": "testing-requests-5801", "Accept": "application/vnd.github.v3+json"}
>>> url = "https://api.github.com/repos/sigmavirus24/github3.py/traffic/clones"
>>> requests.get(url, headers=headers)
<Response [200]>
>>> r = _
>>> r.request
<PreparedRequest [GET]>
>>> r.request.headers
{'User-Agent': 'testing-requests-5801', 'Accept-Encoding': 'gzip, deflate', 'Accept': 'application/vnd.github.v3+json', 'Connection': 'keep-alive', 'Authorization': 'token ghp_mytotallyrealtokenwithonlyreposcope'}
>>> r.json()
{'count': 1656, 'uniques': 1532, 'clones': [{'timestamp': '2021-04-12T00:00:00Z', 'count': 174, 'uniques': 168}, {'timestamp': '2021-04-13T00:00:00Z', 'count': 248, 'uniques': 228}, {'timestamp': '2021-04-14T00:00:00Z', 'count': 209, 'uniques': 206}, {'timestamp': '2021-04-15T00:00:00Z', 'count': 153, 'uniques': 146}, {'timestamp': '2021-04-16T00:00:00Z', 'count': 137, 'uniques': 131}, {'timestamp': '2021-04-17T00:00:00Z', 'count': 13, 'uniques': 13}, {'timestamp': '2021-04-18T00:00:00Z', 'count': 9, 'uniques': 9}, {'timestamp': '2021-04-19T00:00:00Z', 'count': 96, 'uniques': 92}, {'timestamp': '2021-04-20T00:00:00Z', 'count': 172, 'uniques': 139}, {'timestamp': '2021-04-21T00:00:00Z', 'count': 129, 'uniques': 124}, {'timestamp': '2021-04-22T00:00:00Z', 'count': 137, 'uniques': 132}, {'timestamp': '2021-04-23T00:00:00Z', 'count': 154, 'uniques': 148}, {'timestamp': '2021-04-24T00:00:00Z', 'count': 11, 'uniques': 11}, {'timestamp': '2021-04-25T00:00:00Z', 'count': 14, 'uniques': 14}]}
>>>

purarue commented 3 years ago

Yeah, I was expecting that might be the problem with reporting the issue. Is there anything else I can do to diagnose this further?

sigmavirus24 commented 3 years ago

You could look at what's present in your response.request object. I printed the headers as there shouldn't be anything else differing. You may also look into whether you have any proxies or other intermediaries that Requests might be detecting and using for your traffic - which httpx might not be using.
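One way to act on this suggestion is to compare the headers passed to the call with the ones recorded on resp.request.headers. The sketch below uses a hypothetical helper, header_diff, which is not part of requests, and the sample dictionaries are made-up placeholders rather than values captured from a real response.

```python
def header_diff(intended, sent):
    """Map lowercased header names to (intended, sent) pairs that differ."""
    # HTTP header names are case-insensitive, so normalise before comparing.
    intended_ci = {name.lower(): value for name, value in intended.items()}
    sent_ci = {name.lower(): value for name, value in sent.items()}
    diff = {}
    for name in sorted(set(intended_ci) | set(sent_ci)):
        want, got = intended_ci.get(name), sent_ci.get(name)
        if want != got:
            diff[name] = (want, got)
    return diff


# Placeholder data: in practice `sent` would be dict(resp.request.headers).
intended = {"Authorization": "token <placeholder>", "User-Agent": "requests_error"}
sent = {
    "User-Agent": "requests_error",
    "Accept-Encoding": "gzip, deflate",
    "Authorization": "<whatever was actually sent>",
}

for name, (want, got) in header_diff(intended, sent).items():
    print("{}: intended={!r} sent={!r}".format(name, want, got))
```

Headers that requests adds by default (such as Accept-Encoding) will show up with an intended value of None. A header whose value changed between the call and the wire is the more interesting signal.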

jeffreydwalter commented 3 years ago

I don't want to hijack this issue, and will happily file my own if my issue is unrelated, but I'm having a problem making an OPTIONS request. If I run the following code in Python 2.7 with requests 2.7.0, it works fine and I get a 204 back. If I run it with Python 3.8.5, I get a 403:

import requests
from requests_toolbelt.utils import dump

def print_raw_http(response):
    data = dump.dump_all(response, request_prefix=b'', response_prefix=b'')
    print('\n' * 2 + data.decode('utf-8'))

headers = {
    'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 11_1_2 like Mac OS X) AppleWebKit/604.3.5 (KHTML, like Gecko) Mobile/15B202 NETGEAR/v1 (iOS Vuezone)',
}
session = requests.Session()
r = session.options('https://ocapi-app.arlo.com/api/auth', headers=headers)
print_raw_http(r)
r.raise_for_status()

$ python -V
Python 2.7.16

$ python3 -V
Python 3.8.5

$ pip freeze | grep requests
requests==2.7.0

$ pip3 freeze | grep requests
requests==2.7.0

Python 2.7:

$ python blah.py 

OPTIONS /api/auth HTTP/1.1
Host: ocapi-app.arlo.com
Connection: keep-alive
Accept-Encoding: gzip, deflate
Accept: */*
User-Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 11_1_2 like Mac OS X) AppleWebKit/604.3.5 (KHTML, like Gecko) Mobile/15B202 NETGEAR/v1 (iOS Vuezone)
Content-Length: 0

HTTP/1.1 204 No Content

Python 3.8.5:

$ python3 blah.py 

OPTIONS /api/auth HTTP/1.1
Host: ocapi-app.arlo.com
User-Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 11_1_2 like Mac OS X) AppleWebKit/604.3.5 (KHTML, like Gecko) Mobile/15B202 NETGEAR/v1 (iOS Vuezone)
Accept-Encoding: gzip, deflate
Accept: */*
Connection: keep-alive
Content-Length: 0

HTTP/1.1 403 Forbidden

Works fine with cURL too:

curl -vvvv -X OPTIONS "https://ocapi-app.arlo.com/api/auth" --output --http1.1 --no-alpn --no-npn -H "Host: ocapi-app.arlo.com" -H "Connection: keep-alive" -H "Accept-Encoding: gzip, deflate" -H "Accept: */*" -H "User-Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 11_1_2 like Mac OS X) AppleWebKit/604.3.5 (KHTML, like Gecko) Mobile/15B202 NETGEAR/v1 (iOS Vuezone)" -H "Content-length: 0"
> OPTIONS /api/auth HTTP/1.1
> Host: ocapi-app.arlo.com
> Connection: keep-alive
> Accept-Encoding: gzip, deflate
> Accept: */*
> User-Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 11_1_2 like Mac OS X) AppleWebKit/604.3.5 (KHTML, like Gecko) Mobile/15B202 NETGEAR/v1 (iOS Vuezone)
> Content-length: 0
> 
< HTTP/1.1 204 No Content

purarue commented 3 years ago

I'll try to find similarities and switch to Python 2.7 to see if that makes any difference when I debug this further, probably this weekend.

jeffreydwalter commented 3 years ago

Any luck?

purarue commented 3 years ago

I can't tell whether switching to Python 2.7 changes anything, and I can't test httpx there since it requires Python 3.6+. Running requests on 2.7, I get the same error as on 3.9.4.

Modified the script to work on both versions:

#!/usr/bin/env python3

from __future__ import print_function
import sys

import requests

if sys.version_info.major == 3:
    import httpx

def extract_status(obj):
    if hasattr(obj, "status"):
        return obj.status
    if hasattr(obj, "status_code"):
        return obj.status_code
    raise TypeError("unsupported request object")

def make_request(using_verb, url, headers):

    resp = using_verb(url, headers=headers)
    status = extract_status(resp)

    print(str(resp.json()))

    if status == 200:
        print("succeeded")
    else:
        print("failed")
        resp.raise_for_status()

def main():
    # see https://github.com/seanbreckenridge/pygithub_requests_error for token scopes

    auth_token = "token here"

    headers = {
        "Authorization": "token {}".format(auth_token),
        "User-Agent": "requests_error",
        "Accept": "application/vnd.github.v3+json",
    }

    # replace this with a URL you have access to

    url = "https://api.github.com/repos/seanbreckenridge/albums/traffic/clones"

    if sys.version_info.major == 3:
        make_request(httpx.get, url, headers)
    make_request(requests.get, url, headers)

if __name__ == "__main__":
    main()

On 2.7, installed using the AUR requests package:

$ python2.7 -m requests.help
{
  "chardet": {
    "version": "4.0.0"
  },
  "cryptography": {
    "version": ""
  },
  "idna": {
    "version": "2.10"
  },
  "implementation": {
    "name": "CPython",
    "version": "2.7.18"
  },
  "platform": {
    "release": "5.12.1-arch1-1",
    "system": "Linux"
  },
  "pyOpenSSL": {
    "openssl_version": "",
    "version": null
  },
  "requests": {
    "version": "2.25.1"
  },
  "system_ssl": {
    "version": "101010bf"
  },
  "urllib3": {
    "version": "1.26.4"
  },
  "using_pyopenssl": false
}
$ python2.7 main.py
{u'documentation_url': u'https://docs.github.com/rest/reference/repos#get-repository-clones', u'message': u'Must have push access to repository'}
failed
Traceback (most recent call last):
  File "main.py", line 56, in <module>
    main()
  File "main.py", line 52, in main
    make_request(requests.get, url, headers)
  File "main.py", line 31, in make_request
    resp.raise_for_status()
  File "/usr/lib/python2.7/site-packages/requests/models.py", line 943, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://api.github.com/repos/seanbreckenridge/albums/traffic/clones

On 3.9:

$ python3 main.py
{'count': 2, 'uniques': 2, 'clones': [{'timestamp': '2021-05-06T00:00:00Z', 'count': 1, 'uniques': 1}, {'timestamp': '2021-05-07T00:00:00Z', 'count': 1, 'uniques': 1}]}
succeeded
{'message': 'Must have push access to repository', 'documentation_url': 'https://docs.github.com/rest/reference/repos#get-repository-clones'}
failed
Traceback (most recent call last):
  File "/home/sean/Repos/requests_test/main.py", line 56, in <module>
    main()
  File "/home/sean/Repos/requests_test/main.py", line 52, in main
    make_request(requests.get, url, headers)
  File "/home/sean/Repos/requests_test/main.py", line 31, in make_request
    resp.raise_for_status()
  File "/usr/lib/python3.9/site-packages/requests/models.py", line 943, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://api.github.com/repos/seanbreckenridge/albums/traffic/clones

So I'd guess your error is something different?

I'll try to inspect the request info to see if there's anything different there...

jeffreydwalter commented 3 years ago

@seanbreckenridge try downgrading your urllib3 to 1.24. I was able to work around my 403 issue using Python 3.x that way.
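Before and after pinning a dependency like this, it can help to confirm which versions the interpreter actually sees. The following is a minimal sketch using only the standard library (importlib.metadata, available from Python 3.8). The helper name installed_versions is hypothetical, and the package list is just the set of names mentioned in this thread.

```python
from importlib import metadata  # standard library since Python 3.8


def installed_versions(names):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(["requests", "urllib3", "httpx"]).items():
    print(name, version or "not installed")
```

Running this inside the virtualenv (for example via pipenv run) reports the versions for that environment rather than the system-wide ones.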

purarue commented 3 years ago

[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
httpx = "*"
requests = "*"
urllib3 = "==1.24"

[dev-packages]

[requires]
python_version = "3.9"

httpx==0.18.1
  - certifi [required: Any, installed: 2021.5.30]
  - httpcore [required: >=0.13.0,<0.14.0, installed: 0.13.3]
    - h11 [required: >=0.11,<0.13, installed: 0.12.0]
    - sniffio [required: ==1.*, installed: 1.2.0]
  - rfc3986 [required: >=1.3,<2, installed: 1.5.0]
  - sniffio [required: Any, installed: 1.2.0]
requests==2.25.1
  - certifi [required: >=2017.4.17, installed: 2021.5.30]
  - chardet [required: >=3.0.2,<5, installed: 4.0.0]
  - idna [required: >=2.5,<3, installed: 2.10]
  - urllib3 [required: >=1.21.1,<1.27, installed: 1.24]

Created a pipenv with urllib3 1.24 and ran the same script as above; it doesn't seem to fix my issue, so the issues we had were probably separate.

$ pipenv run python3 main.py
{'count': 12, 'uniques': 12, 'clones': [{'timestamp': '2021-05-21T00:00:00Z', 'count': 7, 'uniques': 7}, {'timestamp': '2021-05-23T00:00:00Z', 'count': 1, 'uniques': 1}, {'timestamp': '2021-05-30T00:00:00Z', 'count': 1, 'uniques': 1}, {'timestamp': '2021-06-01T00:00:00Z', 'count': 1, 'uniques': 1}, {'timestamp': '2021-06-02T00:00:00Z', 'count': 2, 'uniques': 2}]}
succeeded
{'message': 'Must have push access to repository', 'documentation_url': 'https://docs.github.com/rest/reference/repos#get-repository-clones'}
failed
Traceback (most recent call last):
  File "/home/sean/Repos/requests_test/main.py", line 56, in <module>
    main()
  File "/home/sean/Repos/requests_test/main.py", line 52, in main
    make_request(requests.get, url, headers)
  File "/home/sean/Repos/requests_test/main.py", line 31, in make_request
    resp.raise_for_status()
  File "/home/sean/.local/share/virtualenvs/requests_test-YPmFob8P-python/lib/python3.9/site-packages/requests/models.py", line 943, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://api.github.com/repos/seanbreckenridge/albums/traffic/clones

davedeaven commented 3 years ago

I had the same issue in the gpodder app, which uses requests. In this case a podcast CDN was returning 403 for some HTTP requests that are normally CDN redirects. I am not sure why, but as a workaround I found that building a PreparedRequest and passing it to the session's send() works.

I'm on version: 2.24.0-lp152.3.3.1

To reproduce (at least for a week or two while this podcast URL is valid):

Python 3.6.12 (default, Dec 02 2020, 09:44:23) [GCC] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests
>>> url = 'https://open.live.bbc.co.uk/mediaselector/6/redir/version/2.0/mediaset/audio-nondrm-download-low/proto/http/vpid/p09s18sm.mp3'
>>> s = requests.Session()
>>> r = s.get(url)
>>> r
<Response [403]>
>>> r.request.headers
{'User-Agent': 'python-requests/2.24.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive', 'Authorization': 'Basic YW5vbnltb3VzOmRlYXZlbkBkZWF2ZW4ubmV0'}

>>> req = requests.Request('GET', url)
>>> prep_req = req.prepare()
>>> resp = s.send(prep_req)
>>> resp
<Response [200]>
>>> resp.request.headers
{'Authorization': 'Basic YW5vbnltb3VzOmRlYXZlbkBkZWF2ZW4ubmV0'}
>>> s.headers
{'User-Agent': 'python-requests/2.24.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}

sigmavirus24 commented 3 years ago

Python 3.6.12 (default, Dec 24 2020, 11:04:11)
[GCC 10.2.1 20201125 (Red Hat 10.2.1-9)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests
>>> url = 'https://open.live.bbc.co.uk/mediaselector/6/redir/version/2.0/mediaset/audio-nondrm-download-low/proto/http/vpid/p09s18sm.mp3'
>>> s = requests.Session()
>>> r = s.get(url)
>>> r
<Response [200]>
>>> r.request.headers
{'User-Agent': 'python-requests/2.26.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
>>> s.headers
{'User-Agent': 'python-requests/2.26.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
>>> req = requests.Request('GET', url)
>>> prep_req = req.prepare()
>>> resp = s.send(prep_req)
>>> resp
<Response [200]>
>>> resp.request.headers
{}
>>> s.headers
{'User-Agent': 'python-requests/2.26.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
>>>

Once again, this is not reproducible for me.

@davedeaven you did make me wonder though, what happens if you do s.trust_env = False first? Also do you have a .netrc that's supplying your authorization headers here or did you leave out code? Finally, when you see the 403, what does r.history show you?

davedeaven commented 3 years ago

@sigmavirus24, I tried s.trust_env = False and it works.

>>> import requests
>>> url = 'https://open.live.bbc.co.uk/mediaselector/6/redir/version/2.0/mediaset/audio-nondrm-download-low/proto/http/vpid/p09s18sm.mp3'
>>> s = requests.Session()
>>> s.trust_env = False
>>> r = s.get(url)
>>> r
<Response [200]>
>>> r.history
[<Response [302]>, <Response [302]>]
>>> s.trust_env = True
>>> r = s.get(url)
>>> r
<Response [403]>
>>> r.history
[]

And you are correct, I do have a ~/.netrc of the form "default login anonymous password deaven@deaven.net". If I remove it, the request succeeds even without the s.trust_env setting, so that is the cause. This solves my issue because I do not need the ~/.netrc file; it was left over from an earlier era. It does seem somewhat unexpected that the prepared request behaves differently, though, and I will note that in the same environment, tools like wget (and browsers) retrieve this URL fine.
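The Authorization header in the failing session output earlier in the thread can be tied back to this netrc entry. The sketch below uses only the standard library: the netrc module (which requests also imports for this lookup) and base64. It mirrors the lookup but does not call requests itself. The helper name basic_header_for and the temporary file are assumptions made for illustration.

```python
import base64
import netrc
import os
import tempfile

# The single-line netrc entry quoted in the comment above.
NETRC_TEXT = "default login anonymous password deaven@deaven.net\n"


def basic_header_for(host, netrc_text=NETRC_TEXT):
    """Return the Basic Authorization value this netrc text implies for host."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "netrc")
        with open(path, "w") as handle:
            handle.write(netrc_text)
        entry = netrc.netrc(path).authenticators(host)
    if entry is None:
        return None
    login, _account, password = entry
    token = base64.b64encode("{}:{}".format(login, password).encode("latin-1"))
    return "Basic " + token.decode("ascii")


# A "default" entry has no machine name, so it matches any host.
print(basic_header_for("open.live.bbc.co.uk"))
print(basic_header_for("api.github.com"))
```

Both calls yield the same value as the Authorization header shown in the 403 session above, which is consistent with the "default" entry being applied to every host.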

Really appreciate the fast response on this, thank you!

purarue commented 3 years ago

This seems to be the same issue I had: I also had a .netrc file with login information for GitHub, like:

machine api.github.com
    login seanbreckenridge
    password something_here

Removing that file or setting trust_env to False fixes my issue.
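The effect can be reproduced offline by preparing a request without sending it. The sketch below makes several assumptions: it needs a requests version that honours the NETRC environment variable, the URL and token are placeholders, and the netrc file is a temporary one written just for the demonstration. As far as the 2.x behaviour discussed in this thread goes, a session with trust_env enabled looks up netrc credentials when no auth argument is given, and the resulting Basic credentials replace an Authorization header supplied through headers.

```python
import os
import tempfile

import requests

URL = "https://api.github.com/repos/OWNER/REPO/traffic/clones"  # placeholder
HEADERS = {"Authorization": "token PLACEHOLDER"}  # not a real token


def authorization_sent(trust_env, use_session=True):
    """Prepare (but never send) a request and return its Authorization header."""
    request = requests.Request("GET", URL, headers=HEADERS)
    if not use_session:
        # Request.prepare() skips the session, so no netrc lookup happens.
        return request.prepare().headers.get("Authorization")
    session = requests.Session()
    session.trust_env = trust_env
    return session.prepare_request(request).headers.get("Authorization")


with tempfile.TemporaryDirectory() as tmp:
    netrc_path = os.path.join(tmp, "netrc")
    with open(netrc_path, "w") as handle:
        handle.write("machine api.github.com login someuser password placeholder\n")
    previous = os.environ.get("NETRC")
    os.environ["NETRC"] = netrc_path  # point requests at the temporary file
    try:
        results = {
            "session, trust_env=True": authorization_sent(True),
            "session, trust_env=False": authorization_sent(False),
            "Request.prepare()": authorization_sent(True, use_session=False),
        }
    finally:
        # Restore the caller's environment.
        if previous is None:
            del os.environ["NETRC"]
        else:
            os.environ["NETRC"] = previous

for label, value in results.items():
    print(label, "->", value)
```

Only the first case should come back with a Basic header. The third case also suggests why preparing the request by hand behaved differently earlier in the thread: Request.prepare() does not go through the session's environment handling for the initial request.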

sigmavirus24 commented 3 years ago

Yeah, I'll be completely transparent: I hate that we use as much of the environment as we do (.netrc, REQUESTS_CA_BUNDLE, CURL_CA_BUNDLE, etc.).
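When a request behaves differently on one machine only, a quick inventory of these environment inputs can narrow things down. The sketch below uses only the standard library and is an approximation: the helper name environment_inputs is hypothetical, it checks only the NETRC variable and ~/.netrc, and it does not claim to cover everything requests consults.

```python
import os
from urllib.request import getproxies


def environment_inputs():
    """Collect environment-derived settings of the kind listed above."""
    netrc_path = os.environ.get("NETRC") or os.path.expanduser("~/.netrc")
    return {
        "netrc_file": netrc_path if os.path.isfile(netrc_path) else None,
        "ca_bundle": os.environ.get("REQUESTS_CA_BUNDLE")
        or os.environ.get("CURL_CA_BUNDLE"),
        # getproxies() reads *_proxy variables (and OS settings on some platforms).
        "proxies": getproxies(),
    }


for key, value in environment_inputs().items():
    print(key, "=", value)
```

If any of these come back non-empty, retrying with trust_env set to False on the session, as done earlier in the thread, is a quick way to check whether the environment is involved.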