requests / toolbelt

A toolbelt of useful classes and functions to be used with python-requests
https://toolbelt.readthedocs.org

Proxy Digest Authentication fails with https sites when using HTTPProxyDigestAuth helper #136

Open spectrumjade opened 8 years ago

spectrumjade commented 8 years ago

Proxy Digest authentication works fine over unencrypted HTTP, but HTTPS requests (which are tunneled through the proxy via CONNECT) fail with an exception.

How to reproduce (I'm using requests 2.6.0 and requests-toolbelt 0.6.0):

```
$ python
Python 2.6.6 (r266:84292, Jul 22 2015, 16:47:47)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests
>>> from requests_toolbelt.auth.http_proxy_digest import HTTPProxyDigestAuth
>>> proxies = { 'http': 'http://proxy:3128', 'https': 'https://proxy:3128' }  # Note that it makes no difference if the https protocol proxy is specified as http://proxy or https://proxy
>>> auth = HTTPProxyDigestAuth('username', 'password')
>>> r = requests.get('http://website', proxies=proxies, auth=auth)
>>> r.status_code
200
>>> r = requests.get('https://website', proxies=proxies, auth=auth)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/site-packages/requests/api.py", line 68, in get
    return request('get', url, **kwargs)
  File "/usr/lib/python2.6/site-packages/requests/api.py", line 50, in request
    response = session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python2.6/site-packages/requests/sessions.py", line 464, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python2.6/site-packages/requests/sessions.py", line 576, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python2.6/site-packages/requests/adapters.py", line 424, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='website', port=443): Max retries exceeded with url: / (Caused by ProxyError('Cannot connect to proxy.', error('Tunnel connection failed: 407 Proxy Authentication Required',)))
```

The 407 response to the CONNECT request should be hooked in the same fashion as for unencrypted requests.
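For plain HTTP the 407 arrives as an ordinary response, so an auth helper can intercept it with requests' response-hook mechanism. A generic, illustrative sketch of that pattern (not the toolbelt's actual implementation; the computed digest value is elided):

```python
import requests


class NaiveProxy407Handler(requests.auth.AuthBase):
    """Illustrative only: re-send the request once when the proxy
    answers 407, mirroring how requests' own HTTPDigestAuth handles 401."""

    def handle_407(self, response, **kwargs):
        if response.status_code != 407:
            return response
        challenge = response.headers.get("Proxy-Authenticate", "")
        # ...compute a digest from `challenge` here (elided)...
        prep = response.request.copy()
        prep.headers["Proxy-Authorization"] = "<computed digest>"
        response.content  # drain the body so the connection can be reused
        new = response.connection.send(prep, **kwargs)
        new.history.append(response)
        return new

    def __call__(self, request):
        # Register the handler so it runs on every response.
        request.register_hook("response", self.handle_407)
        return request
```

For HTTPS this hook never fires, because the 407 is consumed inside httplib's tunnel setup and surfaces only as a ProxyError.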

What's curious is that the documentation includes an example with an https site.

Lukasa commented 8 years ago

This problem actually comes out of httplib, and it represents a very real limitation of the httplib ecosystem (of which requests is a part).

When creating a tunnel, httplib calls into its _tunnel() method when we attempt to connect. The problem here, as you can see in that code, is that the 407 response never makes it out of httplib: it doesn't even try to parse the headers. That means we can't easily find the 407 challenge header.

Changing this behaviour is possible, but requires urllib3 doing yet more to work around httplib, which I'm increasingly uncomfortable with. Already it's extremely difficult to replace httplib inside urllib3 because urllib3 already knows a great deal about httplib's internal implementation details: I'm highly reluctant to add more.
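To make the failure mode concrete, here is a condensed, paraphrased reimplementation of the pre-3.12 `_tunnel()` control flow (not CPython's verbatim code), run against a canned 407 response:

```python
import io


def tunnel(response_bytes):
    """Paraphrase of old http.client._tunnel(): on a non-200 CONNECT
    reply, discard the headers and raise with only the status line."""
    fp = io.BytesIO(response_bytes)
    version, code, message = fp.readline().split(b" ", 2)
    if int(code) != 200:
        # The headers (including the Proxy-Authenticate challenge) are
        # read only to be thrown away; the OSError carries none of them.
        while fp.readline() not in (b"\r\n", b"\n", b""):
            pass
        raise OSError("Tunnel connection failed: %d %s"
                      % (int(code), message.strip().decode()))


reply = (b"HTTP/1.1 407 Proxy Authentication Required\r\n"
         b'Proxy-Authenticate: Digest realm="proxy", nonce="abc"\r\n'
         b"\r\n")
try:
    tunnel(reply)
except OSError as e:
    print(e)  # only the status line survives; the Digest challenge is gone
```

By the time urllib3 catches this and requests wraps it into a ProxyError, there is no challenge left to answer.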

sigmavirus24 commented 8 years ago

Thanks for that explanation @Lukasa

sylencecc commented 8 years ago

I'm currently facing the same situation and also found out about the httplib issue, but I can't think of any workaround. Are there any Python HTTP(S) libraries that properly support multiple proxy authentication methods?

Lukasa commented 8 years ago

@sylencecc httplib2 might, by virtue of not being built on top of httplib. Otherwise, proxies are pretty poorly supported I'm afraid. =(

spectrumjade commented 8 years ago

I know that curl supports this authentication and therefore pycurl probably does as well. Unfortunately pycurl isn't as elegant.

yan12125 commented 6 years ago

Now urllib3 v2 is coming, which is going to drop httplib, so let's make it possible!

With my 3 patches applied, the following test case runs fine:

```python
import requests
from requests_toolbelt.auth.http_proxy_digest import HTTPProxyDigestAuth

proxy = 'http://ip:port'
proxies = {
    "http": proxy,
    "https": proxy,
}

def req(url):
    auth = HTTPProxyDigestAuth("user", "pass")
    r = requests.get(url, proxies=proxies, auth=auth)
    print(r.json())

req("http://httpbin.org/ip")
req("https://httpbin.org/ip")
```

redbaron4 commented 3 weeks ago

> This problem actually comes out of httplib, and it represents a very real limitation of the httplib ecosystem (of which requests is a part).
>
> When creating a tunnel, httplib calls into its _tunnel() method when we attempt to connect. The problem here, as you can see in that code, is that the 407 response never makes it out of httplib: it doesn't even try to parse the headers. That means we can't easily find the 407 challenge header.
>
> Changing this behaviour is possible, but requires urllib3 doing yet more to work around httplib, which I'm increasingly uncomfortable with. Already it's extremely difficult to replace httplib inside urllib3 because urllib3 already knows a great deal about httplib's internal implementation details: I'm highly reluctant to add more.

This is a very old but still relevant issue.

http.client in Python 3.12+ has made changes that preserve the header information even when the connection fails. _tunnel() still raises OSError, but the headers can be accessed via self.get_proxy_response_headers() if the OSError is caught in the calling function (urllib3.connection.HTTPSConnection). I think the changes needed in urllib3/requests are now much smaller.
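A sketch of how that could be used, untested against a real proxy (the hosts and ports are placeholders; `get_proxy_response_headers()` exists only on Python 3.12+):

```python
import http.client


def recover_proxy_challenge(proxy_host, proxy_port, dest_host, dest_port=443):
    """Attempt a CONNECT tunnel; on failure, return the proxy's
    Proxy-Authenticate challenge instead of losing it (Python 3.12+)."""
    conn = http.client.HTTPSConnection(proxy_host, proxy_port)
    conn.set_tunnel(dest_host, dest_port)
    try:
        conn.connect()
    except OSError:
        # Added in Python 3.12: headers of the proxy's response to the
        # CONNECT request (None if no CONNECT was sent).
        headers = conn.get_proxy_response_headers()
        if headers is not None:
            return headers.get("Proxy-Authenticate")
        raise
    return None  # tunnel succeeded; no challenge needed
```

With the challenge in hand, a caller could compute the digest and retry via `conn.set_tunnel(..., headers={"Proxy-Authorization": ...})`.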