psf / requests

A simple, yet elegant, HTTP library.
https://requests.readthedocs.io/en/latest/
Apache License 2.0

Making https proxies easier to deal with #2061

Open asmeurer opened 10 years ago

asmeurer commented 10 years ago

I'm trying to add support for proxies to conda, in particular, automatically prompting for a username and password on a 407. For http proxies, it's all fine. I can detect the 407 error code on HTTPError from raise_for_status and inject HTTPProxyAuth and try again.

But for https, it raises ConnectionError on get. Adding HTTPProxyAuth does not work. I have to parse the URL and add it in as https://username:password@proxy:port. urlparse does not make this particularly easy.
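For illustration, the URL surgery this requires might look like the following sketch (`add_proxy_auth` is a hypothetical helper, not part of requests or conda):

```python
from urllib.parse import urlsplit, urlunsplit, quote

def add_proxy_auth(proxy_url, username, password):
    """Return proxy_url with username:password injected into the netloc."""
    parts = urlsplit(proxy_url)
    # Percent-encode in case the credentials contain ':' or '@'
    auth = f"{quote(username, safe='')}:{quote(password, safe='')}"
    netloc = f"{auth}@{parts.hostname}"
    if parts.port:
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))

print(add_proxy_auth("https://proxy.example.com:3128", "user", "p@ss"))
# https://user:p%40ss@proxy.example.com:3128
```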

Furthermore, the only way I can find to detect that the ConnectionError is a 407 is to do

try:
    resp = session.get(url)  # do request
except ConnectionError as e:
    if "407" in str(e):
        ...  # get username and password, then retry
Lukasa commented 10 years ago

This was briefly discussed in IRC, as seen here. This GitHub issue encompasses a number of problems. Let's enumerate them.

  1. HTTPProxyAuth doesn't work for tunneling proxies. Yes, that's true, and that's because tunneling proxies are fundamentally a very different beast to your standard proxy, involving the CONNECT request and all kinds of funky nonsense. Our auth paradigm doesn't map to this special case.

    My proposal here is actually to get rid of HTTPProxyAuth altogether. I hate it. We have had auth-in-the-proxy-url for a long time now, so it's not like it's the only way to do it. Additionally, it provides better conceptual abstraction: all auth handlers now apply auth to the origin, not to intermediaries. I like that distinction.

  2. If you aren't authed to a tunneling proxy, you receive a ConnectionError on the method call rather than getting the 407 back. This is again because of the way httplib is designed for tunneling proxies. I think we can improve this situation by getting urllib3 to throw an appropriate exception which we catch and wrap. We can't move the exception to .raise_for_status() because that would require a httplib rewrite (or something truly evil), but at least we can make it easier to work out what happened.
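For what it's worth, current requests releases do expose requests.exceptions.ProxyError, a subclass of ConnectionError, so callers can distinguish proxy failures while existing ConnectionError handlers keep working. A sketch (the exception is raised by hand here to simulate the failure mode, since no real proxy is involved):

```python
import requests
from requests.exceptions import ProxyError

try:
    # Simulate the failure discussed above; a real unauthenticated
    # CONNECT typically surfaces with "407" in the message.
    raise ProxyError("Cannot connect to proxy: 407 Proxy Authentication Required")
except requests.exceptions.ConnectionError as e:
    # ProxyError is caught here because it subclasses ConnectionError
    needs_proxy_auth = "407" in str(e)

print(needs_proxy_auth)  # True
```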
Lukasa commented 10 years ago

@shazow, are you open to doing part 2?

asmeurer commented 10 years ago

Well, in this case, the response attribute of the ConnectionError is None. If it were the same as with HTTPError, I could just handle them uniformly.

Lukasa commented 10 years ago

I don't think it can be, logistically: we don't have a response in hand. This is all handled transparently by httplib. Because it's terrible.

asmeurer commented 10 years ago

Another possibly unrelated issue (I can open a new issue). If you mutate the proxies attribute of a Session object, the new value isn't used. So if you want to redo the request with the fixed proxies ("fixed" meaning including authentication credentials), you have to do

def dostuff(url, session=None):
    session = session or Session()  # actually a custom Session subclass
    ...
    try:
        # I shouldn't have to include the proxies in get() here;
        # they are already in session.proxies!
        resp = session.get(url, proxies=session.proxies)
        resp.raise_for_status()
    except HTTPError as e:
        if e.response.status_code == 407:
            # Get username and password and put them on the proxy URL
            session.proxies['http'] = fixed_proxy_url
            # Try again
            dostuff(url, session=session)
asmeurer commented 10 years ago

And more to the point, that's a lot of logic to repeat every time I do a get (it's actually more, because I also have to check for ConnectionError). It would be nice if requests just had a way to automatically prompt the user for proxy authentication and retry the request.

But I'll be happy to just get these basic issues ironed out first.

Lukasa commented 10 years ago

Uh...as far as I can see on a quick code read, we absolutely pick up the proxy settings from the Session. What makes you think we're not?

As for 'prompting' for proxy auth, that's not going to happen, it's simply not requests' job.

sigmavirus24 commented 10 years ago

> Uh...as far as I can see on a quick code read, we absolutely pick up the proxy settings from the Session. What makes you think we're not?

My guess would be that once we have made a request with a proxy, e.g., we've used the http proxy, then if @asmeurer changes the list of proxies it isn't fixed because we're using the same HTTPAdapter which hits lines 209 through 215. Notice that if http is already in self.proxy_manager we don't do anything. Since it is, the new proxy is ignored. I haven't attempted to test if that's in fact the case, but that's my first guess as to what might be causing the behaviour that @asmeurer is seeing.
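That cache can be observed, and cleared, without sending any traffic. A sketch of the behaviour (proxy.example.com is a placeholder; no request is made):

```python
import requests
from requests.adapters import HTTPAdapter

session = requests.Session()
adapter = session.get_adapter("http://example.org")

# First use of a proxy URL creates and caches a ProxyManager,
# keyed by the exact proxy URL string:
adapter.proxy_manager_for("http://proxy.example.com:3128")
print(list(adapter.proxy_manager))  # ['http://proxy.example.com:3128']

# The entry is never evicted; mounting a fresh adapter is one way
# to discard the cache entirely:
session.mount("http://", HTTPAdapter())
print(session.get_adapter("http://example.org").proxy_manager)  # {}
```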

> As for 'prompting' for proxy auth, that's not going to happen, it's simply not requests' job.

I agree.

asmeurer commented 10 years ago

> As for 'prompting' for proxy auth, that's not going to happen, it's simply not requests' job.

That's unfortunate. It seems that a lot of http libraries take this view, which is why every application that connects to the internet has to implement its own proxy authentication (I don't use proxies, but I've noticed this pattern).

Any thoughts on how requests could make this easier? A callback mechanism?

Lukasa commented 10 years ago

Requests should absolutely allow you to apply auth if you hadn't before.

As for making it easier to re-make the same request with auth for the proxy, the reason HTTP libraries generally don't is that we don't know what people are going to want. In requests' case, this is sufficiently unusual that adding a mechanism would have two effects:

  1. Disappointing the people who wanted another way to add the proxy auth if they needed to.
  2. Requiring us to maintain a code path used by a tiny fraction of our users.

For that reason, we assume that users who need to re-make a request will do it themselves.

The auth stuff makes that harder, so we should fix it, but otherwise I think that's the end of it.

sigmavirus24 commented 10 years ago

So I want to be sure I'm understanding everyone's concerns here properly:

@asmeurer when you say you'd like requests to prompt for proxy auth, do you mean you'd like us to literally use raw_input (or input in the case of Python 3)? I'm pretty sure that's not what you want and that's not something we'll ever support. Further, I'm not quite certain how we would properly implement a callback system for this particular case since the only other system like it in requests relies on having a response which we don't have in this case.

That said, we've had a troubled history (which @Lukasa knows much better than I) dealing with HTTPS Proxy Authentication. If there were a better way of handling them, we would have already implemented it (I'd like to think).

This discussion should remain in this issue. Your other problem, @asmeurer (in which mutating a Session's proxies does not affect subsequent requests), should be a separate issue. I'm trying to think of a good way to handle that case since I think I've located the problem above.

asmeurer commented 10 years ago

Sure I'll open a new issue for the Session thing.

asmeurer commented 10 years ago

https://github.com/kennethreitz/requests/issues/2063

Lukasa commented 10 years ago

The situation with HTTPS proxy authentication remains as follows:

I appreciate this is unfortunate, but there's simply no way around it: it needs to be done the way it's currently being done, or we need to special-case the Proxy Auth handler to mean something special. I don't want to do that because the Proxy Auth handler is semantically out of place: it applies authentication to the proxy, not to the origin. Given that you may also want to authenticate the origin, and that we don't allow multiple auth handlers, applying the Proxy Auth handler impedes your ability to do any other kind.

Best to have proxy authentication credentials come in on the proxies dictionary.
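For example (proxy.example.com is a placeholder host), credentials embedded in the proxy URL are parsed out of the userinfo portion; stdlib urlsplit shows the same parsing that requests relies on:

```python
from urllib.parse import urlsplit

proxies = {
    "http": "http://user:secret@proxy.example.com:3128",
    "https": "http://user:secret@proxy.example.com:3128",
}
# requests (via urllib3) pulls the userinfo out of the proxy URL and
# sends it as proxy authentication; urlsplit exposes the same fields:
parts = urlsplit(proxies["https"])
print(parts.username, parts.password)  # user secret
```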

asmeurer commented 10 years ago

OK, I agree that this is the wrong place for it. HTTPProxyAuth did seem strange to me, for the reason you cited.

shazow commented 10 years ago

Not sure if this discussion is resolved or not, but please ping me again if I still have an action item/question. I will be home tomorrow for more in-depth reading. :)

@Lukasa If this is still an open question, special-handling 407 in urllib3 sounds sensible if we can do it in a low-impact way.

asmeurer commented 10 years ago

My understanding of what is going on is limited here, but it didn't seem to me like there were any concrete action items yet.

Lukasa commented 10 years ago

Yeah, we're still trying to drill down into exactly what's happening where.

ary-ber commented 3 years ago

Hello guys. I tried to look around in the repo to find some answers but wasn't lucky. Is there any workaround to do a GET request using an HTTPS proxy?

I'm getting this error

requests.exceptions.ProxyError: HTTPSConnectionPool(host='httpbin.org', port=443): Max retries exceeded with url: /ip (Caused by ProxyError('Cannot connect to proxy.', NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x0000021DB2CDF668>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond',)))

and this is what I'm trying to do to test.

import requests

url = 'https://httpbin.org/ip'
prox = 'https://89.36.195.238:35328'

proxies = {
    "https": prox
}
response = requests.get(url, proxies=proxies)
print(response.json())