python / cpython

The Python programming language
https://www.python.org
Other
63.44k stars 30.38k forks source link

http.client.HTTPConnection tunneling is broken #52024

Closed 92259bf3-4b4c-47c1-a468-08a70c584615 closed 10 years ago

92259bf3-4b4c-47c1-a468-08a70c584615 commented 14 years ago
BPO 7776
Nosy @warsaw, @orsenthil, @larryhastings, @benjaminp, @ned-deily, @cameron-simpson, @ethanfurman, @vadmium, @dstufft
Files
  • issue7776_rev1.patch: Patch, revision 1
  • issue7776_rev2.patch: Patch, revision 2
  • issue7776_rev3.patch: Patch, revision 3
  • issue7776_rev4.patch: Patch, revision 4
  • issue7776_rev5.patch
  • issue7776_rev6.patch
  • issue7776_r7_Py3.4.diff
  • issue7776_r7_Py3.5.diff
  • 7776-2.7.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = 'https://github.com/orsenthil' closed_at = created_at = labels = ['type-bug', 'library', 'release-blocker'] title = 'http.client.HTTPConnection tunneling is broken' updated_at = user = 'https://github.com/cameron-simpson' ``` bugs.python.org fields: ```python activity = actor = 'orsenthil' assignee = 'orsenthil' closed = True closed_date = closer = 'larry' components = ['Library (Lib)'] creation = creator = 'cameron' dependencies = [] files = ['33357', '33358', '33431', '33520', '33738', '33761', '34802', '34803', '35267'] hgrepos = [] issue_num = 7776 keywords = ['patch'] message_count = 43.0 messages = ['98264', '98266', '98268', '98311', '98312', '98401', '190052', '207658', '207661', '207663', '207976', '208382', '209382', '209397', '209494', '209500', '216031', '216046', '216100', '216114', '216117', '218197', '218204', '218212', '218286', '218294', '218295', '218297', '218299', '218355', '218685', '218688', '218689', '218691', '218692', '218693', '218694', '218695', '218696', '218698', '218724', '218728', '218729'] nosy_count = 12.0 nosy_names = ['barry', 'orsenthil', 'larry', 'benjamin.peterson', 'ned.deily', 'cameron', 'nikratio', 'ethan.furman', 'python-dev', 'martin.panter', 'dstufft', 'Cybjit'] pr_nums = [] priority = 'release blocker' resolution = 'fixed' stage = 'resolved' status = 'closed' superseder = None type = 'behavior' url = 'https://bugs.python.org/issue7776' versions = ['Python 2.7', 'Python 3.4', 'Python 3.5'] ```

    92259bf3-4b4c-47c1-a468-08a70c584615 commented 14 years ago

    I'm trying to do HTTPS via a proxy in Python 2.6.4 (which is supposed to incorporate this fix from bpo-1424152).

    While trying to debug this starting from the suds library I've been reading httplib.py and urllib2.py to figure out what's going wrong and found myself around line 687 of httplib.py at the _tunnel() function.

    _tunnel() is broken because _set_hostport() has side effects.

    _tunnel() starts with: self._set_hostport(self._tunnel_host, self._tunnel_port) to arrange that the subsequent connection is made to the proxy host and port, and that is in itself ok.

    However, _set_hostport() sets the .host and .port attributes in the HTTPConnection object.

    The next action _tunnel() takes is to send the CONNECT HTTP command, filling in the endpoint host and port from self.host and self.port. But these values have been overwritten by the preceeding _set_hostport() call, and so we ask the proxy to connect to itself.

    It seems to me that _tunnel() should be grabbing the original host and port before calling _set_hostport(), thus:

    ohost, oport = self.host, self.port self._set_hostport(self._tunnel_host, self._tunnel_port) self.send("CONNECT %s:%d HTTP/1.0\r\n\r\n" % (ohost, oport))

    In fact the situation seems even worse: _tunnel() calls send(), send() calls connect(), and connect() calls _tunnel() in an infinite regress.

    92259bf3-4b4c-47c1-a468-08a70c584615 commented 14 years ago

    Amendment: regarding the infinite regress, it looks like there will not be a recursion if the caller leaps straight to the .connect() method. However, if they do that then the call to _tunnel() from within connect() will happen _after_ the socket is made directly to the origin host, not via the proxy. So the behaviour seems incorrect then also; it looks very much like _tunnel() must always be called before the real socket connection is established, and .connect() calls _tunnel() afterwards, not before.

    92259bf3-4b4c-47c1-a468-08a70c584615 commented 14 years ago

    It's looking like I have my idea of .host versus ._tunnel_host swapped. I think things are still buggy, but my interpretation of the bug is wrong or misleading.

    I gather that after _set_tunnel(), .host is the proxy host and that ._tunnel_host is the original target host.

    I'll follow up here in a bit when I've better characterised the problem. I think I'm letting urllib2's complicated state stuff confuse me too...

    92259bf3-4b4c-47c1-a468-08a70c584615 commented 14 years ago

    Well, I've established a few things:

    To the first item: _tunnel() feels really fragile with that recursion issue, though it doesn't recurse called from urllib2.

    For the second, here's my test script using httplib:

      H = httplib.HTTPSConnection("localhost", 3128)
      print H
      H._set_tunnel("localhost", 443)
      H.request("GET", "/boguspath")
      os.system("lsof -p %d | grep IPv4" % (os.getpid(),))
      R = H.getresponse()
      print R.status, R.reason

    As you can see, one builds the HTTPSConnection object with the proxy's details instead of those of the target URL, and then put the target URL details in with _set_tunnel(). Am I alone in find this strange?

    For the third, my test code is this:

      U = urllib2.Request('https://localhost/boguspath')
      U.set_proxy('localhost:3128', 'https')
      f = urllib2.urlopen(R)
      print f.read()

    which fails like this:

      Traceback (most recent call last):
        File "thttp.py", line 15, in <module>
          f = urllib2.urlopen(R)
        File "/opt/python-2.6.4/lib/python2.6/urllib2.py", line 131, in urlopen
          return _opener.open(url, data, timeout)
        File "/opt/python-2.6.4/lib/python2.6/urllib2.py", line 395, in open
          protocol = req.get_type()
      AttributeError: HTTPResponse instance has no attribute 'get_type'

    The line numbers are slightly off because I've got some debugging statements in there.

    Finally, I flat out do not understand urllib2's set_proxy() method:

    def set_proxy(self, host, type):
        if self.type == 'https' and not self.\_tunnel_host:
            self.\_tunnel_host = self.host
        else:
            self.type = type
            self.\_\_r_host = self.\_\_original
        self.host = host

    When my code calls set_proxy, self.type is None. Now, I had naively expected the first branch to be the only branch. Could someone explain what's happening here, and what is meant to happen?

    I'm thinking that this bug may turn into a doc fix instead of a behaviour fix, but I'm finding it surprisingly hard to know how urllib2 is supposed to be used.

    orsenthil commented 14 years ago

    As you noticed, the _set_tunnel method is a private method not intended to be used directly. Its being used by urllib2 when https through proxy is required. urllib2 works like this, it reads HTTPS_PROXY environment variable (in turn includes HTTPSProxyHandler and HTTPSProxyAuthenticationHandler) and then try to do a urlopen on an https:// url or a request object through the tunnel doing a CONNECT instead of a GET.

    How do think the docs can be improved? If you have any suggestions please upload a patch. Thanks.

    92259bf3-4b4c-47c1-a468-08a70c584615 commented 14 years ago

    Well, following your description I've backed out my urllib2 test case to this:

      f = urllib2.urlopen('https://localhost/boguspath')
      os.system("lsof -p %d | grep IPv4" % (os.getpid(),))
      f = urllib2.urlopen(R)
      print f.read()

    and it happily runs HTTPS through the proxy if I set the https_proxy envvar. So it's all well and good for the "just do what the environment suggests" use case.

    However, my older test:

      U = urllib2.Request('https://localhost/boguspath')
      U.set_proxy('localhost:3128', 'https')
      f = urllib2.urlopen(R)
      print f.read()

    still blows up with:

    File "/opt/python-2.6.4/lib/python2.6/urllib2.py", line 381, in open protocol = req.get_type() AttributeError: HTTPResponse instance has no attribute 'get_type'

    Now, this is the use case for "I have a custom proxy setup for this activity".

    It seems a little dd that "req" above is an HTTPResponse instead of a Request, and that my be why there's no .ettype() method available.

    I also see nothing obviously wrong with my set_proxy() call above based on the docs for the .set_proxy() method, though obviously it fails.

    I think what may be needed is a small expansion of the section in the Examples are on proxies. There's an description of the use of the *_proxy envvars there (and not elsewhere, which seems wrong) and an example of providing a proxy Handler. An addition example with a functioning use of a bare .set_proxy() might help.

    83d2e70e-e599-4a04-b820-3814bbdb9bef commented 11 years ago

    Cameron could you provide a patch for this?

    7cb04df6-5f46-4078-bf67-48262ccd0118 commented 10 years ago

    Ok, this is a bit of a mess.

    Issue bpo-11448 contains a patch for the documentation. But the functionality itself is still not working.

    With the (I believe) intended usage, we have:

    >> import http.client >> conn = http.client.HTTPSConnection("localhost",8080) >> conn.set_tunnel("www.python.org") >> conn.request("HEAD","/index.html")

    What happens then is this:

    7cb04df6-5f46-4078-bf67-48262ccd0118 commented 10 years ago

    I have attached a patch that should fix the issue.

    7cb04df6-5f46-4078-bf67-48262ccd0118 commented 10 years ago

    New patch revision, this time includes unit tests.

    7cb04df6-5f46-4078-bf67-48262ccd0118 commented 10 years ago

    I've updated the patch again to fix a problem with HTTPSConnection.

    7cb04df6-5f46-4078-bf67-48262ccd0118 commented 10 years ago

    Rebased patch on current tip.

    benjaminp commented 10 years ago

    A few comments

    7cb04df6-5f46-4078-bf67-48262ccd0118 commented 10 years ago

    Thanks for the review! I've attached an updated patch.

    An update to the set_tunnel library documentation is being discussed in bpo-11448. I'm not sure how to best handle the overlap. Maybe the best way is to first deal with bpo-11448, and then add the changes resulting from this issue?

    7cb04df6-5f46-4078-bf67-48262ccd0118 commented 10 years ago

    Hmm. I think I found another problem... please wait for another patch revision.

    7cb04df6-5f46-4078-bf67-48262ccd0118 commented 10 years ago

    Ok, I've attached yet another patch revision.

    This revision is less complex, because it gets rid of the ability to set up chains of tunnels. The only reason that I put that in was to preserve backward compatibility -- but upon reviewing the old implementation again, I found out that this actually did not work in the past either.

    7cb04df6-5f46-4078-bf67-48262ccd0118 commented 10 years ago

    Refreshed patch.

    orsenthil commented 10 years ago

    I am reviewing this patch right now and you will see my action soon. It is completely and I am reviewing to validating the technical details/fix. Thanks for patch, Nikolaus.

    orsenthil commented 10 years ago

    I verified the patch and this indeed corrects a nasty bug in sending a wrong header when doing it a lower level HTTPSConnection to proxy and set_tunnel (bad term) to the end host..I was worried as why we did not observe this earlier and it seems to me that the advertised way to do HTTPS CONNECT is via Proxy Handler or urllib.request and when doing it via a ProxyHandler, these wierdly named action (set_tunnel) happen underneath, but the skip_hosts bit is set as we got headers from the higher level method. and the host header is carried transparently to the tunnel connection request and thus we escaped this.

    The patch fixes the problem and cleans up a bit. Thanks for that , Nikolaus.

    This code (http/client.py) will require more attention beyond this bug too.

    1762cc99-3127-4a62-9baf-30c3d0f51ef7 commented 10 years ago

    New changeset 39ee3286d187 by Senthil Kumaran in branch '3.4': Issue bpo-7776: Fix ``Host:'' header and reconnection when using http.client.HTTPConnection.set_tunnel(). http://hg.python.org/cpython/rev/39ee3286d187

    New changeset 2c9af09ba7b8 by Senthil Kumaran in branch 'default': merge from 3.4 http://hg.python.org/cpython/rev/2c9af09ba7b8

    orsenthil commented 10 years ago

    This is fixed in 3.4 and 3.5 (finally). I will port it to 2.7 as well.

    8959fa4f-1116-498b-9085-f7b67c1eb55e commented 10 years ago

    I get errors when using pip with a proxy in 3.4.1rc1 on Windows, that does not happen on 3.4.0. I tracked it down to this change to client.py. OK with client.py from fd2c69cedb25, but not with client.py from 39ee3286d187.

    C:\Python34\Scripts>set HTTP_PROXY=http://openwrt.lan:8888

    C:\Python34\Scripts>set HTTPS_PROXY=http://openwrt.lan:8888

    C:\Python34\Scripts>pip -v install simplejson Downloading/unpacking simplejson Could not fetch URL https://pypi.python.org/simple/simplejson/: connection err or: hostname 'openwrt.lan' doesn't match either of '*.c.ssl.fastly.net', 'c.ssl. fastly.net', '*.target.com', '*.vhx.tv', '*.snappytv.com', '*.atlassian.net', 'p laces.hoteltonight.com', 'secure.lessthan3.com', '*.atlassian.com', 'a.sellpoint .net', 'cdn.upthere.com', '*.tissuu.com', '*.issuu.com', '*.kekofan.com', '*.pyt hon.org', '*.theverge.com', '*.sbnation.com', '*.polygon.com', '*.twobrightlight s.com', '*.2brightlights.info', '*.vox.com', 'staging-cdn.upthere.com', '*.zeebo x.com', '*.beamly.com', '*.aticpan.org', 'stream.svc.7digital.net', 'stream-test .svc.7digital.net', '*.articulate.com', 's.t.st', 'vid.thestreet.com', '*.planet -labs.com', '*.url2png.com', 'turn.com', 'www.turn.com', 'rivergathering.org', ' social.icfglobal2014-europe.org', '*.innogamescdn.com', '*.pathable.com', '*.sta ging.pathable.com', '*.kickstarter.com', 'sparkingchange.org', 'www.swedavia.se' , 'www.swedavia.com', 'js-agent.newrelic.com', '*.fastly-streams.com', 'cdn.bran disty.com', 'fastly.hightailcdn.com', '*.fl.yelpcdn.com', '*.feedmagnet.com', 'a pi.contentbody.com', '*.acquia.com', '*.swarmapp.com', '*.lonny.com', '*.stylebi stro.com', '*.zimbio.com', '*.pypa.io', 'pypa.io', 'static.qbranch.se', '*.krxd. net', '*.room.co', '*.metrological.com', 'room.co', 'www.ibmserviceengage.com', 'my.ibmserviceengage.com', 'cdn.evbuc.com', 'cdn.adagility.com' Will skip URL https://pypi.python.org/simple/simplejson/ when looking for down load links for simplejson

    7cb04df6-5f46-4078-bf67-48262ccd0118 commented 10 years ago

    On 05/09/2014 02:02 PM, Cybjit wrote:

    C:\Python34\Scripts>pip -v install simplejson Downloading/unpacking simplejson Could not fetch URL https://pypi.python.org/simple/simplejson/: connection err or: hostname 'openwrt.lan' doesn't match either of '*.c.ssl.fastly.net', 'c.ssl.

    This looks as if pip tries to match the hostname in the certificate from pypi.python.org against the hostname of the local proxy. Looking at the code, I don't see why it would do that though. HTTPSConnection.connect definitely tries to match against the final hostname.

    Is pip maybe doing its own certificate check, and relying on HTTPSConnection.host to contain the final hostname rather than the proxy?

    8959fa4f-1116-498b-9085-f7b67c1eb55e commented 10 years ago

    On 2014-05-10 00:23, nikratio wrote:

    Is pip maybe doing its own certificate check, and relying on HTTPSConnection.host to contain the final hostname rather than the proxy?

    I think the culprit might be here https://github.com/pypa/pip/blob/1.5.4/pip/_vendor/requests/packages/urllib3/connection.py#L172

    7cb04df6-5f46-4078-bf67-48262ccd0118 commented 10 years ago

    Cybjit \report@bugs.python.org\ writes:

    Cybjit added the comment:

    On 2014-05-10 00:23, nikratio wrote: > Is pip maybe doing its own certificate check, and relying on > HTTPSConnection.host to contain the final hostname rather than the proxy?

    I think the culprit might be here https://github.com/pypa/pip/blob/1.5.4/pip/_vendor/requests/packages/urllib3/connection.py#L172

    Yes, that's the problem. I guess that nicely demonstrates why using inheritance as an API is not a good idea.

    I guess we nevertheless need to repair/work around this in Python 3.4? Unfortunately pip explicitly relies on _set_tunnel() to set self.host = self._tunnel_host. So we would need to change _set_tunnel() to save the original attribute somewhere, change the other methods to use the saved attribute in favor of the real one, and have connect() restore it (so that we can reconnect). This still would not allow pip to reconnect (because it overwrites the connect method), but then reconnecting a tunneled connection with pip did not work before either. Still, rather ugly.

    Alternatively, maybe we could also do nothing, because if pip is depending on undocumented semantics of a private method (_set_tunnel), they have to live with the consequences?

    Thinking about this, I think we should just revert the entire patch for 3.4, but keep it in for 3.5. That gives the pip folks enough time to fix their code. Fixing the issue in 3.4 is probably not that crucial (after all, it existed since about 2.6.3).

    Best, Nikolaus

    -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             »Time flies like an arrow, fruit flies like a Banana.«
    ned-deily commented 10 years ago

    Should this be a release blocker regression for 3.4.1?

    larryhastings commented 10 years ago

    dstufft, what do you think?

    dstufft commented 10 years ago

    Let me raise the issue with urllib3 and see if maybe we can get a quick turn around and just fix it for real.

    dstufft commented 10 years ago

    This is going to break existing versions of urllib3 (and thus requests and thus pip) when using verified TLS + a proxy, however future versions can work around it and a fix is being looked at right now. Once it's fixed there it can propagate to requests and then to pip.

    Urllib3 issue is here https://github.com/shazow/urllib3/issues/385

    As far as what CPython should do. I personally don't think 3.4.1 should go out with this broken. That'll mean either getting a new pip out with the fix and bump the bundled version in CPython or revert this patch and wait till 3.5 (or 3.4.2 if you don't want to hold up 3.4.1).

    dstufft commented 10 years ago

    Just an update, the issue is fixed in urllib3 and that has been pulled into requests. Requests is currently prepping to release a new version which I'll pull into pip and issue a pip 1.5.6 release which can be pulled into CPython which should fix this.

    orsenthil commented 10 years ago

    I am glad that issues with 3rdparty libs which dependent on the previous wrong behavior has been resolved.

    As indicated previously, I think, it makes sense to have this in 2.7 as well. I created a patch and tested it 2.7 and it is all good. I plan to commit it before the next 2.7 update (which should be tomorrow).

    1762cc99-3127-4a62-9baf-30c3d0f51ef7 commented 10 years ago

    New changeset 568041fd8090 by Senthil Kumaran in branch '2.7': Backport Fix for Issue bpo-7776: Fix ``Host:'' header and reconnection when using http.client.HTTPConnection.set_tunnel(). http://hg.python.org/cpython/rev/568041fd8090

    orsenthil commented 10 years ago

    This is fixed in 2.7 as well here (changeset 568041fd8090).

    We shall close this ticket after @dstufft pulls in the updated pip for 3.4

    Thanks!

    dstufft commented 10 years ago

    Requests has been released and I've pulled it into the pip tree. I'll be releasing tonight probably, or maybe tomorrow.

    larryhastings commented 10 years ago

    I tag 3.4.1 final in less than 24 hours. I really would prefer that the embedded pip not contain such, uh, fresh software. But let's try it and hope for the best.

    dstufft commented 10 years ago

    Well you're the RM Larry :) I'll do whatever you think is best. I would greatly prefer it if the pip shipped with CPython 3.4.1 wasn't broken with proxies. I think the choices are

    1) Ship it with the new pip, I can give a delta of the differences if that is helpful. 2) Roll back the patch that broke the behavior 3) Ship with broken pip + proxy behavior

    Whichever you think is the better option is fine with me.

    larryhastings commented 10 years ago

    Yeah, I'd like to see the diff.

    dstufft commented 10 years ago

    I just released pip 1.5.6.

    The ensurepip package currently has 1.5.4 inside of it. 1.5.5 has been out for 2 weeks or so and there haven't been any reported regressions.

    The only difference between 1.5.5 and 1.5.6 is that requests has been upgraded.

    Here's the changelog items for 1.5.5 and 1.5.6: https://github.com/pypa/pip/blob/master/CHANGES.txt#L1-L20

    Here's the diff from 1.5.5 to 1.5.6: https://github.com/pypa/pip/compare/1.5.5...1.5.6 Here's the diff from 1.5.4 to 1.5.6: https://github.com/pypa/pip/compare/1.5.4...1.5.6

    If you want me to update the bundled ensurepip let me know what branch I should do it on. I'm going to go ahead and do it for 3.5 though.

    dstufft commented 10 years ago

    Just FYI, I upgraded setuptools and pip in 3.5:

    http://hg.python.org/cpython/rev/acb5cc616052 http://hg.python.org/cpython/rev/308ff6a5ce67

    If you decide to go that way dunno if you can just cherry pick or not.

    orsenthil commented 10 years ago

    I prefer we update the ensurepip in 3.4.1
    That will be helpful too since 3.5 has the fix.

    dstufft commented 10 years ago

    @larry

    Is there anything else I need to do?

    orsenthil commented 10 years ago

    @dstufft - should you commit it in 3.4 branch (since the change is already in 3.5) and then wait for larry's approval or rejection?

    larryhastings commented 10 years ago

    Okay, this has my blessing to be merged for 3.4.1.