Closed 92259bf3-4b4c-47c1-a468-08a70c584615 closed 10 years ago
I'm trying to do HTTPS via a proxy in Python 2.6.4 (which is supposed to incorporate this fix from bpo-1424152).
While trying to debug this starting from the suds library I've been reading httplib.py and urllib2.py to figure out what's going wrong and found myself around line 687 of httplib.py at the _tunnel() function.
_tunnel() is broken because _set_hostport() has side effects.
_tunnel() starts with: self._set_hostport(self._tunnel_host, self._tunnel_port) to arrange that the subsequent connection is made to the proxy host and port, and that is in itself ok.
However, _set_hostport() sets the .host and .port attributes in the HTTPConnection object.
The next action _tunnel() takes is to send the CONNECT HTTP command, filling in the endpoint host and port from self.host and self.port. But these values have been overwritten by the preceeding _set_hostport() call, and so we ask the proxy to connect to itself.
It seems to me that _tunnel() should be grabbing the original host and port before calling _set_hostport(), thus:
ohost, oport = self.host, self.port self._set_hostport(self._tunnel_host, self._tunnel_port) self.send("CONNECT %s:%d HTTP/1.0\r\n\r\n" % (ohost, oport))
In fact the situation seems even worse: _tunnel() calls send(), send() calls connect(), and connect() calls _tunnel() in an infinite regress.
Amendment: regarding the infinite regress, it looks like there will not be a recursion if the caller leaps straight to the .connect() method. However, if they do that then the call to _tunnel() from within connect() will happen _after_ the socket is made directly to the origin host, not via the proxy. So the behaviour seems incorrect then also; it looks very much like _tunnel() must always be called before the real socket connection is established, and .connect() calls _tunnel() afterwards, not before.
It's looking like I have my idea of .host versus ._tunnel_host swapped. I think things are still buggy, but my interpretation of the bug is wrong or misleading.
I gather that after _set_tunnel(), .host is the proxy host and that ._tunnel_host is the original target host.
I'll follow up here in a bit when I've better characterised the problem. I think I'm letting urllib2's complicated state stuff confuse me too...
Well, I've established a few things:
To the first item: _tunnel() feels really fragile with that recursion issue, though it doesn't recurse called from urllib2.
For the second, here's my test script using httplib:
H = httplib.HTTPSConnection("localhost", 3128)
print H
H._set_tunnel("localhost", 443)
H.request("GET", "/boguspath")
os.system("lsof -p %d | grep IPv4" % (os.getpid(),))
R = H.getresponse()
print R.status, R.reason
As you can see, one builds the HTTPSConnection object with the proxy's details instead of those of the target URL, and then put the target URL details in with _set_tunnel(). Am I alone in find this strange?
For the third, my test code is this:
U = urllib2.Request('https://localhost/boguspath')
U.set_proxy('localhost:3128', 'https')
f = urllib2.urlopen(R)
print f.read()
which fails like this:
Traceback (most recent call last):
File "thttp.py", line 15, in <module>
f = urllib2.urlopen(R)
File "/opt/python-2.6.4/lib/python2.6/urllib2.py", line 131, in urlopen
return _opener.open(url, data, timeout)
File "/opt/python-2.6.4/lib/python2.6/urllib2.py", line 395, in open
protocol = req.get_type()
AttributeError: HTTPResponse instance has no attribute 'get_type'
The line numbers are slightly off because I've got some debugging statements in there.
Finally, I flat out do not understand urllib2's set_proxy() method:
def set_proxy(self, host, type):
if self.type == 'https' and not self.\_tunnel_host:
self.\_tunnel_host = self.host
else:
self.type = type
self.\_\_r_host = self.\_\_original
self.host = host
When my code calls set_proxy, self.type is None. Now, I had naively expected the first branch to be the only branch. Could someone explain what's happening here, and what is meant to happen?
I'm thinking that this bug may turn into a doc fix instead of a behaviour fix, but I'm finding it surprisingly hard to know how urllib2 is supposed to be used.
As you noticed, the _set_tunnel method is a private method not intended to be used directly. Its being used by urllib2 when https through proxy is required. urllib2 works like this, it reads HTTPS_PROXY environment variable (in turn includes HTTPSProxyHandler and HTTPSProxyAuthenticationHandler) and then try to do a urlopen on an https:// url or a request object through the tunnel doing a CONNECT instead of a GET.
How do think the docs can be improved? If you have any suggestions please upload a patch. Thanks.
Well, following your description I've backed out my urllib2 test case to this:
f = urllib2.urlopen('https://localhost/boguspath')
os.system("lsof -p %d | grep IPv4" % (os.getpid(),))
f = urllib2.urlopen(R)
print f.read()
and it happily runs HTTPS through the proxy if I set the https_proxy envvar. So it's all well and good for the "just do what the environment suggests" use case.
However, my older test:
U = urllib2.Request('https://localhost/boguspath')
U.set_proxy('localhost:3128', 'https')
f = urllib2.urlopen(R)
print f.read()
still blows up with:
File "/opt/python-2.6.4/lib/python2.6/urllib2.py", line 381, in open protocol = req.get_type() AttributeError: HTTPResponse instance has no attribute 'get_type'
Now, this is the use case for "I have a custom proxy setup for this activity".
It seems a little dd that "req" above is an HTTPResponse instead of a Request, and that my be why there's no .ettype() method available.
I also see nothing obviously wrong with my set_proxy() call above based on the docs for the .set_proxy() method, though obviously it fails.
I think what may be needed is a small expansion of the section in the Examples are on proxies. There's an description of the use of the *_proxy envvars there (and not elsewhere, which seems wrong) and an example of providing a proxy Handler. An addition example with a functioning use of a bare .set_proxy() might help.
Cameron could you provide a patch for this?
Ok, this is a bit of a mess.
Issue bpo-11448 contains a patch for the documentation. But the functionality itself is still not working.
With the (I believe) intended usage, we have:
>> import http.client >> conn = http.client.HTTPSConnection("localhost",8080) >> conn.set_tunnel("www.python.org") >> conn.request("HEAD","/index.html")
What happens then is this:
I have attached a patch that should fix the issue.
New patch revision, this time includes unit tests.
I've updated the patch again to fix a problem with HTTPSConnection.
Rebased patch on current tip.
A few comments
Thanks for the review! I've attached an updated patch.
An update to the set_tunnel library documentation is being discussed in bpo-11448. I'm not sure how to best handle the overlap. Maybe the best way is to first deal with bpo-11448, and then add the changes resulting from this issue?
Hmm. I think I found another problem... please wait for another patch revision.
Ok, I've attached yet another patch revision.
This revision is less complex, because it gets rid of the ability to set up chains of tunnels. The only reason that I put that in was to preserve backward compatibility -- but upon reviewing the old implementation again, I found out that this actually did not work in the past either.
Refreshed patch.
I am reviewing this patch right now and you will see my action soon. It is completely and I am reviewing to validating the technical details/fix. Thanks for patch, Nikolaus.
I verified the patch and this indeed corrects a nasty bug in sending a wrong header when doing it a lower level HTTPSConnection to proxy and set_tunnel (bad term) to the end host..I was worried as why we did not observe this earlier and it seems to me that the advertised way to do HTTPS CONNECT is via Proxy Handler or urllib.request and when doing it via a ProxyHandler, these wierdly named action (set_tunnel) happen underneath, but the skip_hosts bit is set as we got headers from the higher level method. and the host header is carried transparently to the tunnel connection request and thus we escaped this.
The patch fixes the problem and cleans up a bit. Thanks for that , Nikolaus.
This code (http/client.py) will require more attention beyond this bug too.
New changeset 39ee3286d187 by Senthil Kumaran in branch '3.4': Issue bpo-7776: Fix ``Host:'' header and reconnection when using http.client.HTTPConnection.set_tunnel(). http://hg.python.org/cpython/rev/39ee3286d187
New changeset 2c9af09ba7b8 by Senthil Kumaran in branch 'default': merge from 3.4 http://hg.python.org/cpython/rev/2c9af09ba7b8
This is fixed in 3.4 and 3.5 (finally). I will port it to 2.7 as well.
I get errors when using pip with a proxy in 3.4.1rc1 on Windows, that does not happen on 3.4.0. I tracked it down to this change to client.py. OK with client.py from fd2c69cedb25, but not with client.py from 39ee3286d187.
C:\Python34\Scripts>set HTTP_PROXY=http://openwrt.lan:8888
C:\Python34\Scripts>set HTTPS_PROXY=http://openwrt.lan:8888
C:\Python34\Scripts>pip -v install simplejson Downloading/unpacking simplejson Could not fetch URL https://pypi.python.org/simple/simplejson/: connection err or: hostname 'openwrt.lan' doesn't match either of '*.c.ssl.fastly.net', 'c.ssl. fastly.net', '*.target.com', '*.vhx.tv', '*.snappytv.com', '*.atlassian.net', 'p laces.hoteltonight.com', 'secure.lessthan3.com', '*.atlassian.com', 'a.sellpoint .net', 'cdn.upthere.com', '*.tissuu.com', '*.issuu.com', '*.kekofan.com', '*.pyt hon.org', '*.theverge.com', '*.sbnation.com', '*.polygon.com', '*.twobrightlight s.com', '*.2brightlights.info', '*.vox.com', 'staging-cdn.upthere.com', '*.zeebo x.com', '*.beamly.com', '*.aticpan.org', 'stream.svc.7digital.net', 'stream-test .svc.7digital.net', '*.articulate.com', 's.t.st', 'vid.thestreet.com', '*.planet -labs.com', '*.url2png.com', 'turn.com', 'www.turn.com', 'rivergathering.org', ' social.icfglobal2014-europe.org', '*.innogamescdn.com', '*.pathable.com', '*.sta ging.pathable.com', '*.kickstarter.com', 'sparkingchange.org', 'www.swedavia.se' , 'www.swedavia.com', 'js-agent.newrelic.com', '*.fastly-streams.com', 'cdn.bran disty.com', 'fastly.hightailcdn.com', '*.fl.yelpcdn.com', '*.feedmagnet.com', 'a pi.contentbody.com', '*.acquia.com', '*.swarmapp.com', '*.lonny.com', '*.stylebi stro.com', '*.zimbio.com', '*.pypa.io', 'pypa.io', 'static.qbranch.se', '*.krxd. net', '*.room.co', '*.metrological.com', 'room.co', 'www.ibmserviceengage.com', 'my.ibmserviceengage.com', 'cdn.evbuc.com', 'cdn.adagility.com' Will skip URL https://pypi.python.org/simple/simplejson/ when looking for down load links for simplejson
On 05/09/2014 02:02 PM, Cybjit wrote:
C:\Python34\Scripts>pip -v install simplejson Downloading/unpacking simplejson Could not fetch URL https://pypi.python.org/simple/simplejson/: connection err or: hostname 'openwrt.lan' doesn't match either of '*.c.ssl.fastly.net', 'c.ssl.
This looks as if pip tries to match the hostname in the certificate from pypi.python.org against the hostname of the local proxy. Looking at the code, I don't see why it would do that though. HTTPSConnection.connect definitely tries to match against the final hostname.
Is pip maybe doing its own certificate check, and relying on HTTPSConnection.host to contain the final hostname rather than the proxy?
On 2014-05-10 00:23, nikratio wrote:
Is pip maybe doing its own certificate check, and relying on HTTPSConnection.host to contain the final hostname rather than the proxy?
I think the culprit might be here https://github.com/pypa/pip/blob/1.5.4/pip/_vendor/requests/packages/urllib3/connection.py#L172
Cybjit \report@bugs.python.org\ writes:
Cybjit added the comment:
On 2014-05-10 00:23, nikratio wrote: > Is pip maybe doing its own certificate check, and relying on > HTTPSConnection.host to contain the final hostname rather than the proxy?
I think the culprit might be here https://github.com/pypa/pip/blob/1.5.4/pip/_vendor/requests/packages/urllib3/connection.py#L172
Yes, that's the problem. I guess that nicely demonstrates why using inheritance as an API is not a good idea.
I guess we nevertheless need to repair/work around this in Python 3.4? Unfortunately pip explicitly relies on _set_tunnel() to set self.host = self._tunnel_host. So we would need to change _set_tunnel() to save the original attribute somewhere, change the other methods to use the saved attribute in favor of the real one, and have connect() restore it (so that we can reconnect). This still would not allow pip to reconnect (because it overwrites the connect method), but then reconnecting a tunneled connection with pip did not work before either. Still, rather ugly.
Alternatively, maybe we could also do nothing, because if pip is depending on undocumented semantics of a private method (_set_tunnel), they have to live with the consequences?
Thinking about this, I think we should just revert the entire patch for 3.4, but keep it in for 3.5. That gives the pip folks enough time to fix their code. Fixing the issue in 3.4 is probably not that crucial (after all, it existed since about 2.6.3).
Best, Nikolaus
-- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F
»Time flies like an arrow, fruit flies like a Banana.«
Should this be a release blocker regression for 3.4.1?
dstufft, what do you think?
Let me raise the issue with urllib3 and see if maybe we can get a quick turn around and just fix it for real.
This is going to break existing versions of urllib3 (and thus requests and thus pip) when using verified TLS + a proxy, however future versions can work around it and a fix is being looked at right now. Once it's fixed there it can propagate to requests and then to pip.
Urllib3 issue is here https://github.com/shazow/urllib3/issues/385
As far as what CPython should do. I personally don't think 3.4.1 should go out with this broken. That'll mean either getting a new pip out with the fix and bump the bundled version in CPython or revert this patch and wait till 3.5 (or 3.4.2 if you don't want to hold up 3.4.1).
Just an update, the issue is fixed in urllib3 and that has been pulled into requests. Requests is currently prepping to release a new version which I'll pull into pip and issue a pip 1.5.6 release which can be pulled into CPython which should fix this.
I am glad that issues with 3rdparty libs which dependent on the previous wrong behavior has been resolved.
As indicated previously, I think, it makes sense to have this in 2.7 as well. I created a patch and tested it 2.7 and it is all good. I plan to commit it before the next 2.7 update (which should be tomorrow).
New changeset 568041fd8090 by Senthil Kumaran in branch '2.7': Backport Fix for Issue bpo-7776: Fix ``Host:'' header and reconnection when using http.client.HTTPConnection.set_tunnel(). http://hg.python.org/cpython/rev/568041fd8090
This is fixed in 2.7 as well here (changeset 568041fd8090).
We shall close this ticket after @dstufft pulls in the updated pip for 3.4
Thanks!
Requests has been released and I've pulled it into the pip tree. I'll be releasing tonight probably, or maybe tomorrow.
I tag 3.4.1 final in less than 24 hours. I really would prefer that the embedded pip not contain such, uh, fresh software. But let's try it and hope for the best.
Well you're the RM Larry :) I'll do whatever you think is best. I would greatly prefer it if the pip shipped with CPython 3.4.1 wasn't broken with proxies. I think the choices are
1) Ship it with the new pip, I can give a delta of the differences if that is helpful. 2) Roll back the patch that broke the behavior 3) Ship with broken pip + proxy behavior
Whichever you think is the better option is fine with me.
Yeah, I'd like to see the diff.
I just released pip 1.5.6.
The ensurepip package currently has 1.5.4 inside of it. 1.5.5 has been out for 2 weeks or so and there haven't been any reported regressions.
The only difference between 1.5.5 and 1.5.6 is that requests has been upgraded.
Here's the changelog items for 1.5.5 and 1.5.6: https://github.com/pypa/pip/blob/master/CHANGES.txt#L1-L20
Here's the diff from 1.5.5 to 1.5.6: https://github.com/pypa/pip/compare/1.5.5...1.5.6 Here's the diff from 1.5.4 to 1.5.6: https://github.com/pypa/pip/compare/1.5.4...1.5.6
If you want me to update the bundled ensurepip let me know what branch I should do it on. I'm going to go ahead and do it for 3.5 though.
Just FYI, I upgraded setuptools and pip in 3.5:
http://hg.python.org/cpython/rev/acb5cc616052 http://hg.python.org/cpython/rev/308ff6a5ce67
If you decide to go that way dunno if you can just cherry pick or not.
I prefer we update the ensurepip in 3.4.1
That will be helpful too since 3.5 has the fix.
@larry
Is there anything else I need to do?
@dstufft - should you commit it in 3.4 branch (since the change is already in 3.5) and then wait for larry's approval or rejection?
Okay, this has my blessing to be merged for 3.4.1.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields: ```python assignee = 'https://github.com/orsenthil' closed_at =
created_at =
labels = ['type-bug', 'library', 'release-blocker']
title = 'http.client.HTTPConnection tunneling is broken'
updated_at =
user = 'https://github.com/cameron-simpson'
```
bugs.python.org fields:
```python
activity =
actor = 'orsenthil'
assignee = 'orsenthil'
closed = True
closed_date =
closer = 'larry'
components = ['Library (Lib)']
creation =
creator = 'cameron'
dependencies = []
files = ['33357', '33358', '33431', '33520', '33738', '33761', '34802', '34803', '35267']
hgrepos = []
issue_num = 7776
keywords = ['patch']
message_count = 43.0
messages = ['98264', '98266', '98268', '98311', '98312', '98401', '190052', '207658', '207661', '207663', '207976', '208382', '209382', '209397', '209494', '209500', '216031', '216046', '216100', '216114', '216117', '218197', '218204', '218212', '218286', '218294', '218295', '218297', '218299', '218355', '218685', '218688', '218689', '218691', '218692', '218693', '218694', '218695', '218696', '218698', '218724', '218728', '218729']
nosy_count = 12.0
nosy_names = ['barry', 'orsenthil', 'larry', 'benjamin.peterson', 'ned.deily', 'cameron', 'nikratio', 'ethan.furman', 'python-dev', 'martin.panter', 'dstufft', 'Cybjit']
pr_nums = []
priority = 'release blocker'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue7776'
versions = ['Python 2.7', 'Python 3.4', 'Python 3.5']
```