jech / polipo

The Polipo caching HTTP proxy
http://www.pps.jussieu.fr/~jch/software/polipo/
MIT License

Error with https and Python's urllib.urlretrieve: 400 Couldn't parse URL #56

Closed by blueyed 9 years ago

blueyed commented 9 years ago

I've noticed that Polipo fails to parse the URL from Python's urllib.urlretrieve (Python 2.7.9):

% python2 -c 'import urllib; r = urllib.urlretrieve("https://www.example.com/"); print(file(r[0]).read())'
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html><head>
<title>Proxy error: 400 Couldn't parse URL.</title>
</head><body>
<h1>400 Couldn't parse URL</h1>
<p>The following error occurred while trying to access <strong>https://www.example.com/</strong>:<br><br>
<strong>400 Couldn't parse URL</strong></p>
<hr>Generated Tue, 03 Mar 2015 13:11:53 CET by Polipo on <em>localhost:3128</em>.
</body></html>

This appears to happen for any https URL, and is probably caused by Python either not using CONNECT at all or using it improperly when talking to the proxy.

The relevant bug for Python is http://bugs.python.org/issue1424152; there's a patch for 2.7, which hasn't been applied yet.
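
For comparison, here is a minimal sketch of what tunnelling an https request through the proxy with CONNECT looks like on the client side. It is written for Python 3's `http.client`, not the Python 2.7 `urllib` used above, and it is not the patch from the Python bug. `connect_request` and `fetch_via_proxy` are hypothetical helper names, and the default proxy address is just the localhost:3128 from the error page.

```python
import http.client


def connect_request(host, port=443):
    """Return the bytes of a basic CONNECT request for host:port."""
    authority = "%s:%d" % (host, port)
    request = "CONNECT %s HTTP/1.1\r\nHost: %s\r\n\r\n" % (authority, authority)
    return request.encode("ascii")


def fetch_via_proxy(host, path="/", proxy_host="localhost", proxy_port=3128):
    """Fetch https://host/path through an HTTP proxy using a CONNECT tunnel."""
    # Open the TCP connection to the proxy, not to the origin server.
    conn = http.client.HTTPSConnection(proxy_host, proxy_port)
    # set_tunnel makes http.client send a CONNECT for host:443 to the proxy
    # first; TLS with the origin starts only after the tunnel is established.
    conn.set_tunnel(host, 443)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
```

Calling `fetch_via_proxy("www.example.com")` needs a proxy actually listening on that address. `connect_request` only builds the request bytes and touches no network, so it can be used to see what the proxy should receive in place of a plain request for the https URL.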

jech commented 9 years ago

Their bug :-)