qiime / qiime-deploy

Tool to easily install QIIME's dependencies on Linux systems

qiime-deploy behind proxy #57

Open phuseman opened 10 years ago

phuseman commented 10 years ago

In my work environment I have to use a proxy. This causes some problems when downloading software with qiime-deploy. The environment variables ($http_proxy, $https_proxy, $ftp_proxy, $all_proxy) are all set appropriately, but the download_file() function in lib/util.py fails for some downloads. The following tools are not deployed:

drisee, ea-utils, tornado, pyzmq, setuptools, MySQL-python, pyqi, sphinx, biom-format, emperor, pynast, tax2tree, qiime, qiime-galaxy, galaxy

(It might have something to do with the https:// links?!)

I could replace the Python urllib download with a nasty hack that uses wget instead, but maybe one of you could investigate and fix this properly.

Another thing: behind the proxy, I have problems accessing git://github... URLs. This can be circumvented by using https://github... instead, which git can do automatically with the following config:

git config --global url.https://github.com/.insteadOf git://github.com/

However, it would be easier for people if you put the https URLs for GitHub directly in the corresponding qiime-deploy-conf files.

Best, Peter

phuseman commented 10 years ago

Short follow-up: it seems that the retrieve() method of urllib.URLopener() is not able to download https:// links via a proxy:

Python 2.7.5+ (default, Sep 19 2013, 13:48:49) 
[GCC 4.8.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib
>>> urllib.getproxies()
{'ftp': 'http://proxy:3128', 'all': 'http://proxy:3128', 'http': 'http://proxy:3128', 'https': 'http://proxy:3128'}
# Proxy is set
>>> test = urllib.URLopener()
>>> test.retrieve("http://www.google.com", "test.html")
('test.html', <httplib.HTTPMessage instance at 0x7f22ed118878>)
# ^^^ normal links do work
>>> test.retrieve("https://www.google.de", "test_https.html")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/urllib.py", line 240, in retrieve
    fp = self.open(url, data)
  File "/usr/lib/python2.7/urllib.py", line 208, in open
    return getattr(self, name)(url)
  File "/usr/lib/python2.7/urllib.py", line 359, in open_http
    return self.http_error(url, fp, errcode, errmsg, headers)
  File "/usr/lib/python2.7/urllib.py", line 376, in http_error
    return self.http_error_default(url, fp, errcode, errmsg, headers)
  File "/usr/lib/python2.7/urllib.py", line 381, in http_error_default
    raise IOError, ('http error', errcode, errmsg, headers)
IOError: ('http error', 501, 'Not Implemented', <httplib.HTTPMessage instance at 0x7f22eae8bfc8>)
# ^^^ https links are not implemented
antgonza commented 10 years ago

In the past, when I have encountered this problem, I downloaded the failing packages to my computer, started an HTTP server (a couple of clicks on a Mac), changed the config file to look for those packages on my computer, and deployed. Not pretty, but a solution. Anyway, I agree that there should be a better way to handle this ...
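For reference, that workaround can be as small as the following Python 2 sketch, run from the directory holding the pre-downloaded packages (the port is an arbitrary choice):

import SimpleHTTPServer
import SocketServer

PORT = 8000  # arbitrary free port

# Serve the pre-downloaded packages over plain http:// so qiime-deploy
# can fetch them locally without going through the proxy.
httpd = SocketServer.TCPServer(("", PORT),
                               SimpleHTTPServer.SimpleHTTPRequestHandler)
print "Serving packages at http://localhost:%d/" % PORT
httpd.serve_forever()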

phuseman commented 10 years ago

Here is the nasty hack to use wget instead (though this might not be that helpful for Mac users):

diff --git a/lib/util.py b/lib/util.py
index 18f515f..9bff11a 100644
--- a/lib/util.py
+++ b/lib/util.py
@@ -462,8 +462,13 @@ def download_file(URL, dest_dir, local_file, num_retries = 4):
     rc = 1
     while download_failed > 0:
         try:
-            tmpLocalFP, headers = url_opener.retrieve(URL, \
-                                                      tmpDownloadFP)
+#            tmpLocalFP, headers = url_opener.retrieve(URL, \
+#                                                      tmpDownloadFP)
+            # assumes "import commands" at the top of lib/util.py
+            downlStr = 'wget %s -O %s' % (URL, tmpDownloadFP)
+            (downlStatus, downlOut) = commands.getstatusoutput(downlStr)
+            if downlStatus != 0:
+                raise IOError('wget failed: %s' % downlOut)
             os.rename(tmpDownloadFP, localFP)
             rc = 0
         except IOError, msg:

I am not confident enough in Python, but might it be possible to use urllib2 or the requests package? For example, as sketched here: http://stackoverflow.com/questions/22676/how-do-i-download-a-file-over-http-using-python
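For illustration, a minimal sketch of the requests approach from that answer; requests is a third-party package, so it would add a dependency, and the simplified download_file signature here is an assumption rather than the one in lib/util.py:

import requests

def download_file(URL, localFP):
    # requests honors the http_proxy/https_proxy environment variables
    # by default and tunnels https:// requests through the proxy.
    response = requests.get(URL, stream=True)
    response.raise_for_status()  # raise on HTTP errors so callers can retry
    with open(localFP, 'wb') as out:
        for chunk in response.iter_content(8192):
            out.write(chunk)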

Best, Peter

phuseman commented 10 years ago

I found out that it works when using the plain FancyURLopener class. I provided a fix; see my pull request.

Best, Peter

phuseman commented 10 years ago

It seems that my fix did not solve the problem after all, sorry. Maybe a solution using urllib2 or the requests package would work better.

Peter

phuseman commented 10 years ago

OK, I did some more investigation. Downloading with urllib tries to send the following:

GET https://google.com HTTP/1.0
User-Agent: Python-urllib/1.17

The proxy answers:

HTTP/1.0 501 Not Implemented
Server: squid/2.7.STABLE7

Downloading with wget, however, sends:

CONNECT google.com:443 HTTP/1.1
User-Agent: Wget/1.14 (linux-gnu)

thus establishing a proper https connection:

HTTP/1.0 200 Connection established

I found a solution involving urllib2 that seems to work better. After testing, I will open a pull request with the fix.
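For context, a small Python 2.7 sketch that makes this visible; urllib2 reads the proxy settings from the environment and tunnels https:// URLs with CONNECT, which the debug output should show. This is only an illustration, not the actual fix from the pull request:

import urllib2

# debuglevel=1 makes httplib print the wire exchange; through a proxy,
# an https:// URL should show a CONNECT request instead of a plain GET.
opener = urllib2.build_opener(urllib2.HTTPSHandler(debuglevel=1))
response = opener.open("https://www.google.de")
print response.getcode()  # 200 if the tunnel was established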

Best, Peter