openvax / pyensembl

Python interface to access reference genome features (such as genes, transcripts, and exons) from Ensembl
Apache License 2.0

pip install in conda does not allow for download #264

Open da-i opened 2 years ago

da-i commented 2 years ago

Hi Openvax,

I've created a new conda env and installed pyensembl via pip, as suggested by the instructions. If I run pyensembl install --species human, I get an error (see below) because the requested URL is not responsive (e.g. it does not work with curl or wget either).

I think this is because the code refers to ftp://ftp.ensembl.org/pub/release-106/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz

whilst wget works on http://ftp.ensembl.org/pub/release-106/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz.

$ pyensembl install --species human                                                                               
2022-04-20 13:51:09,574 - pyensembl.shell - INFO - Running 'install' for EnsemblRelease(release=106, species='homo_sapiens')                  
2022-04-20 13:51:09,574 - pyensembl.download_cache - INFO - Fetching /home/dami/.cache/pyensembl/GRCh38/ensembl106/Homo_sapiens.GRCh38.106.gtf.gz from URL ftp://ftp.ensembl.org/pub/release-106/gtf/homo_sapiens/Homo_sapiens.GRCh38.106.gtf.gz
2022-04-20 13:51:09,574 - datacache.download - INFO - Downloading ftp://ftp.ensembl.org/pub/release-106/gtf/homo_sapiens/Homo_sapiens.GRCh38.106.gtf.gz to /home/dami/.cache/pyensembl/GRCh38/ensembl106/Homo_sapiens.GRCh38.106.gtf.gz
Traceback (most recent call last):                                                                                                            
  File "/home/dami/miniconda3/envs/dataanalysis/lib/python3.9/urllib/request.py", line 1563, in ftp_open                                      
    fw = self.connect_ftp(user, passwd, host, port, dirs, req.timeout)                                                                        
  File "/home/dami/miniconda3/envs/dataanalysis/lib/python3.9/urllib/request.py", line 1584, in connect_ftp                                   
    return ftpwrapper(user, passwd, host, port, dirs, timeout,                                                                                
  File "/home/dami/miniconda3/envs/dataanalysis/lib/python3.9/urllib/request.py", line 2405, in __init__                                      
    self.init()                                                                                                                               
  File "/home/dami/miniconda3/envs/dataanalysis/lib/python3.9/urllib/request.py", line 2414, in init                                          
    self.ftp.connect(self.host, self.port, self.timeout)                                                                                      
  File "/home/dami/miniconda3/envs/dataanalysis/lib/python3.9/ftplib.py", line 158, in connect                                                
    self.sock = socket.create_connection((self.host, self.port), self.timeout,                                                                
  File "/home/dami/miniconda3/envs/dataanalysis/lib/python3.9/socket.py", line 844, in create_connection                                      
    raise err                                                                                                                                 
  File "/home/dami/miniconda3/envs/dataanalysis/lib/python3.9/socket.py", line 832, in create_connection                                      
    sock.connect(sa)                                                                                                                          
TimeoutError: [Errno 110] Connection timed out                                                                                                

During handling of the above exception, another exception occurred:                                                                           

Traceback (most recent call last):                                                                                                            
  File "/home/dami/miniconda3/envs/dataanalysis/bin/pyensembl", line 8, in <module>                                                           
    sys.exit(run())                                                    
  File "/home/dami/miniconda3/envs/dataanalysis/lib/python3.9/site-packages/pyensembl/shell.py", line 255, in run
    genome.download(overwrite=args.overwrite)                         
  File "/home/dami/miniconda3/envs/dataanalysis/lib/python3.9/site-packages/pyensembl/genome.py", line 266, in download
    self._set_local_paths(download_if_missing=True, overwrite=overwrite)
  File "/home/dami/miniconda3/envs/dataanalysis/lib/python3.9/site-packages/pyensembl/genome.py", line 222, in _set_local_paths
    self.gtf_path = self._get_gtf_path(
  File "/home/dami/miniconda3/envs/dataanalysis/lib/python3.9/site-packages/pyensembl/genome.py", line 184, in _get_gtf_path
    return self._get_cached_path(                                      
  File "/home/dami/miniconda3/envs/dataanalysis/lib/python3.9/site-packages/pyensembl/genome.py", line 177, in _get_cached_path
    return self.download_cache.local_path_or_install_error(                                                                                   
  File "/home/dami/miniconda3/envs/dataanalysis/lib/python3.9/site-packages/pyensembl/download_cache.py", line 309, in local_path_or_install_error
    return self.download_or_copy_if_necessary(                                                                                                
  File "/home/dami/miniconda3/envs/dataanalysis/lib/python3.9/site-packages/pyensembl/download_cache.py", line 285, in download_or_copy_if_necessary
    return self._download_if_necessary(                                                                                                       
  File "/home/dami/miniconda3/envs/dataanalysis/lib/python3.9/site-packages/pyensembl/download_cache.py", line 231, in _download_if_necessary
    datacache.download._download_and_decompress_if_necessary(                                                                                 
  File "/home/dami/miniconda3/envs/dataanalysis/lib/python3.9/site-packages/datacache/download.py", line 105, in _download_and_decompress_if_necessary
    tmp_path = _download_to_temp_file(                                                                                                        
  File "/home/dami/miniconda3/envs/dataanalysis/lib/python3.9/site-packages/datacache/download.py", line 68, in _download_to_temp_file
    download_using_python()                                                                                                                   
  File "/home/dami/miniconda3/envs/dataanalysis/lib/python3.9/site-packages/datacache/download.py", line 65, in download_using_python
    _download(download_url, timeout=timeout))                                                                                                 
  File "/home/dami/miniconda3/envs/dataanalysis/lib/python3.9/site-packages/datacache/download.py", line 42, in _download
    response = urllib.request.urlopen(req, data=None, timeout=timeout)
  File "/home/dami/miniconda3/envs/dataanalysis/lib/python3.9/urllib/request.py", line 214, in urlopen
    return opener.open(url, data, timeout)
  File "/home/dami/miniconda3/envs/dataanalysis/lib/python3.9/urllib/request.py", line 517, in open
    response = self._open(req, data)
  File "/home/dami/miniconda3/envs/dataanalysis/lib/python3.9/urllib/request.py", line 534, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/home/dami/miniconda3/envs/dataanalysis/lib/python3.9/urllib/request.py", line 494, in _call_chain
    result = func(*args)
  File "/home/dami/miniconda3/envs/dataanalysis/lib/python3.9/urllib/request.py", line 1581, in ftp_open
    raise exc.with_traceback(sys.exc_info()[2])
  File "/home/dami/miniconda3/envs/dataanalysis/lib/python3.9/urllib/request.py", line 1563, in ftp_open
    fw = self.connect_ftp(user, passwd, host, port, dirs, req.timeout)
  File "/home/dami/miniconda3/envs/dataanalysis/lib/python3.9/urllib/request.py", line 1584, in connect_ftp
    return ftpwrapper(user, passwd, host, port, dirs, timeout,
  File "/home/dami/miniconda3/envs/dataanalysis/lib/python3.9/urllib/request.py", line 2405, in __init__
    self.init()
  File "/home/dami/miniconda3/envs/dataanalysis/lib/python3.9/urllib/request.py", line 2414, in init
    self.ftp.connect(self.host, self.port, self.timeout)
  File "/home/dami/miniconda3/envs/dataanalysis/lib/python3.9/ftplib.py", line 158, in connect
    self.sock = socket.create_connection((self.host, self.port), self.timeout,
  File "/home/dami/miniconda3/envs/dataanalysis/lib/python3.9/socket.py", line 844, in create_connection
    raise err
  File "/home/dami/miniconda3/envs/dataanalysis/lib/python3.9/socket.py", line 832, in create_connection
    sock.connect(sa)
urllib.error.URLError: <urlopen error ftp error: TimeoutError(110, 'Connection timed out')>

Let me know if the issue is not clear enough.

da-i commented 2 years ago

Changing ENSEMBL_FTP_SERVER to "http://ftp.ensembl.org" in pyensembl/ensembl_url_templates.py resolves the issue.
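
For anyone who cannot edit the installed package, the same scheme swap can be applied to individual URLs before fetching them by hand. This is only a minimal sketch: it assumes the HTTP host serves the same paths as the FTP host (which was the case for the URLs in this issue), and `ftp_to_http` is a hypothetical helper name, not something provided by pyensembl or datacache.

```python
from urllib.parse import urlsplit, urlunsplit


def ftp_to_http(url: str) -> str:
    """Return the URL with an ftp:// scheme replaced by http://.

    Hypothetical helper for illustration only; URLs that do not use
    the ftp scheme are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.scheme != "ftp":
        return url
    # Keep host, path, query and fragment; swap only the scheme.
    return urlunsplit(("http", parts.netloc, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    # GTF URL taken from the log output in this issue.
    gtf_url = (
        "ftp://ftp.ensembl.org/pub/release-106/gtf/homo_sapiens/"
        "Homo_sapiens.GRCh38.106.gtf.gz"
    )
    print(ftp_to_http(gtf_url))
```

The printed URL can then be passed to wget or curl on networks where outbound FTP connections time out.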