openvax / pyensembl

Python interface to access reference genome features (such as genes, transcripts, and exons) from Ensembl
Apache License 2.0

Unable to download and install Ensembl data from the command line #217

Closed novo1 closed 4 years ago

novo1 commented 5 years ago

I am not able to download and install the Ensembl data using the command line:

pyensembl install --release 75 76 --species human

The console shows the following error:

```
2018-12-29 19:51:51,440 - pyensembl.shell - INFO - Running 'install' for EnsemblRelease(release=75, species='homo_sapiens')
2018-12-29 19:51:51,440 - pyensembl.download_cache - INFO - Fetching C:\Users\Norbert\AppData\Local\pyensembl\GRCh37\ensembl75\pyensembl\GRCh37\ensembl75\Cache\Homo_sapiens.GRCh37.75.cdna.all.fa.gz from URL ftp://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/cdna\Homo_sapiens.GRCh37.75.cdna.all.fa.gz
2018-12-29 19:51:51,440 - datacache.download - INFO - Downloading ftp://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/cdna\Homo_sapiens.GRCh37.75.cdna.all.fa.gz to C:\Users\Norbert\AppData\Local\pyensembl\GRCh37\ensembl75\pyensembl\GRCh37\ensembl75\Cache\Homo_sapiens.GRCh37.75.cdna.all.fa.gz
Traceback (most recent call last):
  File "c:\users\norbert\anaconda3\lib\urllib\request.py", line 2424, in retrfile
    self.ftp.cwd(file)
  File "c:\users\norbert\anaconda3\lib\ftplib.py", line 629, in cwd
    return self.voidcmd(cmd)
  File "c:\users\norbert\anaconda3\lib\ftplib.py", line 276, in voidcmd
    return self.voidresp()
  File "c:\users\norbert\anaconda3\lib\ftplib.py", line 249, in voidresp
    resp = self.getresp()
  File "c:\users\norbert\anaconda3\lib\ftplib.py", line 244, in getresp
    raise error_perm(resp)
ftplib.error_perm: 550 Failed to change directory.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\users\norbert\anaconda3\lib\urllib\request.py", line 1542, in ftp_open
    fp, retrlen = fw.retrfile(file, type)
  File "c:\users\norbert\anaconda3\lib\urllib\request.py", line 2426, in retrfile
    raise URLError('ftp error: %r' % reason) from reason
urllib.error.URLError: <urlopen error ftp error: error_perm('550 Failed to change directory.',)>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\users\norbert\anaconda3\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\norbert\anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\Norbert\Anaconda3\Scripts\pyensembl.exe\__main__.py", line 9, in <module>
  File "c:\users\norbert\anaconda3\lib\site-packages\pyensembl\shell.py", line 260, in run
    genome.download(overwrite=args.overwrite)
  File "c:\users\norbert\anaconda3\lib\site-packages\pyensembl\genome.py", line 271, in download
    self._set_local_paths(download_if_missing=True, overwrite=overwrite)
  File "c:\users\norbert\anaconda3\lib\site-packages\pyensembl\genome.py", line 233, in _set_local_paths
    overwrite=overwrite)
  File "c:\users\norbert\anaconda3\lib\site-packages\pyensembl\genome.py", line 207, in _get_transcript_fasta_paths
    for path in self._transcript_fasta_paths_or_urls]
  File "c:\users\norbert\anaconda3\lib\site-packages\pyensembl\genome.py", line 207, in <listcomp>
    for path in self._transcript_fasta_paths_or_urls]
  File "c:\users\norbert\anaconda3\lib\site-packages\pyensembl\genome.py", line 186, in _get_cached_path
    overwrite=overwrite)
  File "c:\users\norbert\anaconda3\lib\site-packages\pyensembl\download_cache.py", line 299, in local_path_or_install_error
    overwrite=overwrite)
  File "c:\users\norbert\anaconda3\lib\site-packages\pyensembl\download_cache.py", line 274, in download_or_copy_if_necessary
    overwrite)
  File "c:\users\norbert\anaconda3\lib\site-packages\pyensembl\download_cache.py", line 222, in _download_if_necessary
    timeout=3600)
  File "c:\users\norbert\anaconda3\lib\site-packages\datacache\download.py", line 110, in _download_and_decompress_if_necessary
    use_wget_if_available=use_wget_if_available)
  File "c:\users\norbert\anaconda3\lib\site-packages\datacache\download.py", line 68, in _download_to_temp_file
    download_using_python()
  File "c:\users\norbert\anaconda3\lib\site-packages\datacache\download.py", line 65, in download_using_python
    _download(download_url, timeout=timeout))
  File "c:\users\norbert\anaconda3\lib\site-packages\datacache\download.py", line 42, in _download
    response = urllib.request.urlopen(req, data=None, timeout=timeout)
  File "c:\users\norbert\anaconda3\lib\urllib\request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "c:\users\norbert\anaconda3\lib\urllib\request.py", line 526, in open
    response = self._open(req, data)
  File "c:\users\norbert\anaconda3\lib\urllib\request.py", line 544, in _open
    '_open', req)
  File "c:\users\norbert\anaconda3\lib\urllib\request.py", line 504, in _call_chain
    result = func(*args)
  File "c:\users\norbert\anaconda3\lib\urllib\request.py", line 1553, in ftp_open
    raise exc.with_traceback(sys.exc_info()[2])
  File "c:\users\norbert\anaconda3\lib\urllib\request.py", line 1542, in ftp_open
    fp, retrlen = fw.retrfile(file, type)
  File "c:\users\norbert\anaconda3\lib\urllib\request.py", line 2426, in retrfile
    raise URLError('ftp error: %r' % reason) from reason
urllib.error.URLError: <urlopen error ftp error: URLError("ftp error: error_perm('550 Failed to change directory.',)",)>
```

I'm running Python on Windows 10 using the Anaconda distribution.

Does anyone know how to solve this issue? Thank you!!!

scvannost commented 4 years ago

I'm also getting this same double error. Similar setup: Windows 10 using Anaconda with Python 3.7. I'm trying to install the mouse genome, however.

randomEPFLexts commented 4 years ago

The path to the source is erroneous - the \ between the path and the file should be /. I assume there is an issue with the path generation.

ftp://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/cdna\Homo_sapiens.GRCh37.75.cdna.all.fa.gz should be ftp://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh37.75.cdna.all.fa.gz

randomEPFLexts commented 4 years ago

The problem, as far as I can tell, lies in the `make_fasta_url` function in `ensembl_url_templates.py` and its mixed use of `urllib_parse.urljoin` and `os.path.join`. This issue is probably limited to operating systems that use a backslash as the path separator (i.e. Windows). Since I do not have a Windows environment to test with, you might try replacing `return join(server_sequence_subdir, filename)` with `return urllib_parse.urljoin(server_sequence_subdir, filename)` and report back if this helps. To be thorough, you may also replace `server_sequence_subdir = join(server_subdir, sequence_type)` with `server_sequence_subdir = urllib_parse.urljoin(server_subdir, sequence_type)`.
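
To illustrate the suspected cause, here is a minimal sketch that reproduces the backslash on any OS by using `ntpath` (the Windows flavour of `os.path`) and compares it with two "/"-based joins. The `join_url` helper is hypothetical and not part of pyensembl; the URL and filename are taken from the log above. Note that `urljoin` resolves relative references, so it only appends the filename when the base ends with "/". Whether the base built inside `make_fasta_url` ends with a slash is not confirmed here.

```python
import ntpath      # Windows flavour of os.path, importable on any OS
import posixpath   # POSIX flavour, always joins with "/"
from urllib.parse import urljoin

BASE = "ftp://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/cdna"
FILENAME = "Homo_sapiens.GRCh37.75.cdna.all.fa.gz"


def join_url(base, filename):
    """Hypothetical helper: join URL parts with "/" regardless of the OS."""
    return posixpath.join(base, filename)


# What os.path.join does on Windows: inserts a backslash into the URL.
windows_style = ntpath.join(BASE, FILENAME)

# OS-independent join that always uses a forward slash.
portable = join_url(BASE, FILENAME)

# urljoin appends the filename only if the base ends with "/" ...
with_slash = urljoin(BASE + "/", FILENAME)
# ... otherwise it replaces the last path segment ("cdna") with the filename.
without_slash = urljoin(BASE, FILENAME)

for label, url in [
    ("ntpath.join", windows_style),
    ("posixpath.join", portable),
    ("urljoin, base with trailing /", with_slash),
    ("urljoin, base without trailing /", without_slash),
]:
    print(f"{label}: {url}")
```

If the `urljoin` variant is used, it may be worth printing the resulting URL to check that no path segment was dropped.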

scvannost commented 4 years ago

Changed as suggested, which fixes the \ vs /. Same error though.

randomEPFLexts commented 4 years ago

Unfortunately, I still have no access to a Windows 10 system to check this further. Does the retrieval of the file work in the Anaconda environment when you call the urllib.request.urlretrieve function directly, using the URL and local filename from the error message? i.e.:

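import urllib.request  # import the submodule explicitly; "import urllib" alone may not load it

# Use a raw string for the Windows path so that "\t" is not read as a tab character.
urllib.request.urlretrieve('ftp://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh37.75.cdna.all.fa.gz',
                           r'C:\path\to\Homo_sapiens.GRCh37.75.cdna.all.fa.gz')
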
scvannost commented 4 years ago

So that works, returning the filename and an email.message.Message with .items() = [('Content-length', '60646070')]. The resulting file is 59225 KB.
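
As a quick sanity check on those numbers, the sketch below converts the reported Content-length to units of 1024 bytes and shows how a downloaded file could be compared against the header. Both helper names are hypothetical, and it assumes the reported file size was rounded up to whole units of 1024 bytes.

```python
import math
import os


def bytes_to_kib_rounded_up(n_bytes):
    """Convert a byte count to KiB (1024 bytes), rounding up."""
    return math.ceil(n_bytes / 1024)


def size_matches_header(local_path, content_length):
    """Return True if the file on disk has exactly content_length bytes."""
    return os.path.getsize(local_path) == int(content_length)


# Content-length reported by the FTP server in the comment above.
print(bytes_to_kib_rounded_up(60646070))
```

Under that assumption, 60646070 bytes corresponds to 59225 KB, so the file appears to have been downloaded completely. Calling `size_matches_header` with the local path and the header value would confirm it exactly.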