Closed spleonard1 closed 6 years ago
Yeah - unfortunately Ensembl keeps changing the structure of their FTP directories... I fixed it for release-36, but apparently they've made another change. I get the same error too. I'm about to get on a flight so I won't have time to fix this right now.
Can you give download_Refseq_files() a shot? As of last week, that was still working for me. It should have even more genomes than Ensembl anyway.
Yeah, thanks! download_Refseq_files() worked great to download the peptide .faa files as .pep.fa
However, it doesn't download the corresponding dna/ or gbk/ files. What's the best strategy for that?
Nice! DNA and gbk files aren't downloaded by default (for storage reasons) and you don't need them to run PyParanoid. But they are very helpful for the downstream stuff. Check out:
gdb.download_dna_files(strains,genomedb)
and
gdb.download_genbank_files(strains,genomedb)
genomedb is the path to your folder containing the dna/, gbk/, and pep/ folders. "strains" is a Python list object containing the strain names of interest (see the first couple of cells of the CompareStrains notebook).
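If it helps, here is a rough sketch of one way to build that strains list from what is already in pep/. It assumes each strain name is just the peptide file name with the .pep.fa suffix removed, which may not hold for every genome database. strains_from_pep_dir is a hypothetical helper, not part of PyParanoid, and the import path in the commented usage is also an assumption.

```python
import os


def strains_from_pep_dir(genomedb):
    """Hypothetical helper (not part of PyParanoid).

    Returns a sorted list of strain names, ASSUMING each file in
    <genomedb>/pep/ is named <strain>.pep.fa.
    """
    pep_dir = os.path.join(genomedb, "pep")
    suffix = ".pep.fa"
    return sorted(
        name[: -len(suffix)]
        for name in os.listdir(pep_dir)
        if name.endswith(suffix)
    )


# Usage, left commented out because it needs PyParanoid and network access:
# import pyparanoid.genomedb as gdb  # import path is an assumption
# genomedb = "/path/to/genomedb"     # placeholder path
# strains = strains_from_pep_dir(genomedb)
# gdb.download_dna_files(strains, genomedb)
# gdb.download_genbank_files(strains, genomedb)
```

Files in pep/ that don't end in .pep.fa are ignored, so stray notes or logs in that folder won't end up in the list.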
got it, very helpful. Thank you!
I fixed download_Ensembl_files() to play nicely with the new Ensembl FTP structure so I will close this issue for now.
Trying to follow the Genome db instructions and the initial call to "download_Ensembl_files" is failing with the following error.
Using Python 2.7. Any ideas?