ryanmelnyk / PyParanoid

Rapid and scalable homolog identification for bacterial genomes
MIT License

error using ipynb to build genomes #4

Closed spleonard1 closed 6 years ago

spleonard1 commented 6 years ago

I'm trying to follow the Genome db instructions, and the initial call to download_Ensembl_files() is failing with the following error:

>>> gdb.download_Ensembl_files("../snod_genomedb", maxgen=None, names="snodgrassella", complete=False)
Current release of EnsemblBacteria: release-39
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/pyparanoid/genomedb.py", line 72, in download_Ensembl_files
    thisline.append(js[feat])
KeyError: 'assembly_level'

Using python2.7. Any ideas?

ryanmelnyk commented 6 years ago

Yeah, unfortunately Ensembl keeps changing the structure of their FTP directories. I fixed it for release-36, but apparently they've made another change. I get the same error too. I'm about to get on a flight, so I won't have time to fix this right now.

Can you give download_Refseq_files() a shot? As of last week, that was still working for me. It should have even more genomes than Ensembl anyways.
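
For readers hitting the same KeyError: the traceback shows a direct lookup, js[feat], failing because the release-39 metadata no longer has an 'assembly_level' key. The sketch below illustrates one general way to tolerate a missing key by substituting a placeholder. It is not the fix that was later applied to PyParanoid; the helper name collect_fields, the "NA" placeholder, and the example record are all hypothetical.

```python
def collect_fields(js, fields, missing="NA"):
    """Return the values of `fields` from the metadata dict `js`.

    Keys absent from `js` yield `missing` instead of raising KeyError.
    Hypothetical helper for illustration; not part of PyParanoid.
    """
    return [js.get(feat, missing) for feat in fields]


# Hypothetical metadata record that lacks the 'assembly_level' key.
record = {"name": "example_strain"}
thisline = collect_fields(record, ["name", "assembly_level"])
print(thisline)
```

A placeholder keeps the download going but hides the schema change, so whether to substitute a value or fail loudly is a judgment call.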

spleonard1 commented 6 years ago

Yea, thanks! download_Refseq_files() worked great to download the peptide .faa files as .pep.fa

However, it doesn't download the corresponding dna/ or gbk/ files. What's the best strategy for that?

ryanmelnyk commented 6 years ago

Nice! DNA and gbk files aren't downloaded by default (for storage reasons), and you don't need them to run PyParanoid. But they are very helpful for the downstream stuff. Check out:

gdb.download_dna_files(strains,genomedb)

and

gdb.download_genbank_files(strains,genomedb)

genomedb is the path to your folder containing the dna/, gbk/, and pep/ folders. "strains" is a Python list object containing the strain names of interest (see the first couple of cells of the CompareStrains notebook).
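
As a rough sketch of how the "strains" list might be assembled, the snippet below reads strain names from the pep/ folder. It assumes each strain name is the file name with the ".pep.fa" suffix removed, which is inferred from the file naming mentioned earlier in this thread and not confirmed by the PyParanoid docs. The helper strains_from_pep_dir is hypothetical. The PyParanoid calls are left as comments because they need the package installed and network access; the import path is taken from the traceback above.

```python
import os


def strains_from_pep_dir(genomedb, suffix=".pep.fa"):
    """List strain names found in <genomedb>/pep.

    Assumes (hypothetically) that the strain name is the file name
    with `suffix` stripped. Not part of PyParanoid.
    """
    pep_dir = os.path.join(genomedb, "pep")
    return sorted(
        fname[:-len(suffix)]
        for fname in os.listdir(pep_dir)
        if fname.endswith(suffix)
    )


# Usage sketch (requires PyParanoid; not executed here):
# import pyparanoid.genomedb as gdb
# genomedb = "../snod_genomedb"
# strains = strains_from_pep_dir(genomedb)
# gdb.download_dna_files(strains, genomedb)
# gdb.download_genbank_files(strains, genomedb)
```

To fetch only a subset, a hand-written list of strain names passed the same way should work too.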

spleonard1 commented 6 years ago

got it, very helpful. Thank you!

ryanmelnyk commented 6 years ago

I fixed download_Ensembl_files() to play nicely with the new Ensembl FTP structure so I will close this issue for now.