merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

anvi-gen-genomes-storage gives KeyError: 'kmer_size' #1508

Closed: minna-miha closed this issue 3 years ago

minna-miha commented 3 years ago

Hello,

I am using anvi'o v5.2. When I ran anvi-self-test --suite mini and anvi-self-test --suite pangenomics there were no errors, as far as I can tell. I am trying to follow the anvi'o pangenomics workflow.

anvi-self-test -v gives

Anvi'o version ...............................: margaret (v5.2)
Profile DB version ...........................: 30
Contigs DB version ...........................: 12
Pan DB version ...............................: 12
Genome data storage version ..................: 6
Auxiliary data storage version ...............: 2
Structure DB version .........................: 1

The issue:

I ran anvi-gen-genomes-storage -e anvio_storage.txt -o AHess_Mariana-GENOMES.db and got this error:

Traceback (most recent call last):
  File "/usr/local/bin/anvi-gen-genomes-storage", line 55, in <module>
    genome_descriptions.load_genomes_descriptions()
  File "/usr/local/Cellar/anvio/5.2/libexec/lib/python3.7/site-packages/anvio/genomedescriptions.py", line 185, in load_genomes_descriptions
    self.genomes[genome_name]['genome_hash'] = self.get_genome_hash_for_external_genome(self.genomes[genome_name])
  File "/usr/local/Cellar/anvio/5.2/libexec/lib/python3.7/site-packages/anvio/genomedescriptions.py", line 314, in get_genome_hash_for_external_genome
    contigs_db = dbops.ContigsDatabase(entry['contigs_db_path'])
  File "/usr/local/Cellar/anvio/5.2/libexec/lib/python3.7/site-packages/anvio/dbops.py", line 2754, in __init__
    self.init()
  File "/usr/local/Cellar/anvio/5.2/libexec/lib/python3.7/site-packages/anvio/dbops.py", line 2765, in init
    self.meta[key] = int(self.meta[key])
KeyError: 'kmer_size'

I saw a very similar (closed) issue in which someone suggested running sqlite3 CONTIGS.db "SELECT * FROM self" to check whether kmer_size appears in the output. For me, it does not.
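The same check can be done with Python's standard sqlite3 module, which does not need anvi'o to be importable. This is a minimal sketch, assuming (as the sqlite3 query above suggests) that a contigs database keeps its metadata in a table named self with key and value columns. The function names are hypothetical and are not part of anvi'o.

```python
import sqlite3
from pathlib import Path


def read_self_table(db_path):
    """Return the key/value pairs stored in the `self` table of an SQLite file.

    ASSUMPTION: the table is named `self` and has `key` and `value` columns,
    as anvi'o contigs databases appear to use. This is not anvi'o's own API.
    """
    # Open read-only so a mistyped path raises an error instead of
    # silently creating a new, empty database file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute("SELECT key, value FROM self").fetchall()
    finally:
        conn.close()
    return dict(rows)


def has_kmer_size(db_path):
    """True if the `self` table of db_path contains a kmer_size entry."""
    return "kmer_size" in read_self_table(db_path)


if __name__ == "__main__":
    import sys

    # Usage: python check_kmer.py A_contigs.db B_contigs.db ...
    for path in sys.argv[1:]:
        status = "kmer_size present" if has_kmer_size(path) else "kmer_size MISSING"
        print(path, status)
```

A database that reports kmer_size as missing would fail at the int(self.meta[key]) line shown in the traceback.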

I also saw the suggestion to run:

for i in `grep -v contigs_db_path anvio_storage.txt | awk '{print $2}'`
do
    echo $i
    python -c "import anvio.dbops as d; d.ContigsDatabase(\"$i\")"
done

But I get the error:

Illium_freeliving_Mag60_contigs.db
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'anvio'
HafaAdai_freeliving_Mag10_contigs.db
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'anvio'
Illium_AHess_Mar13_contigs.db
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'anvio'
HafaAdai_AHess_Mar172_contigs.db
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'anvio'
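ModuleNotFoundError means the interpreter that the loop invokes as python cannot import anvio at all, so the loop never gets as far as opening the contigs databases. The earlier traceback shows anvi'o installed under /usr/local/Cellar/anvio/5.2/libexec/lib/python3.7/site-packages, so the python on PATH is likely a different interpreter. A small sketch (the script and function names are hypothetical) to confirm which interpreter is running and whether it can see a given package:

```python
import importlib.util
import sys


def can_import(module_name):
    """Return True if module_name can be found by the running interpreter."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Usage: python check_import.py anvio
    name = sys.argv[1] if len(sys.argv) > 1 else "anvio"
    print("interpreter:", sys.executable)
    print(name, "importable:", can_import(name))
```

Running it with the same python that the loop uses shows which interpreter that is and whether anvio is visible to it.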

My --external-genomes (storage) file looks like this:

name    contigs_db_path
Mag60   Illium_freeliving_Mag60_contigs.db
Mag10   HafaAdai_freeliving_Mag10_contigs.db
Illium  Illium_AHess_Mar13_contigs.db
HafaAdai    HafaAdai_AHess_Mar172_contigs.db

Any advice on what to do? Thanks.

meren commented 3 years ago

Your contigs db seems to be broken.

Please don't use anvi'o v5.2; it is ancient.

minna-miha commented 3 years ago

@meren yes, it seems so. When I ran anvi-gen-contigs-database -f HafaAdai_AHess_Mar172_assembly.fasta -o HafaAdai_AHess_contigs.db -n HafaAdai_AHess --external-gene-calls HafaAdai_AHess_Mar172_formatted.txt I did get an error:

Config Error: There are more fields in the file '%s' than the expected fields :/ Anvi'o is   
              telling you about this because get_TAB_delimited_file_as_dictionary funciton is
              called with `only_expected_fields` flag turned on.             

But I had a hard time finding more about this error online, and I interpreted it as a warning, which perhaps I should not have. I'm unclear about why it is raised, what to do about it, and whether it is what causes the problem in the next step. My external gene calls file is attached: HafaAdai_AHess_Mar172_formatted.txt
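The message says get_TAB_delimited_file_as_dictionary was called with only_expected_fields turned on, which means the input file has at least one column that this anvi'o version does not recognize. A minimal sketch of the same kind of header check in plain Python follows. The EXPECTED_FIELDS list is an assumption about the external gene calls format accepted by anvi'o v5.2 and is not read from anvi'o itself, so please verify it against the documentation for the version you run.

```python
import csv

# ASSUMPTION: columns accepted by anvi'o v5.2 for --external-gene-calls.
# Verify against the documentation for the version you actually run.
EXPECTED_FIELDS = [
    "gene_callers_id", "contig", "start", "stop",
    "direction", "partial", "source", "version",
]


def unexpected_columns(path, expected=EXPECTED_FIELDS):
    """Return header columns of a TAB-delimited file that are not in `expected`."""
    with open(path, newline="") as handle:
        header = next(csv.reader(handle, delimiter="\t"), [])
    return [column for column in header if column not in expected]


if __name__ == "__main__":
    import sys

    # Usage: python check_header.py HafaAdai_AHess_Mar172_formatted.txt
    for path in sys.argv[1:]:
        extra = unexpected_columns(path)
        print(path, "extra columns:", ", ".join(extra) if extra else "none")
```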

Thanks for your help!

meren commented 3 years ago

Please remove the aa_sequence column. It is likely a feature we introduced recently, and it is not available in anvi'o v5.2.
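Any tool that rewrites TAB-delimited files can drop the column. Here is a minimal sketch using Python's csv module, assuming the gene calls file is TAB-delimited with a header row. The function name is hypothetical.

```python
import csv


def drop_column(in_path, out_path, column="aa_sequence"):
    """Copy a TAB-delimited file with a header row, omitting one named column."""
    with open(in_path, newline="") as src:
        reader = csv.DictReader(src, delimiter="\t")
        kept = [name for name in reader.fieldnames if name != column]
        with open(out_path, "w", newline="") as dst:
            # extrasaction="ignore" silently skips the dropped column's values.
            writer = csv.DictWriter(dst, fieldnames=kept, delimiter="\t",
                                    extrasaction="ignore", lineterminator="\n")
            writer.writeheader()
            writer.writerows(reader)
```

For example, drop_column("HafaAdai_AHess_Mar172_formatted.txt", "HafaAdai_AHess_Mar172_no_aa.txt") would write a copy without aa_sequence (the output name is hypothetical), which could then be passed to --external-gene-calls.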

minna-miha commented 3 years ago

@meren thank you!!