merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

ValueError when using anvi-get-sequences-for-gene-clusters #1050

Closed ShaiberAlon closed 5 years ago

ShaiberAlon commented 5 years ago

I get the following error:

Traceback (most recent call last):
  File "/Users/alonshaiber/github/anvio/bin/anvi-get-sequences-for-gene-clusters", line 154, in <module>
    main(args)
  File "/Users/alonshaiber/github/anvio/bin/anvi-get-sequences-for-gene-clusters", line 88, in main
    filtered_gene_clusters_dict = pan.filter_gene_clusters_dict(args)
  File "/Users/alonshaiber/github/anvio/anvio/dbops.py", line 1759, in filter_gene_clusters_dict
    max_geometric_homogeneity_index)
  File "/Users/alonshaiber/github/anvio/anvio/dbops.py", line 1558, in filter_gene_clusters_from_gene_clusters_dict
    homogeneity_keys, homogeneity_dict = TableForItemAdditionalData(self.args).get(['functional_homogeneity_index', 'geometric_homogeneity_index'])
  File "/Users/alonshaiber/github/anvio/anvio/tables/miscdata.py", line 624, in get
    d[additional_data_item_name][key] = eval(entry['data_type'])(value or self.nulls_per_type[entry['data_type']])
ValueError: invalid literal for int() with base 10: '1.0'

I tried using the fix suggested here: https://github.com/merenlab/anvio/issues/1007#issuecomment-429600866

And it didn't work.

To reproduce this, you can do the following easy steps:

  1. Check out the anvio branch metapan-workflow.
  2. Go to your-copy-of-the-anvio-repository/anvio/tests/
  3. Run: bash run_pangenomics_workflow_tests.sh sandbox/test-output

And you will get this error.

After running this you can go to sandbox/test-output/workflow_test/03_PAN_FIVE_PAN/, and there you will find the pan database and the genomes storage.

ShaiberAlon commented 5 years ago

@meren, here is the problem:

$ sqlite3 03_PAN_FIVE_PAN/FIVE_TEST-PAN.db "select * from item_additional_data" | tail
122|GC_00000016|functional_homogeneity_index|1.0|int|default
123|GC_00000016|geometric_homogeneity_index|1.0|int|default
124|GC_00000022|functional_homogeneity_index|1.0|int|default
125|GC_00000022|geometric_homogeneity_index|1.0|int|default
126|GC_00000006|functional_homogeneity_index|1.0|int|default
127|GC_00000006|geometric_homogeneity_index|1.0|int|default
128|GC_00000007|functional_homogeneity_index|1.0|int|default
129|GC_00000007|geometric_homogeneity_index|1.0|int|default
130|GC_00000013|functional_homogeneity_index|1.0|int|default
131|GC_00000013|geometric_homogeneity_index|1.0|int|default
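
To make the mismatch concrete, here is a minimal sketch of why a row stored as the text '1.0' with data_type 'int' triggers the ValueError from the traceback. It is a simplified stand-in, not anvi'o's code: the `CASTS` lookup and `cast_value` helper are hypothetical names, and a dictionary lookup replaces the `eval(entry['data_type'])` call seen in miscdata.py.

```python
# Simplified stand-in for the conversion step in the traceback.
# CASTS and cast_value are hypothetical names, not part of anvi'o.
CASTS = {"int": int, "float": float, "str": str}


def cast_value(value, data_type):
    """Convert a stored text value using the type named in data_type."""
    return CASTS[data_type](value)


# One row as it appears in the sqlite3 output: (item, key, value, type).
row = ("GC_00000016", "functional_homogeneity_index", "1.0", "int")

try:
    cast_value(row[2], row[3])
    error_message = None
except ValueError as e:
    # int() refuses text that contains a decimal point.
    error_message = str(e)

print(error_message)

# The same text converts cleanly once the declared type is 'float'.
as_float = cast_value(row[2], "float")
print(as_float)
```

The value itself is fine; only the declared type is wrong, which is why changing data_type is enough.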

ShaiberAlon commented 5 years ago

This fixes it so that I can run anvi-display-pan:

sqlite3 03_PAN_FIVE_PAN/FIVE_TEST-PAN.db "UPDATE item_additional_data SET data_type='float' WHERE data_key LIKE 'geometric_homogeneity_index'"

sqlite3 03_PAN_FIVE_PAN/FIVE_TEST-PAN.db "UPDATE item_additional_data SET data_type='float' WHERE data_key LIKE 'functional_homogeneity_index'"
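
The same workaround can be sketched with Python's sqlite3 module, covering both keys in one statement. This is only an illustration under stated assumptions: it runs against an in-memory table rather than a real pan database, the column names other than data_key and data_type are guesses based on the output above, the third sample row is made up, and `fix_homogeneity_types` is a hypothetical helper. It also uses an exact match (IN) where the commands above use LIKE.

```python
import sqlite3

# Keys whose declared type should be 'float' rather than 'int'.
HOMOGENEITY_KEYS = ("functional_homogeneity_index", "geometric_homogeneity_index")


def fix_homogeneity_types(conn):
    """Set data_type to 'float' for both homogeneity keys; return rows changed."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE item_additional_data SET data_type = 'float' "
            "WHERE data_key IN (?, ?)",
            HOMOGENEITY_KEYS,
        )
    return cur.rowcount


# In-memory stand-in for the pan database. Only data_key and data_type are
# named in the thread; the other column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item_additional_data ("
    "entry_id INTEGER, item_name TEXT, data_key TEXT, "
    "data_value TEXT, data_type TEXT, data_group TEXT)"
)
conn.executemany(
    "INSERT INTO item_additional_data VALUES (?, ?, ?, ?, ?, ?)",
    [
        (122, "GC_00000016", "functional_homogeneity_index", "1.0", "int", "default"),
        (123, "GC_00000016", "geometric_homogeneity_index", "1.0", "int", "default"),
        # Made-up row with an unrelated key, to show it is left untouched.
        (124, "GC_00000016", "some_other_key", "5", "int", "default"),
    ],
)

changed = fix_homogeneity_types(conn)
types = dict(conn.execute("SELECT data_key, data_type FROM item_additional_data"))
print(changed, types)
```

Working on a copy of the database first would be a sensible precaution, since the update rewrites rows in place.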

meren commented 5 years ago

Oh, yes. This is the fix. But it is concerning that we still need it :( I thought @ozcan had fixed it. Are you sure this is a pangenome generated from current or future master?

ShaiberAlon commented 5 years ago

I am sure. But @ozcan fixed a similar bug for layer_additional_data, and here it is item_additional_data.

ShaiberAlon commented 5 years ago

layers and items are different things @meren.

ShaiberAlon commented 5 years ago

Maybe that's why another fix is in order.

meren commented 5 years ago

Aha. Sorry, I missed that.

I hoped the way it was fixed would have worked against all of these issues regardless of the table :)

meren commented 5 years ago

I assume this is fixed now.

ShaiberAlon commented 5 years ago

I tested now and indeed it is fixed.