merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

anvi-run-scg-taxonomy error #2115

Closed by schmittel 11 months ago

schmittel commented 11 months ago

Hi,

After running anvi-run-scg-taxonomy I'm getting the following error for each of my contigs databases:

Config Error: Bad news, Houston :/ The contigs database '/anvio/contigs_db_kegg/arch0001-CONTIGS.db' is missing one or more HMM sources that you wished it didn't: 'Bacteria_71'.

The command I used was:

for file in /anvio/contigs_db_kegg/*.db; do
    anvi-run-scg-taxonomy \
        --contigs-db "$file" \
        --scgs-taxonomy-data-dir /anvio/scg_data \
        --num-parallel-processes 5 \
        --num-threads 50
done

I had previously run anvi-run-hmms without error, using the default HMM libraries (i.e. all of them). I'm guessing the problem is that my contigs databases are all archaeal, which is why the Bacteria_71 HMM source is not being found. But surely GTDB can classify archaeal sequences?

I'm using the latest development version of anvi'o.

Any help would be appreciated. Many thanks.

meren commented 11 months ago

Hi @schmittel, if anvi'o says Bacteria_71 is missing, then it must be missing :) Those models are run on contigs-dbs regardless of the evolutionary origin of genomes. Can you please send the output for anvi-db-info run on /anvio/contigs_db_kegg/arch0001-CONTIGS.db?

schmittel commented 11 months ago

Sure, here's the output:

DB Info (no touch)
===============================================
Database Path ................................: /anvio/contigs_db_kegg/arch0001-CONTIGS.db
description ..................................: [Not found, but it's OK]
db_type ......................................: contigs (variant: unknown)
version ......................................: 20

DB Info (no touch also)
===============================================
project_name .................................: arch0001
contigs_db_hash ..............................: hash32966ace
split_length .................................: 20000
kmer_size ....................................: 4
num_contigs ..................................: 171
total_length .................................: 2376348
num_splits ...................................: 192
gene_level_taxonomy_source ...................: None
genes_are_called .............................: 1
external_gene_calls ..........................: 0
external_gene_amino_acid_seqs ................: 0
skip_predict_frame ...........................: 0
splits_consider_gene_calls ...................: 1
scg_taxonomy_was_run .........................: 0
scg_taxonomy_database_version ................: None
trna_taxonomy_was_run ........................: 0
trna_taxonomy_database_version ...............: None
creation_date ................................: 1692961794.15059
modules_db_hash ..............................: d20a0dcd2128
gene_function_sources ........................: KOfam,Protista_83,Ribosomal_RNA_5S,KEGG_BRITE,Ribosomal_RNA_28S,KEGG_Class,COG20_PATHWAY,Archaea_76,Ribosomal_RNA_12S,Ribosomal_RNA_16S,COG20_CATEGORY,Ribosomal_RNA_18S,Bacteria_71,Transfer_RNAs,COG20_FUNCTION,Ribosomal_RNA_23S,Pfam,KEGG_Module

* Please remember that it is never a good idea to change these values. But in some
  cases it may be absolutely necessary to update something here, and a
  programmer may ask you to run this program and do it. But even then, you
  should be extremely careful.

AVAILABLE GENE CALLERS
===============================================
* 'prodigal' (2,812 gene calls)
* 'Transfer_RNAs' (46 gene calls)

AVAILABLE FUNCTIONAL ANNOTATION SOURCES
===============================================
* Archaea_76 (78 annotations)
* Bacteria_71 (40 annotations)
* COG20_CATEGORY (1,723 annotations)
* COG20_FUNCTION (1,723 annotations)
* COG20_PATHWAY (480 annotations)
* KEGG_BRITE (1,168 annotations)
* KEGG_Class (313 annotations)
* KEGG_Module (313 annotations)
* KOfam (1,296 annotations)
* Pfam (4,067 annotations)
* Protista_83 (23 annotations)
* Ribosomal_RNA_12S (0 annotations)
* Ribosomal_RNA_16S (0 annotations)
* Ribosomal_RNA_18S (0 annotations)
* Ribosomal_RNA_23S (0 annotations)
* Ribosomal_RNA_28S (0 annotations)
* Ribosomal_RNA_5S (0 annotations)
* Transfer_RNAs (46 annotations)

AVAILABLE HMM SOURCES
===============================================
* 'Transfer_RNAs' (61 models with 46 hits)

Thanks!

meren commented 11 months ago

The AVAILABLE HMM SOURCES section does not list any sources for single-copy core genes (SCGs); only Transfer_RNAs is there. Please re-run anvi-run-hmms.
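
For anyone checking many databases, the same inspection could be scripted rather than done by eye. The sketch below is not part of anvi'o: the function names list_hmm_sources and has_hmm_source are hypothetical, and it assumes the report layout matches the anvi-db-info output pasted above (a header line, a ruler of '=' characters, then one "* 'Name' (...)" line per source). It only parses text on stdin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names listed under "AVAILABLE HMM SOURCES"
# in an anvi-db-info report read from stdin. Assumes the layout shown in the
# output above; other report formats are not handled.
list_hmm_sources() {
    awk -v q="'" '
        /^AVAILABLE HMM SOURCES/           { in_section = 1; next }  # section header
        in_section && /^=+[[:space:]]*$/   { next }                  # skip the ruler line
        in_section && /^\* /               {                         # one source entry
            name = $2
            gsub(q, "", name)                                        # strip the single quotes
            print name
            next
        }
        in_section                         { in_section = 0 }        # anything else ends the section
    '
}

# Hypothetical helper: succeed only if source "$1" is listed as an HMM source
# in the report on stdin.
has_hmm_source() {
    list_hmm_sources | grep -qxF -- "$1"
}
```

A possible use would be `anvi-db-info /anvio/contigs_db_kegg/arch0001-CONTIGS.db 2>&1 | has_hmm_source Bacteria_71 || echo "Bacteria_71 is not an HMM source"`. The `2>&1` is included in case the report is written to stderr rather than stdout, which is an assumption here. For the report above, the check would fail for Bacteria_71 because that name appears only under the functional annotation sources.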

ivagljiva commented 11 months ago

Since we can see the HMM sources in the functional annotation source list, it seems like you ran HMMs with the flag --add-to-functions-table. So when you re-run HMMs, don't use that flag and the HMMs will go to the right place in the database. :)
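
A rough sketch of what that re-run might look like for the same set of databases, reusing the paths and the anvi-run-scg-taxonomy flags from the original post. The helper names run and rerun_scg_workflow and the DRY_RUN variable are hypothetical, and the thread count given to anvi-run-hmms is only a placeholder.

```shell
#!/usr/bin/env bash
# Sketch only: re-run the HMMs without --add-to-functions-table, then retry
# the SCG taxonomy step. The anvi-run-scg-taxonomy flags are copied from the
# original post; the anvi-run-hmms thread count is a placeholder.

# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical function: process every *.db file in directory "$1", using the
# SCG taxonomy data directory "$2". Stops at the first failing command.
rerun_scg_workflow() {
    local db_dir=$1 scg_dir=$2 db
    for db in "$db_dir"/*.db; do
        [ -e "$db" ] || continue   # skip if the glob matched nothing
        # --add-to-functions-table is deliberately left out, per the comment above.
        run anvi-run-hmms --contigs-db "$db" --num-threads 4 || return 1
        run anvi-run-scg-taxonomy \
            --contigs-db "$db" \
            --scgs-taxonomy-data-dir "$scg_dir" \
            --num-parallel-processes 5 \
            --num-threads 50 || return 1
    done
}
```

Calling `DRY_RUN=1 rerun_scg_workflow /anvio/contigs_db_kegg /anvio/scg_data` would only print the commands, which is a cheap way to confirm the flag is gone before running the loop for real.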

schmittel commented 11 months ago

Ahh that's it, thanks. I certainly did use that flag and will run it again without it. Much appreciated.

meren commented 11 months ago

"Since we can see the HMM sources in the functional annotation source list,"

WELL, speak for yourself with your laser vision, hacker (clearly Meren is unable to see things that matter 😂). Thanks for catching that!