merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

[BUG] anvi-run-kegg-kofams terminated prematurely #2256

Open lulux719 opened 2 months ago

lulux719 commented 2 months ago

Short description of the problem

anvi-run-kegg-kofams terminated prematurely after producing the hmm.table file

anvi'o version

Anvi'o .......................................: marie (v8)
Python .......................................: 3.10.13
Profile database .............................: 38
Contigs database .............................: 21
Pan database .................................: 16
Genome data storage ..........................: 7
Auxiliary data storage .......................: 2
Structure database ...........................: 2
Metabolic modules database ...................: 4
tRNA-seq database ............................: 2

System info

Installed via conda

Detailed description of the issue

I have tried several times, and anvi-run-kegg-kofams always terminates prematurely after producing the hmm.table file, with the message below.

Done with KOfam 🎊

Number of raw hits in table file .............: 397,424,337
Terminated

Here's the hmm.table generated.

head hmm.table
11548766 - K24524 - 0.0034    19.5   0.0   0.0055    18.8   0.0   1.3  1  0  0  1  1  1  1  -
11690881 - K15921 - 4.7e-199  667.4  13.8  5.2e-199  667.3  13.8  1.0  1  0  0  1  1  1  1  -
11605040 - K15921 - 1.8e-190  639.0  21.8  2.4e-190  638.6  21.8  1.1  1  0  0  1  1  1  1  -
11656118 - K15921 - 1.8e-190  639.0  21.8  2.4e-190  638.6  21.8  1.1  1  0  0  1  1  1  1  -

Here's the info for the contigs.db, from anvi-db-info 03_CONTIGS/contigs.db:

DB Info (no touch)

Database Path ................................: 03_CONTIGS/contigs.db
description ..................................: [Not found, but it's OK]
db_type ......................................: contigs (variant: unknown)
version ......................................: 21

DB Info (no touch also)

project_name .................................: ob
contigs_db_hash ..............................: hashc9e5c18c
split_length .................................: 20000
kmer_size ....................................: 4
num_contigs ..................................: 1193879
total_length .................................: 16702353431
num_splits ...................................: 1450761
genes_are_called .............................: 1
external_gene_calls ..........................: 0
external_gene_amino_acid_seqs ................: 0
skip_predict_frame ...........................: 0
splits_consider_gene_calls ...................: 1
scg_taxonomy_was_run .........................: 0
scg_taxonomy_database_version ................: None
trna_taxonomy_was_run ........................: 0
trna_taxonomy_database_version ...............: None
creation_date ................................: 1706651817.91107
gene_function_sources ........................: Pfam
gene_level_taxonomy_source ...................: kaiju

AVAILABLE GENE CALLERS

AVAILABLE FUNCTIONAL ANNOTATION SOURCES

AVAILABLE HMM SOURCES

When I try it on a smaller contigs.db (1/4 of the samples), it completes without any problem, so I'm guessing this has something to do with server capacity. My question would be: is there any way to bypass this issue? I assume the program finished the "Run an HMM search against KOfam" step. Is it possible to resume the program from there?

Thank you very much.

meren commented 2 months ago

This looks like a memory issue, so there is not much we can do. BUT, there is always a way. In this case, one could split their contigs-db file into 10 different ones using a collection-txt and anvi-split, run anvi-run-kegg-kofams on each one of them separately, export the contents of the gene_functions table from each one, and then manually import the final hits into the original contigs-db.

But this is a hacker's workaround; a machine with more memory would of course have been the optimal solution :)
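For what it's worth, here is a rough shell sketch of that split / annotate / merge idea. The profile database (PROFILE.db), collection name (SPLIT_10), collection.txt file, output directory (SPLIT_DBS), and thread count are all placeholders rather than anything from the thread, and the exact flags may differ between anvi'o versions, so treat it as a starting point rather than a recipe:

```bash
# Assumptions (verify against your own setup and anvi'o version):
#   - PROFILE.db is a profile database associated with 03_CONTIGS/contigs.db
#   - collection.txt is a two-column TSV (split name <TAB> bin name) that
#     assigns the splits to ~10 bins of roughly equal size
#   - gene caller IDs are preserved by anvi-split, so functions exported from
#     the split databases can be imported back into the original contigs.db

# 1. import the collection that defines the chunks
anvi-import-collection collection.txt \
    -c 03_CONTIGS/contigs.db \
    -p PROFILE.db \
    -C SPLIT_10

# 2. split the contigs database into one self-contained directory per bin
anvi-split -c 03_CONTIGS/contigs.db \
           -p PROFILE.db \
           -C SPLIT_10 \
           -o SPLIT_DBS

# 3. annotate each chunk separately, then export its KOfam hits
for bin_dir in SPLIT_DBS/*; do
    anvi-run-kegg-kofams -c "$bin_dir/CONTIGS.db" -T 8
    anvi-export-functions -c "$bin_dir/CONTIGS.db" \
                          --annotation-sources KOfam \
                          -o "$bin_dir/kofam_functions.txt"
done

# 4. import the exported hits back into the original contigs database
for f in SPLIT_DBS/*/kofam_functions.txt; do
    anvi-import-functions -c 03_CONTIGS/contigs.db -i "$f"
done
```

The critical assumption here is that gene caller IDs stay identical between the original and the split databases, so the exported functions map back to the right genes; spot-checking a few genes after the import would be a sensible sanity check.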