merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

[BUG] anvi-run-kegg-kofams terminated prematurely #2256

Open lulux719 opened 2 months ago

lulux719 commented 2 months ago

Short description of the problem

anvi-run-kegg-kofams terminated prematurely after producing the hmm.table file

anvi'o version

Anvi'o .......................................: marie (v8)
Python .......................................: 3.10.13
Profile database .............................: 38
Contigs database .............................: 21
Pan database .................................: 16
Genome data storage ..........................: 7
Auxiliary data storage .......................: 2
Structure database ...........................: 2
Metabolic modules database ...................: 4
tRNA-seq database ............................: 2

System info

Installed via conda

Detailed description of the issue

I have tried several times, and anvi-run-kegg-kofams always terminates prematurely after producing the hmm.table file, with the message below.

Done with KOfam 🎊

Number of raw hits in table file .............: 397,424,337
Terminated

Here's the hmm.table generated.

head hmm.table
11548766 - K24524 - 0.0034    19.5   0.0   0.0055    18.8   0.0   1.3  1  0  0  1  1  1  1  -
11690881 - K15921 - 4.7e-199  667.4  13.8  5.2e-199  667.3  13.8  1.0  1  0  0  1  1  1  1  -
11605040 - K15921 - 1.8e-190  639.0  21.8  2.4e-190  638.6  21.8  1.1  1  0  0  1  1  1  1  -
11656118 - K15921 - 1.8e-190  639.0  21.8  2.4e-190  638.6  21.8  1.1  1  0  0  1  1  1  1  -

Here's the info for the contigs.db, from anvi-db-info 03_CONTIGS/contigs.db:

DB Info (no touch)

Database Path ................................: 03_CONTIGS/contigs.db
description ..................................: [Not found, but it's OK]
db_type ......................................: contigs (variant: unknown)
version ......................................: 21

DB Info (no touch also)

project_name .................................: ob
contigs_db_hash ..............................: hashc9e5c18c
split_length .................................: 20000
kmer_size ....................................: 4
num_contigs ..................................: 1193879
total_length .................................: 16702353431
num_splits ...................................: 1450761
genes_are_called .............................: 1
external_gene_calls ..........................: 0
external_gene_amino_acid_seqs ................: 0
skip_predict_frame ...........................: 0
splits_consider_gene_calls ...................: 1
scg_taxonomy_was_run .........................: 0
scg_taxonomy_database_version ................: None
trna_taxonomy_was_run ........................: 0
trna_taxonomy_database_version ...............: None
creation_date ................................: 1706651817.91107
gene_function_sources ........................: Pfam
gene_level_taxonomy_source ...................: kaiju

AVAILABLE GENE CALLERS

AVAILABLE FUNCTIONAL ANNOTATION SOURCES

AVAILABLE HMM SOURCES

When I try it on a smaller contigs.db (1/4 of the samples), it completes without any problem, so I'm guessing this has something to do with server capacity. My question would be: is there any way to bypass this issue? I assume the program finished the "Run an HMM search against KOfam" step. Is it possible to resume the program from there?

Thank you very much.

meren commented 2 months ago

This looks like a memory issue, so there is not much we can do. BUT, there is always a way. In this case, one could split their contigs-db file into 10 different ones using a collection-txt and anvi-split, run anvi-run-kegg-kofams on each one of them separately, export the contents of the gene_functions table from each one, and then manually import the final hits into the original contigs-db.

But this is a hacker's workaround; a machine with more memory would of course have been the optimal solution :)
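For what it's worth, here is a rough shell sketch of that split / annotate / merge idea. The profile database (PROFILE.db), collection name (SPLIT_10), collection.txt file, output directory (SPLIT_DBS), and thread count are all placeholders rather than anything from the thread, and the exact flags may differ between anvi'o versions, so treat it as a starting point rather than a recipe:

```bash
# Assumptions (verify against your own setup and anvi'o version):
#   - PROFILE.db is a profile database associated with 03_CONTIGS/contigs.db
#   - collection.txt is a two-column TSV (split name <TAB> bin name) that
#     assigns the splits to ~10 bins of roughly equal size
#   - gene caller IDs are preserved by anvi-split, so functions exported from
#     the split databases can be imported back into the original contigs.db

# 1. import the collection that defines the chunks
anvi-import-collection collection.txt \
    -c 03_CONTIGS/contigs.db \
    -p PROFILE.db \
    -C SPLIT_10

# 2. split the contigs database into one self-contained directory per bin
anvi-split -c 03_CONTIGS/contigs.db \
           -p PROFILE.db \
           -C SPLIT_10 \
           -o SPLIT_DBS

# 3. annotate each chunk separately, then export its KOfam hits
for bin_dir in SPLIT_DBS/*; do
    anvi-run-kegg-kofams -c "$bin_dir/CONTIGS.db" -T 8
    anvi-export-functions -c "$bin_dir/CONTIGS.db" \
                          --annotation-sources KOfam \
                          -o "$bin_dir/kofam_functions.txt"
done

# 4. import the exported hits back into the original contigs database
for f in SPLIT_DBS/*/kofam_functions.txt; do
    anvi-import-functions -c 03_CONTIGS/contigs.db -i "$f"
done
```

The critical assumption here is that gene caller IDs stay identical between the original and the split databases, so the exported functions map back to the right genes; spot-checking a few genes after the import would be a sensible sanity check.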