Open lulux719 opened 7 months ago
This looks like a memory issue, so there is not much we can do. BUT, there is always a way. In this case, one could split their contigs-db file into 10 different ones using a collection-txt and `anvi-split`, run `anvi-run-kegg-kofams` on each one of them separately, then export the contents of the `gene_functions` table from each one, and finally import the combined hits into the original contigs-db manually.

But this is a hacker's workaround, and a machine with more memory would of course have been the optimal solution :)
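For what it's worth, the steps above could be sketched as the loop below. This is a hedged sketch, not a tested recipe: the paths, the collection name, and the output directory are hypothetical, `anvi-split` requires a profile database that carries the collection, and you should verify on a small test case that gene caller ids in the split databases match those in the parent contigs-db before importing the merged hits.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical paths and names -- adjust to your own project.
CONTIGS=03_CONTIGS/contigs.db
PROFILE=06_MERGED/PROFILE.db      # must carry the collection named below
COLLECTION=TEN_CHUNKS             # a collection partitioning contigs into ~10 bins

# 1) Split the contigs-db into smaller, self-contained databases.
anvi-split -c "$CONTIGS" -p "$PROFILE" -C "$COLLECTION" -o SPLIT_DBS

# 2) Annotate each piece separately (a much smaller memory footprint per run),
#    then export its KOfam hits as a tab-delimited functions file.
for db in SPLIT_DBS/*/CONTIGS.db; do
    anvi-run-kegg-kofams -c "$db" -T 4
    anvi-export-functions -c "$db" --annotation-sources KOfam \
        -o "$(dirname "$db")/kofam_hits.txt"
done

# 3) Concatenate the exports, keeping only the first file's header line.
awk 'FNR == 1 && NR != 1 {next} {print}' SPLIT_DBS/*/kofam_hits.txt > all_kofam_hits.txt

# 4) Import the merged hits back into the original contigs-db.
anvi-import-functions -c "$CONTIGS" -i all_kofam_hits.txt
```

The export/import round trip is the fragile part: the functions file keys hits by gene caller id, so the merge only makes sense if the split databases preserve the parent's ids.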
Short description of the problem
`anvi-run-kegg-kofams` terminated prematurely after producing the `hmm.table` file
anvi'o version
Anvi'o .......................................: marie (v8)
Python .......................................: 3.10.13
Profile database .............................: 38
Contigs database .............................: 21
Pan database .................................: 16
Genome data storage ..........................: 7
Auxiliary data storage .......................: 2
Structure database ...........................: 2
Metabolic modules database ...................: 4
tRNA-seq database ............................: 2
System info
Installed via conda
Detailed description of the issue
I have tried several times, and `anvi-run-kegg-kofams` always terminates prematurely after producing the `hmm.table` file, with the following message:
Done with KOfam 🎊
Number of raw hits in table file .............: 397,424,337
Terminated
Here's the hmm.table generated.
head hmm.table
11548766 - K24524 - 0.0034 19.5 0.0 0.0055 18.8 0.0 1.3 1 0 0 1 1 1 1 -
11690881 - K15921 - 4.7e-199 667.4 13.8 5.2e-199 667.3 13.8 1.0 1 0 0 1 1 1 1 -
11605040 - K15921 - 1.8e-190 639.0 21.8 2.4e-190 638.6 21.8 1.1 1 0 0 1 1 1 1 -
11656118 - K15921 - 1.8e-190 639.0 21.8 2.4e-190 638.6 21.8 1.1 1 0 0 1 1 1 1 -
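For context, this layout matches the per-sequence `--tblout` output of `hmmsearch`: whitespace-separated columns with the target (gene caller id), a target accession placeholder, the query (KO) name, a query accession placeholder, and then the full-sequence e-value and score, followed by best-domain statistics. As an illustration only (the file name and the e-value threshold here are made up, not anything anvi'o uses), such a table can be filtered with awk:

```shell
# Two sample rows in hmmsearch --tblout layout.
cat > hmm_sample.table <<'EOF'
11548766 - K24524 - 0.0034 19.5 0.0 0.0055 18.8 0.0 1.3 1 0 0 1 1 1 1 -
11690881 - K15921 - 4.7e-199 667.4 13.8 5.2e-199 667.3 13.8 1.0 1 0 0 1 1 1 1 -
EOF

# Count hits whose full-sequence e-value (column 5) is below 1e-10.
awk '$5 < 1e-10 {n++} END {print n+0}' hmm_sample.table
# prints: 1
```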
Here's the info of contigs.db:

anvi-db-info 03_CONTIGS/contigs.db
DB Info (no touch)
Database Path ................................: 03_CONTIGS/contigs.db
description ..................................: [Not found, but it's OK]
db_type ......................................: contigs (variant: unknown)
version ......................................: 21
DB Info (no touch also)
project_name .................................: ob
contigs_db_hash ..............................: hashc9e5c18c
split_length .................................: 20000
kmer_size ....................................: 4
num_contigs ..................................: 1193879
total_length .................................: 16702353431
num_splits ...................................: 1450761
genes_are_called .............................: 1
external_gene_calls ..........................: 0
external_gene_amino_acid_seqs ................: 0
skip_predict_frame ...........................: 0
splits_consider_gene_calls ...................: 1
scg_taxonomy_was_run .........................: 0
scg_taxonomy_database_version ................: None
trna_taxonomy_was_run ........................: 0
trna_taxonomy_database_version ...............: None
creation_date ................................: 1706651817.91107
gene_function_sources ........................: Pfam
gene_level_taxonomy_source ...................: kaiju
AVAILABLE GENE CALLERS
AVAILABLE FUNCTIONAL ANNOTATION SOURCES
AVAILABLE HMM SOURCES
When I try it on a smaller contigs.db (1/4 of the samples), it completes without any problem, so I'm guessing this has something to do with server capacity. My question is: is there any way to bypass this issue? I assume the program finished the "Run an HMM search against KOfam" step. Is it possible to resume the program from that point?
Thank you very much.