itsmisterbrown opened this issue 1 year ago.
How large is the database you created? Would it be possible to share it?
What does your taxonomy mapping (UVIG_taxid_mapping_cleaned) look like? It seems to produce some very large taxid values (1446979566). Maybe I didn't correctly account for taxids being that large.
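To answer the mapping question, a quick way to see how large the taxids actually get is to summarise the mapping file's second column. This is a minimal sketch that assumes the file follows the two-column `--tax-mapping-file` layout described on the MMseqs2 wiki (sequence identifier, a tab, then a numeric taxid); the function name `taxid_stats` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarise the taxid column of a taxonomy mapping file.
# Assumed layout: <sequence id><TAB><taxid>, one pair per line.
taxid_stats() {
    awk -F'\t' '
        $2 ~ /^[0-9]+$/ {                                # skip headers and malformed rows
            n++                                          # count mapped sequences
            if (!($2 in seen)) { seen[$2] = 1; d++ }     # count distinct taxids
            if ($2 + 0 > max) max = $2 + 0               # track the largest taxid
        }
        END { print "entries=" (n + 0) " distinct=" (d + 0) " max_taxid=" (max + 0) }
    ' "$1"
}
```

For example, `taxid_stats UVIG_taxid_mapping_cleaned` prints one summary line. A `max_taxid` many orders of magnitude above `distinct` means the taxid space is very sparse, which is the situation the question above is probing.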
I got a similar error to itsmisterbrown: the LCA step dies with a segmentation fault. Here is my command line. I have also attached my log and error files: out.txt, err.txt
mmseqs easy-taxonomy \
test.fasta nr.smag.mmetsp.gvog.faaDB \
DB_NR.SMAG.DB_tax_result_test \
tmp \
--orf-filter 0 \
--threads 16 \
--lca-ranks superkingdom,phylum,class,order,family,genus \
--split-memory-limit 500G
Please help me find out what is wrong with my command.
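Before digging into the LCA step itself, it may be worth confirming that the target database passed to `easy-taxonomy` was actually built as a seqTaxDB. A minimal sketch, assuming `createtaxdb` left its taxonomy mapping next to the database as `<db>_mapping` (the naming used on the MMseqs2 wiki); `has_tax_mapping` is a hypothetical name, and passing this check does not rule out other problems with the taxonomy.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if <db>_mapping exists and is non-empty.
# Assumes the <db>_mapping naming convention produced by `mmseqs createtaxdb`.
has_tax_mapping() {
    local db="$1"
    if [ -s "${db}_mapping" ]; then
        echo "found ${db}_mapping"
        return 0
    fi
    echo "missing or empty: ${db}_mapping" >&2
    return 1
}
```

Usage: `has_tax_mapping nr.smag.mmetsp.gvog.faaDB || echo "target may not be a seqTaxDB"`.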
Expected Behavior
Taxonomy assignment of viral OTU sequences (nucleotide) using the 2bLCA method against a custom-formatted amino acid database from IMG/VR.
Current Behavior
The LCA step dies with a segmentation fault on a small test dataset that I have previously classified successfully against Antônio Camargo's ICTV MMseqs2 protein database (https://github.com/apcamargo/ictv-mmseqs2-protein-database).
For reference, I allocated 40 cores and 700 GB of RAM to this job, which fails after consuming only 178 GB of memory.
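Since the same test set works against the ICTV database, the main difference here is the IMG/VR database and its custom taxonomy. One inexpensive check is whether every taxid in the mapping file appears in the custom `nodes.dmp`. This sketch assumes the NCBI taxdump layout (fields separated by `|`, taxid in the first field) and the two-column mapping file; `find_unknown_taxids` is a hypothetical name, and a non-empty result is only a lead, not a confirmed cause of the segfault.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print taxids used in the mapping file that are absent
# from nodes.dmp. Assumed inputs: mapping = <seq id><TAB><taxid>,
# nodes.dmp = NCBI-style rows such as "2<TAB>|<TAB>1<TAB>|<TAB>superkingdom<TAB>|".
find_unknown_taxids() {
    local mapping="$1" nodes="$2"
    awk -F'\t' '
        NR == FNR {                    # first file read: nodes.dmp
            split($0, f, "|")          # taxid is the first |-separated field
            gsub(/[ \t]/, "", f[1])    # strip surrounding tabs/spaces
            known[f[1]] = 1
            next
        }
        !($2 in known) { print $2 }    # second file read: mapping file
    ' "$nodes" "$mapping" | sort -u
}
```

For example, `find_unknown_taxids UVIG_taxid_mapping_cleaned taxdump/nodes.dmp` (paths illustrative) prints nothing when every mapped taxid is defined in the custom dump.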
Steps to Reproduce (for bugs)
Please make sure to execute the reproduction steps with newly recreated and empty tmp folders.
I have formatted the IMG/VR v4 7.1 AA database as recommended (https://github.com/soedinglab/MMseqs2/wiki#create-a-seqtaxdb-by-manual-annotation-of-a-sequence-database) and I've created a custom taxdump using taxonkit. The custom taxdb was created without issue:
The job was submitted with the following batch script, including parameters:
MMseqs Output (for bugs)
Please make sure to also post the complete output of MMseqs. You can use gist.github.com for large output.
Full output and error logs are attached below.
tmp/10336174962539687461/taxonomy_tmp/11653652317365833767/tmp_taxonomy/6923600097584969791/taxonomy.sh: line 58: 78000 Segmentation fault (core dumped) "$MMSEQS" lca "${TARGET}" "${LCAIN}" "${RESULTS}" ${LCA_PAR}
Context
Providing context helps us come up with a solution and improve our documentation for the future.
Your Environment
Include as many relevant details about the environment you experienced the bug in.
job-mmseqs_easytax_050523_error.txt job-mmseqs_easytax_050523_out.txt