Closed nick-youngblut closed 5 years ago
Could you post the full log? MMseqs2 should be fine with far less memory than you gave it; it sounds like you ran into a different bug.
Thanks for the quick response! Here's the whole log:
Program call:
taxonomy -e 1e-5 --start-sens 1 -s 6 --sens-steps 3 --lca-ranks phylum:superphylum:subkingdom:kingdom:superkingdom --threads 24 /tmp/global2/nyoungblut/LLMGAG_27929269397/linclust/genes_db /ebio/abt3_projects/databases_no-backup/uniclust/uniclust90_2018_08_db /tmp/global2/nyoungblut/LLMGAG_27929269397/taxonomy/genes_tax_db /tmp/global2/nyoungblut/LLMGAG_27929269397/taxonomy/tmp/
MMseqs Version: 7.4e23d
Sub Matrix blosum62.out
Add backtrace false
Alignment mode 2
E-value threshold 1e-05
Seq. Id Threshold 0
Seq. Id. Mode 0
Alternative alignments 0
Coverage threshold 0
Coverage Mode 0
Max. sequence length 65535
Max. results per query 300
Compositional bias 1
Realign hit false
Max Reject 2147483647
Max Accept 2147483647
Include identical Seq. Id. false
Preload mode 0
Pseudo count a 1
Pseudo count b 1.5
Score bias 0
Gap open cost 11
Gap extension cost 1
Threads 24
Verbosity 3
Sensitivity 6
K-mer size 0
K-score 2147483647
Alphabet size 21
Offset result 0
Split DB 0
Split mode 2
Split Memory Limit 0
Diagonal Scoring 1
Exact k-mer matching 0
Mask Residues 1
Minimum Diagonal score 15
Spaced Kmer 1
Spaced k-mer pattern
Local temporary path
Rescore mode 0
Remove hits by seq.id. and coverage false
Sort results 0
In substitution scoring mode, performs global alignment along the diagonal false
Mask profile 1
Profile e-value threshold 0.001
Use global sequence weighting false
Filter MSA 1
Maximum sequence identity threshold 0.9
Minimum seq. id. 0
Minimum score per column -20
Minimum coverage 0
Select n most diverse seqs 1000
Omit Consensus false
Min codons in orf 30
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 0
Forward Frames 1,2,3
Reverse Frames 1,2,3
Translation Table 1
Use all table starts false
Offset of numeric ids 0
Add Orf Stop false
Number search iterations 1
Start sensitivity 1
Search steps 3
Run a seq-profile search in slice mode false
Strand selection 1
Disk space limit 0
Sets the MPI runner
Remove Temporary Files false
LCA Ranks phylum:superphylum:subkingdom:kingdom:superkingdom
Blacklisted Taxa 12908,28384
LCA Mode 2
Remove Temporary Files false
Sets the MPI runner
Program call:
search /tmp/global2/nyoungblut/LLMGAG_27929269397/linclust/genes_db /ebio/abt3_projects/databases_no-backup/uniclust/uniclust90_2018_08_db /tmp/global2/nyoungblut/LLMGAG_27929269397/taxonomy/tmp//15538800487586745695/first /tmp/global2/nyoungblut/LLMGAG_27929269397/taxonomy/tmp//15538800487586745695/tmp_hsp1 --sub-mat blosum62.out -a 0 --alignment-mode 2 -e 1e-05 --min-seq-id 0 --seq-id-mode 0 --alt-ali 0 -c 0 --cov-mode 0 --max-seq-len 65535 --max-seqs 300 --comp-bias-corr 1 --realign 0 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 0 --pca 1 --pcb 1.5 --score-bias 0 --gap-open 11 --gap-extend 1 --threads 24 -v 3 -s 6 -k 0 --k-score 2147483647 --alph-size 21 --offset-result 0 --split 0 --split-mode 2 --split-memory-limit 0 --diag-score 1 --exact-kmer-matching 0 --mask 1 --min-ungapped-score 15 --spaced-kmer-mode 1 --rescore-mode 0 --filter-hits 0 --sort-results 0 --global-alignment 0 --mask-profile 1 --e-profile 0.001 --wg 0 --filter-msa 1 --max-seq-id 0.9 --qid 0 --qsc -20 --cov 0 --diff 1000 --omit-consensus 0 --min-length 30 --max-length 32734 --max-gaps 2147483647 --contig-start-mode 2 --contig-end-mode 2 --orf-start-mode 0 --forward-frames 1,2,3 --reverse-frames 1,2,3 --translation-table 1 --use-all-table-starts 0 --id-offset 0 --add-orf-stop 0 --num-iterations 1 --start-sens 1 --sens-steps 3 --slice-search 0 --strand 1 --disk-space-limit 0 --remove-tmp-files 0
MMseqs Version: 7.4e23d
Sub Matrix blosum62.out
Add backtrace false
Alignment mode 2
E-value threshold 1e-05
Seq. Id Threshold 0
Seq. Id. Mode 0
Alternative alignments 0
Coverage threshold 0
Coverage Mode 0
Max. sequence length 65535
Max. results per query 300
Compositional bias 1
Realign hit false
Max Reject 2147483647
Max Accept 2147483647
Include identical Seq. Id. false
Preload mode 0
Pseudo count a 1
Pseudo count b 1.5
Score bias 0
Gap open cost 11
Gap extension cost 1
Threads 24
Verbosity 3
Sensitivity 6
K-mer size 0
K-score 2147483647
Alphabet size 21
Offset result 0
Split DB 0
Split mode 2
Split Memory Limit 0
Diagonal Scoring 1
Exact k-mer matching 0
Mask Residues 1
Minimum Diagonal score 15
Spaced Kmer 1
Spaced k-mer pattern
Local temporary path
Rescore mode 0
Remove hits by seq.id. and coverage false
Sort results 0
In substitution scoring mode, performs global alignment along the diagonal false
Mask profile 1
Profile e-value threshold 0.001
Use global sequence weighting false
Filter MSA 1
Maximum sequence identity threshold 0.9
Minimum seq. id. 0
Minimum score per column -20
Minimum coverage 0
Select n most diverse seqs 1000
Omit Consensus false
Min codons in orf 30
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 0
Forward Frames 1,2,3
Reverse Frames 1,2,3
Translation Table 1
Use all table starts false
Offset of numeric ids 0
Add Orf Stop false
Number search iterations 1
Start sensitivity 1
Search steps 3
Run a seq-profile search in slice mode false
Strand selection 1
Disk space limit 0
Sets the MPI runner
Remove Temporary Files false
Program call:
align /tmp/global2/nyoungblut/LLMGAG_27929269397/linclust/genes_db /ebio/abt3_projects/databases_no-backup/uniclust/uniclust90_2018_08_db /tmp/global2/nyoungblut/LLMGAG_27929269397/taxonomy/tmp//15538800487586745695/tmp_hsp1/17220669400861690567/pref_1.000 /tmp/global2/nyoungblut/LLMGAG_27929269397/taxonomy/tmp//15538800487586745695/tmp_hsp1/17220669400861690567/aln_1.000 --sub-mat blosum62.out -a 0 --alignment-mode 2 -e 1e-05 --min-seq-id 0 --seq-id-mode 0 --alt-ali 0 -c 0 --cov-mode 0 --max-seq-len 65535 --max-seqs 300 --comp-bias-corr 1 --realign 0 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 0 --pca 1 --pcb 1.5 --score-bias 0 --gap-open 11 --gap-extend 1 --threads 24 -v 3
MMseqs Version: 7.4e23d
Sub Matrix blosum62.out
Add backtrace false
Alignment mode 2
E-value threshold 1e-05
Seq. Id Threshold 0
Seq. Id. Mode 0
Alternative alignments 0
Coverage threshold 0
Coverage Mode 0
Max. sequence length 65535
Max. results per query 300
Compositional bias 1
Realign hit false
Max Reject 2147483647
Max Accept 2147483647
Include identical Seq. Id. false
Preload mode 0
Pseudo count a 1
Pseudo count b 1.5
Score bias 0
Gap open cost 11
Gap extension cost 1
Threads 24
Verbosity 3
Init data structures...
Compute score and coverage.
Touch data file /tmp/global2/nyoungblut/LLMGAG_27929269397/linclust/genes_db ... Done.
Touch data file /ebio/abt3_projects/databases_no-backup/uniclust/uniclust90_2018_08_db ... Done.
Query database type: Aminoacid
Target database type: Aminoacid
Calculation of Smith-Waterman alignments.
................................................................................................... 1 Mio. sequences processed
.......
What is the error message? "Could not allocate foundDiagonals memory in QueryMatcher" should only be possible during the prefiltering stage, not the alignment stage.
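One way to check which stage a log ended in is to look at the last "Program call:" entry, since in the log above each module prints that header followed by its command line. The following is only a rough sketch: the function name `last_program_call` is hypothetical, the example log is abbreviated from the one posted above, and the header format is assumed to match this MMseqs2 version.

```python
from typing import Optional


def last_program_call(log_text: str) -> Optional[str]:
    """Return the module name (first token) of the last 'Program call:' entry.

    Assumes the layout seen in the log above: a line reading 'Program call:'
    followed by a line whose first token is the module name.
    """
    lines = log_text.splitlines()
    module = None
    for i, line in enumerate(lines):
        if line.strip() == "Program call:" and i + 1 < len(lines):
            tokens = lines[i + 1].split()
            if tokens:
                module = tokens[0]
    return module


# Abbreviated stand-in for the posted log (arguments shortened for illustration).
example_log = """Program call:
taxonomy -e 1e-5 --threads 24
Program call:
search --threads 24
Program call:
align --threads 24
Calculation of Smith-Waterman alignments.
"""

print(last_program_call(example_log))  # prints: align
```

For the log posted above, this would report `align`, which is why a prefilter allocation error looks out of place there.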
"Could not allocate foundDiagonals memory in QueryMatcher" is the only error message that I received.
I was running this in a snakemake pipeline, which tried the run with progressively more memory (240, 480, 720 GB). Each time, I got the error "Could not allocate foundDiagonals memory in QueryMatcher", and the log file looked the same (with fewer dots at the end of the log file when less memory was used).
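The escalation described here (240, 480, 720 GB) can be expressed as a function of the retry attempt. This is a minimal sketch, not the poster's actual pipeline code, which is not shown in the thread: the names `mem_gb_for_attempt` and `h_vmem_flag` are hypothetical, and the 240 GB base is taken from the numbers quoted above. Snakemake can pass an `attempt` counter to resource callables when restarts are enabled, so a function like this could be wrapped in such a callable.

```python
def mem_gb_for_attempt(attempt: int, base_gb: int = 240) -> int:
    """Memory request in GB for a 1-based retry attempt: base, 2*base, 3*base, ..."""
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    return base_gb * attempt


def h_vmem_flag(attempt: int, base_gb: int = 240) -> str:
    """Format the request as an h_vmem resource string for `qsub -l`."""
    return f"h_vmem={mem_gb_for_attempt(attempt, base_gb)}G"


for attempt in (1, 2, 3):
    print(attempt, h_vmem_flag(attempt))
```

Note that raising the scheduler limit this way does not by itself tell MMseqs2 how much memory it may use.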
I am not sure how snakemake implements its memory limit, but you might have to tell the MMseqs2 prefilter how much memory it is allowed to use with the --split-memory-limit parameter. By default, MMseqs2 assumes it may use the whole machine.
For example, use --split-memory-limit 200000000 for about 200 GB of maximum memory. I think the description text is slightly wrong, however: the parameter expects the memory in kilobytes, not megabytes. I have to double-check that.
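The arithmetic behind that value can be sketched as follows. This assumes the unit is kilobytes, which is only suspected and not yet confirmed in this thread, and it uses decimal units so that 200 GB maps to 200000000 as in the example above. The helper names and the optional headroom percentage are hypothetical additions for illustration.

```python
KB_PER_GB = 1_000_000  # decimal units, matching "200000000 for about 200 GB"


def split_memory_limit_kb(budget_gb: int, headroom_percent: int = 0) -> int:
    """Convert a memory budget in GB to kilobytes, optionally leaving headroom.

    ASSUMPTION: --split-memory-limit is interpreted in kilobytes; check the
    help text of the installed MMseqs2 version before relying on this.
    """
    if budget_gb <= 0:
        raise ValueError("budget_gb must be positive")
    if not 0 <= headroom_percent < 100:
        raise ValueError("headroom_percent must be in [0, 100)")
    return budget_gb * KB_PER_GB * (100 - headroom_percent) // 100


def split_memory_limit_args(budget_gb: int, headroom_percent: int = 0) -> list:
    """Build the extra command-line arguments as a list of strings."""
    return ["--split-memory-limit", str(split_memory_limit_kb(budget_gb, headroom_percent))]


print(split_memory_limit_args(200))  # ['--split-memory-limit', '200000000']
```

Leaving some headroom below the scheduler's limit (for example 10 percent) is a design choice rather than a requirement stated in the thread.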
Sorry for not making the memory limit clear: snakemake is just running qsub jobs for me, and it's just setting different amounts of memory (e.g., qsub -l h_vmem=720G).
I'll try --split-memory-limit and see if it fixes the problem.
It turns out that the issue wasn't a memory error, but instead a bug in my pipeline code that killed the job prematurely. Sorry to waste your time on this.
I'm using mmseqs2 7.4e23d h21aa3a5_1 bioconda, and I'm trying to taxonomically classify a set of ~4 million representative AA sequences (generated by plass, clustered with linclust, then using a representative of each cluster), and I'm using uniclust90_2018_08 for the taxonomy db. The command is:
I've tried providing up to 720 GB of memory, and I still get a memory error: "Could not allocate foundDiagonals memory in QueryMatcher". This happens during the stage:
Is there a good way of reducing the memory usage for mmseqs taxonomy? I didn't see anything in the script doc or the wiki on reducing memory usage for taxonomy inference.