nick-youngblut opened 3 years ago
The seg-fault errors that I'm getting with mmseqs taxonomy don't appear to be due to the 2 extra split files. Even when tracking all *.idx files so they don't accidentally get deleted, I get the following error:
taxonomy -e 1e-5 --max-seqs 200 --num-iterations 2 --start-sens 1 --sens-steps 3 -s 6 --lca-ranks superkingdom,kingdom,phylum,class,order,family,genus,species --threads 8 /ebio/abt3_scratch/nyoungblut/LLCDS_126702996474/mmseqs_tax_query/06/seqs_db /ebio/abt3_scratch/nyoungblut/LLCDS_126702996474/mmseqs_tax_target/mmseqs_tax.db /ebio/abt3_scratch/nyoungblut/LLCDS_126702996474/mmseqs_tax/06/seqs_tax_db /ebio/abt3_scratch/nyoungblut/LLCDS_126702996474/mmseqs_tax/06/TMP
MMseqs Version: 13.45111
ORF filter 1
ORF filter e-value 100
ORF filter sensitivity 2
LCA mode 3
Taxonomy output mode 0
Majority threshold 0.5
Vote mode 1
LCA ranks superkingdom,kingdom,phylum,class,order,family,genus,species
Column with taxonomic lineage 0
Compressed 0
Threads 8
Verbosity 3
Taxon blacklist 12908:unclassified sequences,28384:other sequences
Substitution matrix nucl:nucleotide.out,aa:blosum62.out
Add backtrace false
Alignment mode 1
Alignment mode 0
Allow wrapped scoring false
E-value threshold 1e-05
Seq. id. threshold 0
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Coverage threshold 0
Coverage mode 0
Max sequence length 65535
Compositional bias 1
Max reject 5
Max accept 30
Include identical seq. id. false
Preload mode 0
Pseudo count a 1
Pseudo count b 1.5
Score bias 0
Realign hits false
Realign score bias -0.2
Realign max seqs 2147483647
Gap open cost nucl:5,aa:11
Gap extension cost nucl:2,aa:1
Zdrop 40
Seed substitution matrix nucl:nucleotide.out,aa:VTML80.out
Sensitivity 6
k-mer length 0
k-score 2147483647
Alphabet size nucl:5,aa:21
Max results per query 200
Split database 0
Split mode 2
Split memory limit 0
Diagonal scoring true
Exact k-mer matching 0
Mask residues 1
Mask lower case residues 0
Minimum diagonal score 15
Spaced k-mers 1
Spaced k-mer pattern
Local temporary path
Rescore mode 0
Remove hits by seq. id. and coverage false
Sort results 0
Mask profile 1
Profile E-value threshold 0.001
Global sequence weighting false
Allow deletions false
Filter MSA 1
Maximum seq. id. threshold 0.9
Minimum seq. id. 0
Minimum score per column -20
Minimum coverage 0
Select N most diverse seqs 1000
Min codons in orf 30
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 1
Forward frames 1,2,3
Reverse frames 1,2,3
Translation table 1
Translate orf 0
Use all table starts false
Offset of numeric ids 0
Create lookup 0
Add orf stop false
Overlap between sequences 0
Sequence split mode 1
Header split mode 0
Chain overlapping alignments 0
Merge query 1
Search type 0
Search iterations 2
Start sensitivity 1
Search steps 3
Exhaustive search mode false
Filter results during exhaustive search 0
Strand selection 1
LCA search mode false
Disk space limit 0
MPI runner
Force restart with latest tmp false
Remove temporary files false
Create directory /ebio/abt3_scratch/nyoungblut/LLCDS_126702996474/mmseqs_tax/06/TMP/14652724320229658153/tmp_hsp1
search /ebio/abt3_scratch/nyoungblut/LLCDS_126702996474/mmseqs_tax_query/06/seqs_db /ebio/abt3_scratch/nyoungblut/LLCDS_126702996474/mmseqs_tax_target/mmseqs_tax.db /ebio/abt3_scratch/nyoungblut/LLCDS_126702996474/mmseqs_tax/06/TMP/14652724320229658153/first /ebio/abt3_scratch/nyoungblut/LLCDS_126702996474/mmseqs_tax/06/TMP/14652724320229658153/tmp_hsp1 --alignment-mode 1 -e 1e-05 --max-rejected 5 --max-accept 30 --threads 8 -s 6 --max-seqs 200 --spaced-kmer-mode 1 --min-length 30 --max-length 32734 --orf-start-mode 1 --num-iterations 2 --start-sens 1 --sens-steps 3 --lca-search 1
prefilter /ebio/abt3_scratch/nyoungblut/LLCDS_126702996474/mmseqs_tax_query/06/seqs_db /ebio/abt3_scratch/nyoungblut/LLCDS_126702996474/mmseqs_tax_target/mmseqs_tax.db.idx /ebio/abt3_scratch/nyoungblut/LLCDS_126702996474/mmseqs_tax/06/TMP/14652724320229658153/tmp_hsp1/11598483508011826746/pref_0 --sub-mat nucl:nucleotide.out,aa:blosum62.out --seed-sub-mat nucl:nucleotide.out,aa:VTML80.out -s 6 -k 0 --k-score 2147483647 --alph-size nucl:5,aa:21 --max-seq-len 65535 --max-seqs 200 --split 0 --split-mode 2 --split-memory-limit 0 -c 0 --cov-mode 0 --comp-bias-corr 1 --diag-score 1 --exact-kmer-matching 0 --mask 1 --mask-lower-case 0 --min-ungapped-score 15 --add-self-matches 0 --spaced-kmer-mode 1 --db-load-mode 0 --pca 1 --pcb 1.5 --threads 8 --compressed 0 -v 3
Index version: 16
Generated by: 13.45111
ScoreMatrix: VTML80.out
Query database size: 1075 type: Aminoacid
Target split mode. Searching through 16 splits
Estimated memory consumption: 8G
Target database size: 41195879 type: Aminoacid
Process prefiltering step 1 of 16
k-mer similarity threshold: 109
Starting prefiltering scores calculation (step 1 of 16)
Query db start 1 to 1075
Target db start 1 to 2572505
[=================================================================] 1.08K 2s 989ms
390.206187 k-mers per position
423278 DB matches per sequence
0 overflows
0 queries produce too many hits (truncated result)
25 sequences passed prefiltering per query sequence
26 median result list length
0 sequences with 0 size result lists
Time for merging to pref_0_tmp_0: 0h 0m 0s 8ms
Time for merging to pref_0_tmp_0_tmp: 0h 0m 0s 10ms
Process prefiltering step 2 of 16
k-mer similarity threshold: 109
Starting prefiltering scores calculation (step 2 of 16)
Query db start 1 to 1075
Target db start 2572506 to 5147039
[=================================================================] 1.08K 3s 152ms
390.206187 k-mers per position
423330 DB matches per sequence
0 overflows
0 queries produce too many hits (truncated result)
25 sequences passed prefiltering per query sequence
26 median result list length
1 sequences with 0 size result lists
Time for merging to pref_0_tmp_1: 0h 0m 0s 8ms
Time for merging to pref_0_tmp_1_tmp: 0h 0m 0s 36ms
Process prefiltering step 3 of 16
k-mer similarity threshold: 109
Starting prefiltering scores calculation (step 3 of 16)
Query db start 1 to 1075
Target db start 5147040 to 7717242
[=================================================================] 1.08K 2s 825ms
390.206187 k-mers per position
423389 DB matches per sequence
0 overflows
0 queries produce too many hits (truncated result)
25 sequences passed prefiltering per query sequence
26 median result list length
0 sequences with 0 size result lists
Time for merging to pref_0_tmp_2: 0h 0m 0s 43ms
Time for merging to pref_0_tmp_2_tmp: 0h 0m 0s 57ms
Process prefiltering step 4 of 16
k-mer similarity threshold: 109
Starting prefiltering scores calculation (step 4 of 16)
Query db start 1 to 1075
Target db start 7717243 to 10294414
[=================================================================] 1.08K 3s 10ms
390.206187 k-mers per position
423306 DB matches per sequence
0 overflows
0 queries produce too many hits (truncated result)
25 sequences passed prefiltering per query sequence
26 median result list length
1 sequences with 0 size result lists
Time for merging to pref_0_tmp_3: 0h 0m 0s 23ms
Time for merging to pref_0_tmp_3_tmp: 0h 0m 0s 55ms
Process prefiltering step 5 of 16
k-mer similarity threshold: 109
Starting prefiltering scores calculation (step 5 of 16)
Query db start 1 to 1075
Target db start 10294415 to 12871105
[=================================================================] 1.08K 2s 902ms
390.206187 k-mers per position
423264 DB matches per sequence
0 overflows
0 queries produce too many hits (truncated result)
25 sequences passed prefiltering per query sequence
26 median result list length
1 sequences with 0 size result lists
Time for merging to pref_0_tmp_4: 0h 0m 0s 8ms
Time for merging to pref_0_tmp_4_tmp: 0h 0m 0s 11ms
Process prefiltering step 6 of 16
k-mer similarity threshold: 109
Starting prefiltering scores calculation (step 6 of 16)
Query db start 1 to 1075
Target db start 12871106 to 15442705
[=================================================================] 1.08K 2s 907ms
390.206187 k-mers per position
423514 DB matches per sequence
0 overflows
0 queries produce too many hits (truncated result)
25 sequences passed prefiltering per query sequence
26 median result list length
1 sequences with 0 size result lists
Time for merging to pref_0_tmp_5: 0h 0m 0s 9ms
Time for merging to pref_0_tmp_5_tmp: 0h 0m 0s 9ms
Process prefiltering step 7 of 16
k-mer similarity threshold: 109
Starting prefiltering scores calculation (step 7 of 16)
Query db start 1 to 1075
Target db start 15442706 to 18017124
[=================================================================] 1.08K 2s 795ms
390.206187 k-mers per position
423292 DB matches per sequence
0 overflows
0 queries produce too many hits (truncated result)
25 sequences passed prefiltering per query sequence
26 median result list length
1 sequences with 0 size result lists
Time for merging to pref_0_tmp_6: 0h 0m 0s 7ms
Time for merging to pref_0_tmp_6_tmp: 0h 0m 0s 9ms
Process prefiltering step 8 of 16
k-mer similarity threshold: 109
Starting prefiltering scores calculation (step 8 of 16)
Query db start 1 to 1075
Target db start 18017125 to 20593148
[=================================================================] 1.08K 2s 843ms
390.206187 k-mers per position
423223 DB matches per sequence
0 overflows
0 queries produce too many hits (truncated result)
25 sequences passed prefiltering per query sequence
26 median result list length
0 sequences with 0 size result lists
Time for merging to pref_0_tmp_7: 0h 0m 0s 9ms
Time for merging to pref_0_tmp_7_tmp: 0h 0m 0s 10ms
Process prefiltering step 9 of 16
k-mer similarity threshold: 109
Starting prefiltering scores calculation (step 9 of 16)
Query db start 1 to 1075
Target db start 20593149 to 23168610
[=================================================================] 1.08K 3s 92ms
390.206187 k-mers per position
423365 DB matches per sequence
0 overflows
0 queries produce too many hits (truncated result)
25 sequences passed prefiltering per query sequence
26 median result list length
0 sequences with 0 size result lists
Time for merging to pref_0_tmp_8: 0h 0m 0s 7ms
Time for merging to pref_0_tmp_8_tmp: 0h 0m 0s 11ms
Process prefiltering step 10 of 16
k-mer similarity threshold: 109
Starting prefiltering scores calculation (step 10 of 16)
Query db start 1 to 1075
Target db start 23168611 to 25746437
[=================================================================] 1.08K 2s 946ms
390.206187 k-mers per position
423353 DB matches per sequence
0 overflows
0 queries produce too many hits (truncated result)
25 sequences passed prefiltering per query sequence
26 median result list length
1 sequences with 0 size result lists
Time for merging to pref_0_tmp_9: 0h 0m 0s 11ms
Time for merging to pref_0_tmp_9_tmp: 0h 0m 0s 15ms
Process prefiltering step 11 of 16
k-mer similarity threshold: 109
Starting prefiltering scores calculation (step 11 of 16)
Query db start 1 to 1075
Target db start 25746438 to 28318851
[=================================================================] 1.08K 2s 418ms
390.206187 k-mers per position
423304 DB matches per sequence
0 overflows
0 queries produce too many hits (truncated result)
25 sequences passed prefiltering per query sequence
26 median result list length
0 sequences with 0 size result lists
Time for merging to pref_0_tmp_10: 0h 0m 0s 8ms
Time for merging to pref_0_tmp_10_tmp: 0h 0m 0s 14ms
Process prefiltering step 12 of 16
k-mer similarity threshold: 109
Starting prefiltering scores calculation (step 12 of 16)
Query db start 1 to 1075
Target db start 28318852 to 30895702
[=================================================================] 1.08K 3s 701ms
390.206187 k-mers per position
423306 DB matches per sequence
0 overflows
0 queries produce too many hits (truncated result)
25 sequences passed prefiltering per query sequence
26 median result list length
0 sequences with 0 size result lists
Time for merging to pref_0_tmp_11: 0h 0m 0s 61ms
Time for merging to pref_0_tmp_11_tmp: 0h 0m 0s 71ms
Process prefiltering step 13 of 16
k-mer similarity threshold: 109
Starting prefiltering scores calculation (step 13 of 16)
Query db start 1 to 1075
Target db start 30895703 to 33469145
[=================================================================] 1.08K 3s 180ms
390.206187 k-mers per position
423354 DB matches per sequence
0 overflows
0 queries produce too many hits (truncated result)
25 sequences passed prefiltering per query sequence
26 median result list length
1 sequences with 0 size result lists
Time for merging to pref_0_tmp_12: 0h 0m 0s 10ms
Time for merging to pref_0_tmp_12_tmp: 0h 0m 0s 14ms
Process prefiltering step 14 of 16
k-mer similarity threshold: 109
Starting prefiltering scores calculation (step 14 of 16)
Query db start 1 to 1075
Target db start 33469146 to 36042326
[=================================================================] 1.08K 3s 458ms
390.206187 k-mers per position
423372 DB matches per sequence
0 overflows
0 queries produce too many hits (truncated result)
25 sequences passed prefiltering per query sequence
26 median result list length
2 sequences with 0 size result lists
Time for merging to pref_0_tmp_13: 0h 0m 0s 34ms
Time for merging to pref_0_tmp_13_tmp: 0h 0m 0s 44ms
Process prefiltering step 15 of 16
k-mer similarity threshold: 109
Starting prefiltering scores calculation (step 15 of 16)
Query db start 1 to 1075
Target db start 36042327 to 38619947
[=================================================================] 1.08K 3s 98ms
390.206187 k-mers per position
423325 DB matches per sequence
0 overflows
0 queries produce too many hits (truncated result)
25 sequences passed prefiltering per query sequence
26 median result list length
1 sequences with 0 size result lists
Time for merging to pref_0_tmp_14: 0h 0m 0s 29ms
Time for merging to pref_0_tmp_14_tmp: 0h 0m 0s 31ms
Process prefiltering step 16 of 16
k-mer similarity threshold: 109
Starting prefiltering scores calculation (step 16 of 16)
Query db start 1 to 1075
Target db start 38619948 to 41195879
[=================================================================] 1.08K 2s 904ms
390.206187 k-mers per position
423266 DB matches per sequence
0 overflows
0 queries produce too many hits (truncated result)
25 sequences passed prefiltering per query sequence
26 median result list length
0 sequences with 0 size result lists
Time for merging to pref_0_tmp_15: 0h 0m 0s 24ms
Time for merging to pref_0_tmp_15_tmp: 0h 0m 0s 20ms
Merging 16 target splits to pref_0
Preparing offsets for merging: 0h 0m 0s 53ms
[=================================================================] 1.08K 0s 37ms
Time for merging to pref_0: 0h 0m 0s 23ms
Time for merging target splits: 0h 0m 0s 174ms
Time for merging to pref_0_tmp: 0h 0m 0s 45ms
Time for processing: 0h 6m 46s 299ms
lcaalign /ebio/abt3_scratch/nyoungblut/LLCDS_126702996474/mmseqs_tax_query/06/seqs_db /ebio/abt3_scratch/nyoungblut/LLCDS_126702996474/mmseqs_tax_target/mmseqs_tax.db.idx /ebio/abt3_scratch/nyoungblut/LLCDS_126702996474/mmseqs_tax/06/TMP/14652724320229658153/tmp_hsp1/11598483508011826746/pref_0 /ebio/abt3_scratch/nyoungblut/LLCDS_126702996474/mmseqs_tax/06/TMP/14652724320229658153/tmp_hsp1/11598483508011826746/aln_0 --sub-mat nucl:nucleotide.out,aa:blosum62.out -a 1 --alignment-mode 1 --alignment-output-mode 0 --wrapped-scoring 0 -e 1e-05 --min-seq-id 0 --min-aln-len 0 --seq-id-mode 0 --alt-ali 0 -c 0 --cov-mode 0 --max-seq-len 65535 --comp-bias-corr 1 --max-rejected 5 --max-accept 30 --add-self-matches 0 --db-load-mode 0 --pca 1 --pcb 1.5 --score-bias 0 --realign 1 --realign-score-bias -0.2 --realign-max-seqs 2147483647 --gap-open nucl:5,aa:11 --gap-extend nucl:2,aa:1 --zdrop 40 --threads 8 --compressed 0 -v 3
Index version: 16
Generated by: 13.45111
ScoreMatrix: VTML80.out
Compute score and coverage
Query database size: 1075 type: Aminoacid
Target database size: 41195879 type: Aminoacid
[=================================================================] 1.08K 0s 508ms
Time for merging to aln_0: 0h 0m 0s 8ms
19048 alignments calculated
15817 sequence pairs passed the thresholds (0.830376 of overall calculated)
14.713489 hits per query sequence
Time for processing: 0h 0m 54s 194ms
result2profile /ebio/abt3_scratch/nyoungblut/LLCDS_126702996474/mmseqs_tax_query/06/seqs_db /ebio/abt3_scratch/nyoungblut/LLCDS_126702996474/mmseqs_tax_target/mmseqs_tax.db.idx /ebio/abt3_scratch/nyoungblut/LLCDS_126702996474/mmseqs_tax/06/TMP/14652724320229658153/tmp_hsp1/11598483508011826746/aln_0 /ebio/abt3_scratch/nyoungblut/LLCDS_126702996474/mmseqs_tax/06/TMP/14652724320229658153/tmp_hsp1/11598483508011826746/profile_0 --sub-mat nucl:nucleotide.out,aa:blosum62.out -e 1e-05 --mask-profile 1 --e-profile 0.1 --comp-bias-corr 1 --wg 0 --allow-deletion 0 --filter-msa 1 --max-seq-id 0.9 --qid 0 --qsc -20 --cov 0 --diff 1000 --pca 0 --pcb 1.5 --db-load-mode 0 --gap-open nucl:5,aa:11 --gap-extend nucl:2,aa:1 --threads 8 --compressed 0 -v 3
Index version: 16
Generated by: 13.45111
ScoreMatrix: VTML80.out
Query database size: 1075 type: Aminoacid
Target database size: 41195879 type: Aminoacid
[========================================Segmentation fault
Error: Create profile died
Error: First search died
Note that sometimes when I re-run the command, I instead get the error:
Index version: 16
Generated by: 13.45111
ScoreMatrix: VTML80.out
Query database size: 1075 type: Aminoacid
Target database size: 41195879 type: Aminoacid
[=======================================================]
free(): invalid next size (normal)
Aborted
Error: Create profile died
Error: First search died
System memory should not be the cause; I've got ~800 GB free.
Maybe I'm missing a "hidden" input file (i.e., one of the files associated with the main input files, which are generally not mentioned in any of the docs). The input files that are present:
If I had to guess, there's probably something wrong with the *.idx files.
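On the pipeline side, the tracking problem can be reduced by listing whatever index files actually exist after mmseqs createindex, instead of hard-coding one output per split (here --split 16 produced 18 files). A minimal sketch, assuming every index file for a target DB starts with "<db>.idx"; that glob is an assumption rather than documented MMseqs2 naming, and list_idx_files / check_idx_count are hypothetical helper names:

```shell
# Hypothetical helpers, not part of MMseqs2.
# ASSUMPTION: all index files for a target DB start with "<prefix>.idx".

# Print every existing file matching "<prefix>.idx*", one per line.
list_idx_files() {
  local f
  for f in "$1".idx*; do
    # An unmatched glob stays literal in POSIX sh, so skip paths that don't exist.
    [ -e "$f" ] || continue
    printf '%s\n' "$f"
  done
}

# Return non-zero (and report on stderr) if the number of index files
# differs from the expected count.
check_idx_count() {
  local n
  n=$(list_idx_files "$1" | wc -l | tr -d ' ')
  if [ "$n" -ne "$2" ]; then
    printf 'expected %s index files for %s, found %s\n' "$2" "$1" "$n" >&2
    return 1
  fi
}
```

A rule could feed the output of list_idx_files into its list of tracked outputs, and call check_idx_count on the target DB prefix right after createindex to fail early rather than lose untracked files later.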
I introduced the two additional splits because of https://github.com/soedinglab/MMseqs2/issues/338, though that wasn't very effective at reducing peak memory use.
The error looks like memory corruption, though. I am not really sure how to reproduce the issue locally. Do you still have the tmp files? Could you try rerunning only the last step without the index:
mmseqs result2profile /ebio/abt3_scratch/nyoungblut/LLCDS_126702996474/mmseqs_tax_query/06/seqs_db /ebio/abt3_scratch/nyoungblut/LLCDS_126702996474/mmseqs_tax_target/mmseqs_tax.db /ebio/abt3_scratch/nyoungblut/LLCDS_126702996474/mmseqs_tax/06/TMP/14652724320229658153/tmp_hsp1/11598483508011826746/aln_0 /ebio/abt3_scratch/nyoungblut/LLCDS_126702996474/mmseqs_tax/06/TMP/14652724320229658153/tmp_hsp1/11598483508011826746/profile_0 --sub-mat nucl:nucleotide.out,aa:blosum62.out -e 1e-05 --mask-profile 1 --e-profile 0.1 --comp-bias-corr 1 --wg 0 --allow-deletion 0 --filter-msa 1 --max-seq-id 0.9 --qid 0 --qsc -20 --cov 0 --diff 1000 --pca 0 --pcb 1.5 --db-load-mode 0 --gap-open nucl:5,aa:11 --gap-extend nucl:2,aa:1 --threads 8 --compressed 0 -v 3
The only change was to remove the .idx suffix after mmseqs_tax.db.
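When re-running single workflow steps by hand, this path change can be scripted. A minimal sketch using POSIX parameter expansion; strip_idx_suffix is a hypothetical helper name, and the path is the target index from this issue:

```shell
# Hypothetical helper: drop a trailing ".idx" so a step reads the plain
# target DB instead of the precomputed index. ${1%.idx} removes the suffix
# only if it is present; any other path is printed unchanged.
strip_idx_suffix() {
  printf '%s\n' "${1%.idx}"
}

idx=/ebio/abt3_scratch/nyoungblut/LLCDS_126702996474/mmseqs_tax_target/mmseqs_tax.db.idx
strip_idx_suffix "$idx"
# prints /ebio/abt3_scratch/nyoungblut/LLCDS_126702996474/mmseqs_tax_target/mmseqs_tax.db
```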
The next step would be to try an MMseqs2 build instrumented with AddressSanitizer (ASan). Sadly, ASan doesn't support static builds, so you would have to compile MMseqs2 yourself:
git clone https://github.com/soedinglab/MMseqs2.git
cd MMseqs2;
mkdir build
cd build
cmake -DHAVE_SANITIZER=1 -DCMAKE_BUILD_TYPE=ASan ..
make -j $(nproc --all)
The new binary in build/src/mmseqs would then hopefully be able to tell what is going wrong:
Path-To-Where-You-Git-Clone/MMseqs2/build/src/mmseqs result2profile /ebio/abt3_scratch/nyoungblut/LLCDS_126702996474/mmseqs_tax_query/06/seqs_db /ebio/abt3_scratch/nyoungblut/LLCDS_126702996474/mmseqs_tax_target/mmseqs_tax.db.idx /ebio/abt3_scratch/nyoungblut/LLCDS_126702996474/mmseqs_tax/06/TMP/14652724320229658153/tmp_hsp1/11598483508011826746/aln_0 /ebio/abt3_scratch/nyoungblut/LLCDS_126702996474/mmseqs_tax/06/TMP/14652724320229658153/tmp_hsp1/11598483508011826746/profile_0 --sub-mat nucl:nucleotide.out,aa:blosum62.out -e 1e-05 --mask-profile 1 --e-profile 0.1 --comp-bias-corr 1 --wg 0 --allow-deletion 0 --filter-msa 1 --max-seq-id 0.9 --qid 0 --qsc -20 --cov 0 --diff 1000 --pca 0 --pcb 1.5 --db-load-mode 0 --gap-open nucl:5,aa:11 --gap-extend nucl:2,aa:1 --threads 8 --compressed 0 -v 3
Removing the *.idx suffix for mmseqs result2profile did not fix the issue. I'll try the ASan build next.
Here's the output from the ASan run:
./build/src/mmseqs result2profile \
  /ebio/abt3_scratch/nyoungblut/LLCDS_126702996474/mmseqs_tax_query/09/seqs_db \
  /ebio/abt3_scratch/nyoungblut/LLCDS_126702996474/mmseqs_tax_target/mmseqs_tax.db \
  /ebio/abt3_scratch/nyoungblut/LLCDS_126702996474/mmseqs_tax/09/TMP/1355100225373504351/tmp_hsp1/9650299475897910544/aln_0 \
  /ebio/abt3_scratch/nyoungblut/LLCDS_126702996474/mmseqs_tax/09/TMP/1355100225373504351/tmp_hsp1/9650299475897910544/profile_0 \
  --sub-mat nucl:nucleotide.out,aa:blosum62.out -e 1e-05 --mask-profile 1 --e-profile 0.1 --comp-bias-corr 1 --wg 0 --allow-deletion 0 \
  --filter-msa 1 --max-seq-id 0.9 --qid 0 --qsc -20 --cov 0 --diff 1000 --pca 0 --pcb 1.5 --db-load-mode 0 --gap-open nucl:5,aa:11 \
  --gap-extend nucl:2,aa:1 --threads 8 --compressed 0 -v 3
result2profile /ebio/abt3_scratch/nyoungblut/LLCDS_126702996474/mmseqs_tax_query/09/seqs_db /ebio/abt3_scratch/nyoungblut/LLCDS_126702996474/mmseqs_tax_target/mmseqs_tax.db /ebio/abt3_scratch/nyoungblut/LLCDS_126702996474/mmseqs_tax/09/TMP/1355100225373504351/tmp_hsp1/9650299475897910544/aln_0 /ebio/abt3_scratch/nyoungblut/LLCDS_126702996474/mmseqs_tax/09/TMP/1355100225373504351/tmp_hsp1/9650299475897910544/profile_0 --sub-mat nucl:nucleotide.out,aa:blosum62.out -e 1e-05 --mask-profile 1 --e-profile 0.1 --comp-bias-corr 1 --wg 0 --allow-deletion 0 --filter-msa 1 --max-seq-id 0.9 --qid 0 --qsc -20 --cov 0 --diff 1000 --pca 0 --pcb 1.5 --db-load-mode 0 --gap-open nucl:5,aa:11 --gap-extend nucl:2,aa:1 --threads 8 --compressed 0 -v 3
MMseqs Version: a6cab565c98376623e82c3a04d186219d4c2f10c
Substitution matrix nucl:nucleotide.out,aa:blosum62.out
E-value threshold 1e-05
Mask profile 1
Profile E-value threshold 1e-05
Compositional bias 1
Global sequence weighting false
Allow deletions false
Filter MSA 1
Maximum seq. id. threshold 0.9
Minimum seq. id. 0
Minimum score per column -20
Minimum coverage 0
Select N most diverse seqs 1000
Pseudo count a 0
Pseudo count b 1.5
Preload mode 0
Gap open cost nucl:5,aa:11
Gap extension cost nucl:2,aa:1
Threads 8
Compressed 0
Verbosity 3
Query database size: 1151 type: Aminoacid
Target database size: 41195879 type: Aminoacid
================================================================= ] 46.43% 535 eta 0s
==71239==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x61a0000233e0 at pc 0x55c61d242cd7 bp 0x7fc0f27db1b0 sp 0x7fc0f27db1a0
WRITE of size 1 at 0x61a0000233e0 thread T3
==71239==AddressSanitizer: while reporting a bug found another one. Ignoring.08K eta 0s
#0 0x55c61d242cd6 in MultipleAlignment::updateGapsInSequenceSet(char**, unsigned long, std::vector<std::vector<unsigned char, std::allocator<unsigned char> >, std::allocator<std::vector<unsigned char, std::allocator<unsigned char> > > > const&, std::vector<Matcher::result_t, std::allocator<Matcher::result_t> > const&, unsigned int*, bool) /ebio/abt3_projects/software/dev/ll_pipelines/llcds/tmp/mmseqs_taxonomy/MMseqs2/src/alignment/MultipleAlignment.cpp:168
#1 0x55c61d2432cc in MultipleAlignment::computeMSA(Sequence*, std::vector<std::vector<unsigned char, std::allocator<unsigned char> >, std::allocator<std::vector<unsigned char, std::allocator<unsigned char> > > > const&, std::vector<Matcher::result_t, std::allocator<Matcher::result_t> > const&, bool) /ebio/abt3_projects/software/dev/ll_pipelines/llcds/tmp/mmseqs_taxonomy/MMseqs2/src/alignment/MultipleAlignment.cpp:208
#2 0x55c61d180e7b in result2profile(int, char const**, Command const&, bool) [clone ._omp_fn.0] /ebio/abt3_projects/software/dev/ll_pipelines/llcds/tmp/mmseqs_taxonomy/MMseqs2/src/util/result2profile.cpp:203
#3 0x7fc0f70d796d (/usr/lib/x86_64-linux-gnu/libgomp.so.1+0x1696d)
#4 0x7fc0f6c916da in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76da)
#5 0x7fc0f69ba71e in __clone (/lib/x86_64-linux-gnu/libc.so.6+0x12171e)
0x61a0000233e0 is located 0 bytes to the right of 1376-byte region [0x61a000022e80,0x61a0000233e0)
allocated by thread T3 here:
#0 0x7fc0f812b790 in posix_memalign (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xdf790)
#1 0x55c61cd2e5c3 in mem_align(unsigned long, unsigned long) /ebio/abt3_projects/software/dev/ll_pipelines/llcds/tmp/mmseqs_taxonomy/MMseqs2/lib/simd/simd.h:463
#2 0x55c61cee071f in malloc_simd_int(unsigned long) /ebio/abt3_projects/software/dev/ll_pipelines/llcds/tmp/mmseqs_taxonomy/MMseqs2/lib/simd/simd.h:483
#3 0x55c61d2410c9 in MultipleAlignment::initX(int) /ebio/abt3_projects/software/dev/ll_pipelines/llcds/tmp/mmseqs_taxonomy/MMseqs2/src/alignment/MultipleAlignment.cpp:19
#4 0x55c61d243175 in MultipleAlignment::computeMSA(Sequence*, std::vector<std::vector<unsigned char, std::allocator<unsigned char> >, std::allocator<std::vector<unsigned char, std::allocator<unsigned char> > > > const&, std::vector<Matcher::result_t, std::allocator<Matcher::result_t> > const&, bool) /ebio/abt3_projects/software/dev/ll_pipelines/llcds/tmp/mmseqs_taxonomy/MMseqs2/src/alignment/MultipleAlignment.cpp:198
#5 0x55c61d180e7b in result2profile(int, char const**, Command const&, bool) [clone ._omp_fn.0] /ebio/abt3_projects/software/dev/ll_pipelines/llcds/tmp/mmseqs_taxonomy/MMseqs2/src/util/result2profile.cpp:203
#6 0x7fc0f70d796d (/usr/lib/x86_64-linux-gnu/libgomp.so.1+0x1696d)
Thread T3 created by T0 here:
#0 0x7fc0f8083d2f in __interceptor_pthread_create (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x37d2f)
#1 0x7fc0f70d7f5f (/usr/lib/x86_64-linux-gnu/libgomp.so.1+0x16f5f)
#2 0x7fc0f70ceed9 in GOMP_parallel (/usr/lib/x86_64-linux-gnu/libgomp.so.1+0xded9)
#3 0x7ffc996a2d2f (<unknown module>)
SUMMARY: AddressSanitizer: heap-buffer-overflow /ebio/abt3_projects/software/dev/ll_pipelines/llcds/tmp/mmseqs_taxonomy/MMseqs2/src/alignment/MultipleAlignment.cpp:168 in MultipleAlignment::updateGapsInSequenceSet(char**, unsigned long, std::vector<std::vector<unsigned char, std::allocator<unsigned char> >, std::allocator<std::vector<unsigned char, std::allocator<unsigned char> > > > const&, std::vector<Matcher::result_t, std::allocator<Matcher::result_t> > const&, unsigned int*, bool)
Shadow bytes around the buggy address:
0x0c347fffc620: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c347fffc630: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c347fffc640: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c347fffc650: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c347fffc660: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c347fffc670: 00 00 00 00 00 00 00 00 00 00 00 00[fa]fa fa fa
0x0c347fffc680: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c347fffc690: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c347fffc6a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c347fffc6b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c347fffc6c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==71239==ABORTING
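The numbers in the ASan report can be cross-checked with plain shell arithmetic. The sketch below uses only addresses copied from the report: it confirms that the 1-byte write lands at offset 1376 into the 1376-byte buffer allocated via MultipleAlignment::initX (the first byte past its end), and recomputes the shadow byte marked [fa]. The shadow formula assumes ASan's default x86_64 Linux mapping, (addr >> 3) + 0x7fff8000.

```shell
# Addresses copied from the ASan report.
bad_addr=0x61a0000233e0      # address of the 1-byte WRITE
region_start=0x61a000022e80  # start of the 1376-byte heap region
region_end=0x61a0000233e0    # end of the region (half-open: one past the last valid byte)

size=$(( region_end - region_start ))
offset=$(( bad_addr - region_start ))
echo "region size: $size bytes"    # region size: 1376 bytes
echo "write offset: $offset"       # write offset: 1376 (== size, so one byte past the end)

# ASSUMPTION: default ASan shadow mapping on x86_64 Linux.
shadow=$(( (bad_addr >> 3) + 0x7fff8000 ))
printf 'shadow byte: 0x%012x\n' "$shadow"   # shadow byte: 0x0c347fffc67c
```

0x0c347fffc67c is byte 12 of the row 0x0c347fffc670, which is exactly where the report places [fa]. A write just past the end of the MSA buffer is at least consistent with the explanation further down that the taxonomy workflow's alignment positions don't match what the profile step expects; that link is an interpretation, not something ASan itself reports.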
Thanks, I suspected that this might have been the problem. I'll update you once we figure out how to fix this.
Ah, sorry, it makes a lot of sense that this doesn't work. Iterative-profile searches currently don't work together with the taxonomy workflow, since the alignment positions computed in the taxonomy workflow don't refer to the same things that the iterative-profile-search workflow expects. I am not sure this type of search makes sense. Could you explain your use case for combining these two?
I am not sure if it's fixable with the current protocol; we might just disallow taxonomy in combination with iterative-profile searches instead.
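Until such a restriction exists in MMseqs2 itself, a pipeline can enforce it before launching the job. A minimal sketch of a hypothetical wrapper-side guard (not part of MMseqs2), assuming the iteration count is passed as a separate "--num-iterations N" argument, as in the taxonomy command at the top of this issue:

```shell
# Hypothetical guard, not part of MMseqs2: refuse iterative-profile settings
# for `mmseqs taxonomy`. Only the "--num-iterations N" (two-argument) form is checked.
reject_iterative_taxonomy() {
  local prev="" arg
  for arg in "$@"; do
    if [ "$prev" = "--num-iterations" ] && [ "$arg" -gt 1 ]; then
      echo "mmseqs taxonomy with --num-iterations $arg is not supported; use 1" >&2
      return 1
    fi
    prev="$arg"
  done
  return 0
}

# Usage: only start the workflow if the guard passes, e.g.
#   reject_iterative_taxonomy "$@" && mmseqs taxonomy "$@"
```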
Thanks for looking more into the issue.
I carried over the iterative search parameters from some other mmseqs search jobs. If iterative search parameters don't make sense for mmseqs taxonomy, then it would be good to remove them from the script docs.
Expected Behavior
I expect --split 16 for mmseqs createindex to generate 16 *.idx files. Instead, I'm getting 18:
Pipeline software (e.g., snakemake) generally requires keeping track of all (important) output files produced; otherwise, untracked output files can accidentally be deleted, which is causing some downstream problems (e.g., seg-fault errors for mmseqs taxonomy).
Steps to Reproduce (for bugs)
Your Environment
OS: Ubuntu 18.04.5