soedinglab / MMseqs2

MMseqs2: ultra fast and sensitive search and clustering suite
https://mmseqs.com
GNU General Public License v3.0

std::bad_alloc error with convertalis #471

Open HaimAshk opened 3 years ago

HaimAshk commented 3 years ago

Hi,

I'm getting an error when running a blastn-style nucleotide search against the NT database. I also tried running only the final convertalis command on a different computer, and it crashed after reaching ~2 TB of RAM usage. Is there a way to work around or fix this issue?

Thanks! Haim

MMseqs Output (for bugs)

easy-search --search-type 3 --format-output query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits,qlen /tmp/rep.fasta.gz /db/nt_MMSeq2_Jan2021/nt_MMSeq2_Jan2021 /tmp/rep_vs_NT_Jan2021.mmseq2.m8 /tmp/ --threads 32 --split-memory-limit 250G

MMseqs Version:                       1f302134aa1c6c7c4e2b9da272fd26af33860780
Substitution matrix                   nucl:nucleotide.out,aa:blosum62.out
Add backtrace                         false
Alignment mode                        3
Allow wrapped scoring                 false
E-value threshold                     0.001
Seq. id. threshold                    0
Min alignment length                  0
Seq. id. mode                         0
Alternative alignments                0
Coverage threshold                    0
Coverage mode                         0
Max sequence length                   65535
Compositional bias                    1
Max reject                            2147483647
Max accept                            2147483647
Include identical seq. id.            false
Preload mode                          0
Pseudo count a                        1
Pseudo count b                        1.5
Score bias                            0
Realign hits                          false
Realign score bias                    -0.2
Realign max seqs                      2147483647
Gap open cost                         nucl:5,aa:11
Gap extension cost                    nucl:2,aa:1
Zdrop                                 40
Threads                               32
Compressed                            0
Verbosity                             3
Seed substitution matrix              nucl:nucleotide.out,aa:VTML80.out
Sensitivity                           5.7
k-mer length                          0
k-score                               2147483647
Alphabet size                         nucl:5,aa:21
Max results per query                 300
Split database                        0
Split mode                            2
Split memory limit                    250G
Diagonal scoring                      true
Exact k-mer matching                  0
Mask residues                         1
Mask lower case residues              0
Minimum diagonal score                15
Spaced k-mers                         1
Spaced k-mer pattern
Local temporary path
Rescore mode                          0
Remove hits by seq. id. and coverage  false
Sort results                          0
Mask profile                          1
Profile E-value threshold             0.001
Global sequence weighting             false
Allow deletions                       false
Filter MSA                            1
Maximum seq. id. threshold            0.9
Minimum seq. id.                      0
Minimum score per column              -20
Minimum coverage                      0
Select N most diverse seqs            1000
Min codons in orf                     30
Max codons in length                  32734
Max orf gaps                          2147483647
Contig start mode                     2
Contig end mode                       2
Orf start mode                        1
Forward frames                        1,2,3
Reverse frames                        1,2,3
Translation table                     1
Translate orf                         0
Use all table starts                  false
Offset of numeric ids                 0
Create lookup                         0
Add orf stop                          false
Overlap between sequences             0
Sequence split mode                   1
Header split mode                     0
Chain overlapping alignments          0
Merge query                           1
Search type                           3
Search iterations                     1
Start sensitivity                     4
Search steps                          1
Slice search mode                     false
Strand selection                      1
LCA search mode                       false
Disk space limit                      0
MPI runner
Force restart with latest tmp         false
Remove temporary files                true
Alignment format                      0
Format alignment output               query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits,qlen
Database output                       false
Overlap threshold                     0
Database type                         0
Shuffle input database                true
Createdb mode                         0
Write lookup file                     0
Greedy best hits                      false

createdb /tmp/rep.fasta.gz /tmp//2989869989197200687/query --dbtype 0 --shuffle 1 --createdb-mode 0 --write-lookup 0 --id-offset 0 --compressed 0 -v 3

Converting sequences
[===================================================
Time for merging to query_h: 0h 0m 1s 20ms
Time for merging to query: 0h 0m 1s 67ms
Database type: Nucleotide
Time for processing: 0h 0m 34s 389ms
Create directory /tmp//2989869989197200687/search_tmp

search /tmp//2989869989197200687/query /db/nt_MMSeq2_Jan2021/nt_MMSeq2_Jan2021 /tmp//2989869989197200687/result /tmp//2989869989197200687/search_tmp --alignment-mode 3 --threads 32 -s 5.7 --split-memory-limit 250G --search-type 3 --remove-tmp-files 1

splitsequence /db/nt_MMSeq2_Jan2021/nt_MMSeq2_Jan2021 /tmp//2989869989197200687/search_tmp/6775691152365959592/target_seqs_split --max-seq-len 10000 --sequence-overlap 0 --sequence-split-mode 1 --headers-split-mode 0 --create-lookup 0 --threads 32 --compressed 0 -v 3

[=================================================================] 65.82M 9s 584ms
Time for merging to target_seqs_split_h: 0h 0m 26s 386ms
Time for merging to target_seqs_split: 0h 0m 27s 933ms
Time for processing: 0h 1m 36s 281ms

extractframes /tmp//2989869989197200687/query /tmp//2989869989197200687/search_tmp/6775691152365959592/query_seqs --forward-frames 1 --reverse-frames 1 --create-lookup 0 --threads 32 --compressed 0 -v 3

[=================================================================] 514.46K 0s 759ms
Time for merging to query_seqs_h: 0h 0m 0s 231ms
Time for merging to query_seqs: 0h 0m 3s 221ms
Time for processing: 0h 0m 5s 403ms

splitsequence /tmp//2989869989197200687/search_tmp/6775691152365959592/query_seqs /tmp//2989869989197200687/search_tmp/6775691152365959592/query_seqs_split --max-seq-len 10000 --sequence-overlap 0 --sequence-split-mode 1 --headers-split-mode 0 --create-lookup 0 --threads 32 --compressed 0 -v 3

[=================================================================] 1.03M 0s 146ms
Time for merging to query_seqs_split_h: 0h 0m 0s 281ms
Time for merging to query_seqs_split: 0h 0m 0s 333ms
Time for processing: 0h 0m 1s 246ms

prefilter /tmp//2989869989197200687/search_tmp/6775691152365959592/query_seqs_split /tmp//2989869989197200687/search_tmp/6775691152365959592/target_seqs_split /tmp//2989869989197200687/search_tmp/6775691152365959592/search/pref_0 --sub-mat nucl:nucleotide.out,aa:blosum62.out --seed-sub-mat nucl:nucleotide.out,aa:VTML80.out -k 15 --k-score 2147483647 --alph-size nucl:5,aa:21 --max-seq-len 10000 --max-seqs 300 --split 0 --split-mode 2 --split-memory-limit 250G -c 0 --cov-mode 0 --comp-bias-corr 1 --diag-score 1 --exact-kmer-matching 1 --mask 1 --mask-lower-case 0 --min-ungapped-score 15 --add-self-matches 0 --spaced-kmer-mode 1 --db-load-mode 0 --pca 1 --pcb 1.5 --threads 32 --compressed 0 -v 3 -s 5.7

Query database size: 1298472 type: Nucleotide
Target split mode. Searching through 12 splits
Estimated memory consumption: 216G
Target database size: 90056195 type: Nucleotide
Process prefiltering step 1 of 12

Index table k-mer threshold: 0 at k-mer size 15
Index table: counting k-mers
[=================================================================] 7.80M 6m 27s 363ms
Index table: Masked residues: 517008537
Index table: fill
[=================================================================] 7.80M 11m 24s 325ms
Index statistics
Entries: 27958919735
DB size: 168174 MB
Avg k-mer size: 26.038773
Top 10 k-mers
    GGGGCAGCGTGATTT 319478
    TAATCGTGCAGCGGG 292128
    GTGCGCAGCGTATCG 276641
    CTCTCGGGGGCGTGG 257406
    ACAGTTAGTATGTGT 233646
    TCCAGGGAGCATGGG 230906
    AGTGGAATTTCATGG 224146
    TCGCGCTCTGTAGTG 209357
    ACTCACGGAGGAGGG 193555
    GCCAACTCTAGGGAG 184395
Time for index table init: 0h 18m 59s 10ms
k-mer similarity threshold: 0
Starting prefiltering scores calculation (step 1 of 12)
Query db start 1 to 1298472
Target db start 1 to 7796647
[=================================================================] 1.30M 55m 45s 390ms

0.917439 k-mers per position
255666 DB matches per sequence
279 overflows
0 queries produce too many hits (truncated result)
42 sequences passed prefiltering per query sequence
45 median result list length
1236 sequences with 0 size result lists
Time for merging to pref_0_tmp_0: 0h 0m 5s 282ms
Time for merging to pref_0_tmp_0_tmp: 0h 0m 0s 718ms
Process prefiltering step 2 of 12

Index table k-mer threshold: 0 at k-mer size 15
Index table: counting k-mers
[=================================================================] 7.32M 5m 48s 439ms
Index table: Masked residues: 548391268
Index table: fill
[=================================================================] 7.32M 11m 41s 676ms
Index statistics
Entries: 27946347047
DB size: 168102 MB
Avg k-mer size: 26.027064
Top 10 k-mers
    GGGGCAGCGTGATTT 302432
    TAATCGTGCAGCGGG 270001
    GTGCGCAGCGTATCG 254980
    CTCTCGGGGGCGTGG 245801
    CCACGCCGGGTCGAG 232302
    TCCAGGGAGCATGGG 220720
    CACGCCAGCTAGGAG 213322
    AGTGGAATTTCATGG 211645
    ACTCACGGAGGAGGG 184383
    ATTAGGGGCCAAACG 175886
Time for index table init: 0h 18m 20s 132ms
k-mer similarity threshold: 0
Starting prefiltering scores calculation (step 2 of 12)
Query db start 1 to 1298472
Target db start 7796648 to 15112675
[=================================================================] 1.30M 44m 23s 521ms

0.917439 k-mers per position
274912 DB matches per sequence
265 overflows
0 queries produce too many hits (truncated result)
42 sequences passed prefiltering per query sequence
45 median result list length
1001 sequences with 0 size result lists
Time for merging to pref_0_tmp_1: 0h 0m 0s 435ms
Time for merging to pref_0_tmp_1_tmp: 0h 0m 0s 767ms
Process prefiltering step 3 of 12

Index table k-mer threshold: 0 at k-mer size 15
Index table: counting k-mers
[=================================================================] 7.33M 5m 30s 202ms
Index table: Masked residues: 528733028
Index table: fill
[=================================================================] 7.33M 10m 2s 214ms
Index statistics
Entries: 27900220272
DB size: 167838 MB
Avg k-mer size: 25.984105
Top 10 k-mers
    GGGGCAGCGTGATTT 243297
    TAATCGTGCAGCGGG 231491
    AACGATTAATCGGAG 206367
    CTCTCGGGGGCGTGG 194936
    TACGAGGCGCGGGAT 183478
    ACAGTTAGTATGTGT 181256
    AGGGTGCAGGTGTAG 174472
    TCCAGGGAGCATGGG 173238
    AGCACAGGTTTCCTG 162937
    TCGCGCTCTGTAGTG 159530
Time for index table init: 0h 16m 19s 363ms
k-mer similarity threshold: 0
Starting prefiltering scores calculation (step 3 of 12)
Query db start 1 to 1298472
Target db start 15112676 to 22438113
[=================================================================] 1.30M 44m 17s 882ms

0.917439 k-mers per position
271574 DB matches per sequence
178 overflows
0 queries produce too many hits (truncated result)
42 sequences passed prefiltering per query sequence
45 median result list length
979 sequences with 0 size result lists
Time for merging to pref_0_tmp_2: 0h 0m 0s 384ms
Time for merging to pref_0_tmp_2_tmp: 0h 0m 0s 726ms
Process prefiltering step 4 of 12

Index table k-mer threshold: 0 at k-mer size 15
Index table: counting k-mers
[=================================================================] 7.66M 5m 25s 221ms
Index table: Masked residues: 515919403
Index table: fill
[=================================================================] 7.66M 10m 17s 860ms
Index statistics
Entries: 27956997653
DB size: 168163 MB
Avg k-mer size: 26.036983
Top 10 k-mers
    GGGGCAGCGTGATTT 317032
    TAATCGTGCAGCGGG 288721
    GTGCGCAGCGTATCG 272966
    CTCTCGGGGGCGTGG 255502
    ACAGTTAGTATGTGT 230984
    TCCAGGGAGCATGGG 229608
    AGTGGAATTTCATGG 222369
    TCGCGCTCTGTAGTG 208085
    ACTCACGGAGGAGGG 192057
    CAGTGTGTGTAGTGG 182199
Time for index table init: 0h 16m 27s 836ms
k-mer similarity threshold: 0
Starting prefiltering scores calculation (step 4 of 12)
Query db start 1 to 1298472
Target db start 22438114 to 30101861
[=================================================================] 1.30M 1h 3m 25s 964ms

0.917439 k-mers per position
266756 DB matches per sequence
280 overflows
0 queries produce too many hits (truncated result)
42 sequences passed prefiltering per query sequence
45 median result list length
1036 sequences with 0 size result lists
Time for merging to pref_0_tmp_3: 0h 0m 0s 378ms
Time for merging to pref_0_tmp_3_tmp: 0h 0m 0s 740ms
Process prefiltering step 5 of 12

Index table k-mer threshold: 0 at k-mer size 15
Index table: counting k-mers
[=================================================================] 7.53M 5m 30s 242ms
Index table: Masked residues: 553861159
Index table: fill
[=================================================================] 7.53M 11m 6s 119ms
Index statistics
Entries: 27929979419
DB size: 168008 MB
Avg k-mer size: 26.011820
Top 10 k-mers
    GGGGCAGCGTGATTT 305689
    TAATCGTGCAGCGGG 273513
    GTGCGCAGCGTATCG 258679
    CTCTCGGGGGCGTGG 248552
    CCACGCCGGGTCGAG 235716
    TCCAGGGAGCATGGG 222601
    CACGCCAGCTAGGAG 215703
    AGTGGAATTTCATGG 213837
    ACTCACGGAGGAGGG 186164
    ATTAGGGGCCAAACG 178017
Time for index table init: 0h 17m 20s 368ms
k-mer similarity threshold: 0
Starting prefiltering scores calculation (step 5 of 12)
Query db start 1 to 1298472
Target db start 30101862 to 37628027
[=================================================================] 1.30M 56m 9s 404ms

0.917439 k-mers per position
271980 DB matches per sequence
274 overflows
0 queries produce too many hits (truncated result)
42 sequences passed prefiltering per query sequence
45 median result list length
1008 sequences with 0 size result lists
Time for merging to pref_0_tmp_4: 0h 0m 0s 363ms
Time for merging to pref_0_tmp_4_tmp: 0h 0m 0s 760ms
Process prefiltering step 6 of 12

Index table k-mer threshold: 0 at k-mer size 15
Index table: counting k-mers
[=================================================================] 7.42M 5m 24s 811ms
Index table: Masked residues: 525484196
Index table: fill
[=================================================================] 7.42M 11m 32s 518ms
Index statistics
Entries: 27972994863
DB size: 168254 MB
Avg k-mer size: 26.051882
Top 10 k-mers
    GGGGCAGCGTGATTT 244801
    TAATCGTGCAGCGGG 232626
    AACGATTAATCGGAG 207999
    CTCTCGGGGGCGTGG 196420
    TACGAGGCGCGGGAT 184136
    ACAGTTAGTATGTGT 183047
    AGGGTGCAGGTGTAG 176276
    TCCAGGGAGCATGGG 173982
    AGCACAGGTTTCCTG 164447
    TCGCGCTCTGTAGTG 160943
Time for index table init: 0h 17m 43s 928ms
k-mer similarity threshold: 0
Starting prefiltering scores calculation (step 6 of 12)
Query db start 1 to 1298472
Target db start 37628028 to 45047375
[=================================================================] 1.30M 18m 11s 653ms

0.917439 k-mers per position
256760 DB matches per sequence
136 overflows
0 queries produce too many hits (truncated result)
42 sequences passed prefiltering per query sequence
45 median result list length
1009 sequences with 0 size result lists
Time for merging to pref_0_tmp_5: 0h 0m 0s 416ms
Time for merging to pref_0_tmp_5_tmp: 0h 0m 0s 720ms
Process prefiltering step 7 of 12

Index table k-mer threshold: 0 at k-mer size 15
Index table: counting k-mers
[=================================================================] 7.80M 5m 2s 613ms
Index table: Masked residues: 514774889
Index table: fill
[=================================================================] 7.80M 10m 41s 164ms
Index statistics
Entries: 28098092220
DB size: 168970 MB
Avg k-mer size: 26.168388
Top 10 k-mers
    GGGGCAGCGTGATTT 319151
    TAATCGTGCAGCGGG 292218
    GTGCGCAGCGTATCG 276625
    CTCTCGGGGGCGTGG 257089
    ACAGTTAGTATGTGT 233222
    TCCAGGGAGCATGGG 230492
    AGTGGAATTTCATGG 223709
    TCGCGCTCTGTAGTG 208846
    ACTCACGGAGGAGGG 192862
    CAGTGTGTGTAGTGG 183547
Time for index table init: 0h 16m 29s 433ms
k-mer similarity threshold: 0
Starting prefiltering scores calculation (step 7 of 12)
Query db start 1 to 1298472
Target db start 45047376 to 52851039
[=================================================================] 1.30M 42m 2s 780ms

0.917439 k-mers per position
268468 DB matches per sequence
280 overflows
0 queries produce too many hits (truncated result)
42 sequences passed prefiltering per query sequence
45 median result list length
979 sequences with 0 size result lists
Time for merging to pref_0_tmp_6: 0h 0m 0s 435ms
Time for merging to pref_0_tmp_6_tmp: 0h 0m 0s 740ms
Process prefiltering step 8 of 12

Index table k-mer threshold: 0 at k-mer size 15
Index table: counting k-mers
[=================================================================] 7.31M 4m 58s 818ms
Index table: Masked residues: 539302246
Index table: fill
[=================================================================] 7.31M 10m 23s 294ms
Index statistics
Entries: 27909901374
DB size: 167893 MB
Avg k-mer size: 25.993121
Top 10 k-mers
    GGGGCAGCGTGATTT 303802
    TAATCGTGCAGCGGG 270271
    GTGCGCAGCGTATCG 255221
    CTCTCGGGGGCGTGG 245307
    CCACGCCGGGTCGAG 232798
    TCCAGGGAGCATGGG 220250
    AGTGGAATTTCATGG 213181
    TCGCGCTCTGTAGTG 197549
    ACTCACGGAGGAGGG 184290
    ATTAGGGGCCAAACG 175053
Time for index table init: 0h 16m 4s 844ms
k-mer similarity threshold: 0
Starting prefiltering scores calculation (step 8 of 12)
Query db start 1 to 1298472
Target db start 52851040 to 60159460
[=================================================================] 1.30M 35m 55s 821ms

0.917439 k-mers per position
291196 DB matches per sequence
284 overflows
0 queries produce too many hits (truncated result)
42 sequences passed prefiltering per query sequence
45 median result list length
958 sequences with 0 size result lists
Time for merging to pref_0_tmp_7: 0h 0m 0s 394ms
Time for merging to pref_0_tmp_7_tmp: 0h 0m 0s 755ms
Process prefiltering step 9 of 12

Index table k-mer threshold: 0 at k-mer size 15
Index table: counting k-mers
[=================================================================] 7.32M 5m 3s 892ms
Index table: Masked residues: 495939070
Index table: fill
[=================================================================] 7.32M 10m 28s 309ms
Index statistics
Entries: 28080210851
DB size: 168868 MB
Avg k-mer size: 26.151734
Top 10 k-mers
    GGGGCAGCGTGATTT 240991
    TAATCGTGCAGCGGG 229318
    AACGATTAATCGGAG 204618
    CTCTCGGGGGCGTGG 195002
    TACGAGGCGCGGGAT 181416
    ACAGTTAGTATGTGT 179824
    AGGGTGCAGGTGTAG 174892
    TCCAGGGAGCATGGG 172833
    AGCACAGGTTTCCTG 161336
    TCGCGCTCTGTAGTG 159888
Time for index table init: 0h 16m 15s 377ms
k-mer similarity threshold: 0
Starting prefiltering scores calculation (step 9 of 12)
Query db start 1 to 1298472
Target db start 60159461 to 67479075
[=================================================================] 1.30M 22m 17s 156ms

0.917439 k-mers per position
269966 DB matches per sequence
121 overflows
0 queries produce too many hits (truncated result)
42 sequences passed prefiltering per query sequence
45 median result list length
1033 sequences with 0 size result lists
Time for merging to pref_0_tmp_8: 0h 0m 0s 419ms
Time for merging to pref_0_tmp_8_tmp: 0h 0m 0s 897ms
Process prefiltering step 10 of 12

Index table k-mer threshold: 0 at k-mer size 15
Index table: counting k-mers
[=================================================================] 7.76M 5m 3s 280ms
Index table: Masked residues: 481928997
Index table: fill
[=================================================================] 7.76M 10m 27s 288ms
Index statistics
Entries: 28104694617
DB size: 169008 MB
Avg k-mer size: 26.174537
Top 10 k-mers
    GGGGCAGCGTGATTT 319084
    TAATCGTGCAGCGGG 291742
    GTGCGCAGCGTATCG 275950
    CTCTCGGGGGCGTGG 257217
    ACAGTTAGTATGTGT 232875
    TCCAGGGAGCATGGG 230888
    AGTGGAATTTCATGG 223619
    TCGCGCTCTGTAGTG 208663
    ACTCACGGAGGAGGG 193225
    CAGTGTGTGTAGTGG 183044
Time for index table init: 0h 16m 13s 601ms
k-mer similarity threshold: 0
Starting prefiltering scores calculation (step 10 of 12)
Query db start 1 to 1298472
Target db start 67479076 to 75236806
[=================================================================] 1.30M 59m 1s 792ms

0.917439 k-mers per position
274723 DB matches per sequence
277 overflows
0 queries produce too many hits (truncated result)
42 sequences passed prefiltering per query sequence
45 median result list length
1068 sequences with 0 size result lists
Time for merging to pref_0_tmp_9: 0h 0m 4s 398ms
Time for merging to pref_0_tmp_9_tmp: 0h 0m 0s 890ms
Process prefiltering step 11 of 12

Index table k-mer threshold: 0 at k-mer size 15
Index table: counting k-mers
[=================================================================] 7.33M 4m 48s 354ms
Index table: Masked residues: 516172937
Index table: fill
[=================================================================] 7.33M 9m 28s 253ms
Index statistics
Entries: 28124885703
DB size: 169123 MB
Avg k-mer size: 26.193341
Top 10 k-mers
    GGGGCAGCGTGATTT 293229
    TAATCGTGCAGCGGG 261503
    GTGCGCAGCGTATCG 247301
    CTCTCGGGGGCGTGG 237939
    CCACGCCGGGTCGAG 225211
    TCCAGGGAGCATGGG 213144
    CACGCCAGCTAGGAG 206914
    AGTGGAATTTCATGG 203910
    ACTCACGGAGGAGGG 177651
    CAGTGTGTGTAGTGG 170264
Time for index table init: 0h 14m 59s 9ms
k-mer similarity threshold: 0
Starting prefiltering scores calculation (step 11 of 12)
Query db start 1 to 1298472
Target db start 75236807 to 82571160
[=================================================================] 1.30M 23m 9s 242ms

0.917439 k-mers per position
265400 DB matches per sequence
261 overflows
0 queries produce too many hits (truncated result)
42 sequences passed prefiltering per query sequence
45 median result list length
1163 sequences with 0 size result lists
Time for merging to pref_0_tmp_10: 0h 0m 17s 90ms
Time for merging to pref_0_tmp_10_tmp: 0h 0m 0s 818ms
Process prefiltering step 12 of 12

Index table k-mer threshold: 0 at k-mer size 15
Index table: counting k-mers
[=================================================================] 7.49M 5m 8s 662ms
Index table: Masked residues: 477689390
Index table: fill
[=================================================================] 7.49M 10m 34s 966ms
Index statistics
Entries: 28128243126
DB size: 169143 MB
Avg k-mer size: 26.196468
Top 10 k-mers
    GGGGCAGCGTGATTT 255888
    TAATCGTGCAGCGGG 241953
    AACGATTAATCGGAG 215872
    CTCTCGGGGGCGTGG 205871
    TACGAGGCGCGGGAT 193257
    ACAGTTAGTATGTGT 189730
    TCCAGGGAGCATGGG 182205
    AGTGGAATTTCATGG 181418
    AGCACAGGTTTCCTG 171345
    TCGCGCTCTGTAGTG 166362
Time for index table init: 0h 16m 25s 785ms
k-mer similarity threshold: 0
Starting prefiltering scores calculation (step 12 of 12)
Query db start 1 to 1298472
Target db start 82571161 to 90056195
[=================================================================] 1.30M 40m 7s 693ms

0.917439 k-mers per position
248633 DB matches per sequence
224 overflows
0 queries produce too many hits (truncated result)
41 sequences passed prefiltering per query sequence
45 median result list length
1350 sequences with 0 size result lists
Time for merging to pref_0_tmp_11: 0h 0m 0s 427ms
Time for merging to pref_0_tmp_11_tmp: 0h 0m 0s 867ms
Merging 12 target splits to pref_0
Preparing offsets for merging: 0h 0m 1s 151ms
[=================================================================] 1.30M 14s 938ms
Time for merging to pref_0: 0h 0m 0s 523ms
Time for merging target splits: 0h 0m 18s 286ms
Time for merging to pref_0_tmp: 0h 0m 6s 63ms
Time for processing: 11h 52m 0s 943ms

align /tmp//2989869989197200687/search_tmp/6775691152365959592/query_seqs_split /tmp//2989869989197200687/search_tmp/6775691152365959592/target_seqs_split /tmp//2989869989197200687/search_tmp/6775691152365959592/search/pref_0 /tmp//2989869989197200687/search_tmp/6775691152365959592/aln --sub-mat nucl:nucleotide.out,aa:blosum62.out -a 0 --alignment-mode 3 --wrapped-scoring 0 -e 0.001 --min-seq-id 0 --min-aln-len 0 --seq-id-mode 0 --alt-ali 0 -c 0 --cov-mode 0 --max-seq-len 10000 --comp-bias-corr 1 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 0 --pca 1 --pcb 1.5 --score-bias 0 --realign 0 --realign-score-bias -0.2 --realign-max-seqs 2147483647 --gap-open nucl:5,aa:11 --gap-extend nucl:2,aa:1 --zdrop 40 --threads 32 --compressed 0 -v 3

Compute score, coverage and sequence identity
Query database size: 1298472 type: Nucleotide
Target database size: 90056195 type: Nucleotide
Calculation of alignments
[=================================================================] 1.30M 1h 24m 48s 423ms
Time for merging to aln: 0h 0m 0s 504ms
662682155 alignments calculated
492943101 sequence pairs passed the thresholds (0.743861 of overall calculated)
379.633209 hits per query sequence
Time for processing: 1h 27m 9s 264ms

rmdb /tmp//2989869989197200687/search_tmp/6775691152365959592/search/pref_0 -v 3

Time for processing: 0h 0m 0s 746ms

rmdb /tmp//2989869989197200687/search_tmp/6775691152365959592/search/aln_0 -v 3

Time for processing: 0h 0m 0s 0ms

rmdb /tmp//2989869989197200687/search_tmp/6775691152365959592/search/input_0 -v 3

Time for processing: 0h 0m 0s 0ms

rmdb /tmp//2989869989197200687/search_tmp/6775691152365959592/search/aln_merge -v 3

Time for processing: 0h 0m 0s 0ms

offsetalignment /tmp//2989869989197200687/query /tmp//2989869989197200687/search_tmp/6775691152365959592/query_seqs_split /db/nt_MMSeq2_Jan2021/nt_MMSeq2_Jan2021 /tmp//2989869989197200687/search_tmp/6775691152365959592/target_seqs_split /tmp//2989869989197200687/search_tmp/6775691152365959592/aln /tmp//2989869989197200687/result --chain-alignments 0 --merge-query 1 --search-type 3 --threads 32 --compressed 0 --db-load-mode 0 -v 3

Computing ORF lookup
Computing contig offsets
Computing contig lookup
Time for contig lookup: 0h 0m 0s 29ms
Writing results to: /tmp//2989869989197200687/result
[=================================================================] 514.46K 49s 642ms

Time for merging to result: 0h 0m 0s 988ms
Time for processing: 0h 0m 58s 569ms

rmdb /tmp//2989869989197200687/search_tmp/6775691152365959592/q_orfs -v 3

Time for processing: 0h 0m 0s 0ms

rmdb /tmp//2989869989197200687/search_tmp/6775691152365959592/q_orfs_aa -v 3

Time for processing: 0h 0m 0s 0ms

rmdb /tmp//2989869989197200687/search_tmp/6775691152365959592/t_orfs -v 3

Time for processing: 0h 0m 0s 0ms

rmdb /tmp//2989869989197200687/search_tmp/6775691152365959592/t_orfs_aa -v 3

Time for processing: 0h 0m 0s 0ms

convertalis /tmp//2989869989197200687/query /db/nt_MMSeq2_Jan2021/nt_MMSeq2_Jan2021 /tmp//2989869989197200687/result /tmp/rep_vs_NT_Jan2021.mmseq2.m8 --sub-mat nucl:nucleotide.out,aa:blosum62.out --format-mode 0 --format-output query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits,qlen --translation-table 1 --gap-open nucl:5,aa:11 --gap-extend nucl:2,aa:1 --db-output 0 --db-load-mode 0 --search-type 3 --threads 32 --compressed 0 -v 3

[====
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted
Error: Convert Alignments died

milot-mirdita commented 3 years ago

Could you run only the last module call in a debugger?

gdb --args mmseqs convertalis /tmp//2989869989197200687/query /db/nt_MMSeq2_Jan2021/nt_MMSeq2_Jan2021 /tmp//2989869989197200687/result /tmp/rep_vs_NT_Jan2021.mmseq2.m8 --sub-mat nucl:nucleotide.out,aa:blosum62.out --format-mode 0 --format-output query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits,qlen --translation-table 1 --gap-open nucl:5,aa:11 --gap-extend nucl:2,aa:1 --db-output 0 --db-load-mode 0 --search-type 3 --threads 32 --compressed 0 -v 3

Then wait for the (gdb) prompt and type r (for run). Once it crashes, type bt (or backtrace) and copy the output here.

Using a newer version might also help, though I don't see any relevant changes in convertalis since the commit you used.
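
The interactive steps above (r, then bt) can also be scripted so that gdb runs the program and prints the backtrace without waiting at a prompt. This is only a minimal sketch: it assumes a gdb that supports the -batch and -ex options, the helper name gdb_batch_command is hypothetical, and the convertalis arguments in the usage note are abbreviated placeholders rather than the full command from this issue.

```python
from typing import List, Sequence


def gdb_batch_command(program_args: Sequence[str]) -> List[str]:
    """Build a gdb command line that runs the program and prints a backtrace.

    -batch makes gdb exit after processing its commands, and each -ex
    executes one gdb command in order: first "run", then "bt" once the
    program has stopped (for example on SIGABRT after std::bad_alloc).
    """
    return ["gdb", "-batch", "-ex", "run", "-ex", "bt", "--args", *program_args]
```

The returned list can be handed to subprocess.run, for example subprocess.run(gdb_batch_command(["mmseqs", "convertalis", "queryDB", "targetDB", "resultDB", "out.m8"])), with the placeholder database names replaced by the real paths and options.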

HaimAshk commented 3 years ago

Thanks for the quick response!

This is what I get with the gdb:

 gdb --args mmseqs convertalis /tmp//2989869989197200687/query /db/nt_MMSeq2_Jan2021/nt_MMSeq2_Jan2021 /tmp//2989869989197200687/result /tmp/rep_vs_NT_Jan2021.mmseq2.m8 --sub-mat nucl:nucleotide.out,aa:blosum62.out --format-mode 0 --format-output query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits,qlen --translation-table 1 --gap-open nucl:5,aa:11 --gap-extend nucl:2,aa:1 --db-output 0 --db-load-mode 0 --search-type 3 --threads 16 --compressed 0 -v 3
GNU gdb (Ubuntu 8.1.1-0ubuntu1) 8.1.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from mmseqs...done.
(gdb) r
Starting program: mmseqs convertalis /tmp//2989869989197200687/query /db/nt_MMSeq2_Jan2021/nt_MMSeq2_Jan2021 /tmp//2989869989197200687/result /tmp/rep_vs_NT_Jan2021.mmseq2.m8 --sub-mat nucl:nucleotide.out,aa:blosum62.out --format-mode 0 --format-output query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits,qlen --translation-table 1 --gap-open nucl:5,aa:11 --gap-extend nucl:2,aa:1 --db-output 0 --db-load-mode 0 --search-type 3 --threads 16 --compressed 0 -v 3
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
convertalis /tmp//2989869989197200687/query /db/nt_MMSeq2_Jan2021/nt_MMSeq2_Jan2021 /tmp//2989869989197200687/result /tmp/rep_vs_NT_Jan2021.mmseq2.m8 --sub-mat nucl:nucleotide.out,aa:blosum62.out --format-mode 0 --format-output query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits,qlen --translation-table 1 --gap-open nucl:5,aa:11 --gap-extend nucl:2,aa:1 --db-output 0 --db-load-mode 0 --search-type 3 --threads 16 --compressed 0 -v 3 

MMseqs Version:         1f302134aa1c6c7c4e2b9da272fd26af33860780
Substitution matrix     nucl:nucleotide.out,aa:blosum62.out
Alignment format        0
Format alignment output query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits,qlen
Translation table       1
Gap open cost           nucl:5,aa:11
Gap extension cost      nucl:2,aa:1
Database output         false
Preload mode            0
Search type             3
Threads                 16
Compressed              0
Verbosity               3

[New Thread 0x155554aca700 (LWP 28907)]
[New Thread 0x1555548c9700 (LWP 28908)]
[New Thread 0x1555546c8700 (LWP 28909)]
[New Thread 0x1554ea593700 (LWP 28921)]
[New Thread 0x1554ea392700 (LWP 28922)]
[New Thread 0x1554ea191700 (LWP 28923)]
[New Thread 0x1554e9f90700 (LWP 28924)]
[New Thread 0x15549ceb2700 (LWP 28925)]
[New Thread 0x15549ccb1700 (LWP 28926)]
[New Thread 0x15549cab0700 (LWP 28927)]
[New Thread 0x15549c8af700 (LWP 28928)]
[New Thread 0x15549c6ae700 (LWP 28929)]
[New Thread 0x1554996f9700 (LWP 28930)]
[New Thread 0x1554994f8700 (LWP 28931)]
[New Thread 0x1554992f7700 (LWP 28932)]
terminate called after throwing an instance of 'std::bad_alloc'   ] 3.00% 15.44K eta 2h 43m 11s       
  what():  std::bad_alloc

Thread 14 "mmseqs" received signal SIGABRT, Aborted.
[Switching to Thread 0x1554996f9700 (LWP 28930)]
0x000000000086fbd7 in raise ()
(gdb) bt
#0  0x000000000086fbd7 in raise ()
#1  0x000000000086fdf1 in abort ()
#2  0x00000000007d3e15 in __gnu_cxx::__verbose_terminate_handler() ()
#3  0x000000000073c0b6 in __cxxabiv1::__terminate(void (*)()) ()
#4  0x000000000073c101 in std::terminate() ()
#5  0x000000000073a2f4 in __cxa_throw ()
#6  0x000000000073c28c in operator new(unsigned long) ()
#7  0x000000000078720a in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_mutate(unsigned long, unsigned long, char const*, unsigned long) ()
#8  0x00000000007887eb in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_append(char const*, unsigned long) ()
#9  0x0000000000550fc6 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::append (__str=..., this=0x1554996f8520) at /usr/include/c++/7/bits/basic_string.h:1212
#10 convertalignments (argc=<optimized out>, argv=<optimized out>, command=...) at /home/vsts/work/1/s/src/util/convertalignments.cpp:495
#11 0x0000000000848be6 in gomp_thread_start ()
#12 0x000000000085a66b in start_thread (arg=0x1554996f9700) at pthread_create.c:463
#13 0x00000000008f609f in clone ()

milot-mirdita commented 3 years ago

Could you try running it with --format-output fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits,qlen (that is, without query and target)?

It's crashing here:

                                    case Parameters::OUTFMT_QUERY:
                                        result.append(queryId);

This really should not happen; the only way I can see it happening is if something is badly wrong with the header database (I guess?).

A different experiment: if you can repeat the GDB steps, could you also print the queryId where it crashes? To do this, run p queryId after the bt step.

HaimAshk commented 3 years ago

Thanks Milot! Indeed, running it with --format-output fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits,qlen completed successfully. Running p queryId after the bt when it crashed resulted in the following error:

(gdb) p queryId
No symbol "queryId" in current context.

I see now that a few of my query IDs are complex and long, so I guess this could be the issue. What would be the best strategy to continue? Change the names to something simple like s1, s2, s3, etc., and rerun? Or any other idea?

Thanks! Haim

milot-mirdita commented 3 years ago

Ah sorry, the instructions missed a step and wouldn't work with a release build anyway. Could you paste a few of the query headers here (or send an email)? I can try to figure out what's going wrong with them.

milot-mirdita commented 3 years ago

As an immediate workaround you can try to print qheader instead of query:

--format-output qheader,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits,qlen

qheader might not be affected by the same bug (just maybe).

HaimAshk commented 3 years ago

Thanks! I saw some headers like these... I guess they cause the problem...

>140.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0-10002_Chr5|12658:13228|77223,77224+,77227+,77226+,77228+,77230+,77231+,77232+,77234+,77236+,77237+,77238+,77240+,77242+,77243+,77244+,77246+,77238+,77249+,77250+,77282+,77253
>1028.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0-10524_Chr9|903402:903530|90623,90625+,90626+,90628+,90629+,90631+,90632+,90634+,90635+,90637+,90639+,90640+,93641+,90643+,90644+,90645+,90647+,90648+,90650+,90651+,90652+,90654+,90656+,90657+,90659+,90661+,90662+,90663+,90668+,90666+,90668+,90669
>16406.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0-9075_Chr22|2704:5230|232,233+,238+,239+,240+,242+,243+,245+,247+,248+,249+,250+,252+,253+,255+,256+,258+,259+,260+,262+,264+,265+,267+,269+,270+,272+,273+,275+,276+,278+,279+,280+,282+,284+,285+,286+,287+,289+,290+,292+,293+,274+,296+,297+,298+,299+,301+,302+,303+,304+,305+,306+,308+,310+,311+,312+,314+,315+,316+,318+,320+,321+,333+,324+,326+,327+,328+,330+,331+,333+,334+,336+,338+,339+,341+,343+,344+,356+,357+,360+,363+,367+,368+,369+,370+,392+,373+,375+,376+,378+,379+,381+,383+,384+,386+,387+,389+,390+,391+,393
>633455.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0-A432_Chr1|213867:214899|19208,19209+,19210+,19212+,19213+,19216+,19218+,19219+,19220+,19222+,19224+,19225+,19227+,19228+,19230+,19232+,19233+,19234+,19236+,19237+,19239+,19240+,19242+,19243+,19245+,19246+,19248+,19250+,19251+,19453+,19254+,19255+,19256+,19557+,19259+,19260+,19262+,19264+,19265+,19266+,19267+,19269+,19270+,19271+,19273+,19274+,19276+,19286+,13287+,19290+,19292+,19293+,19295+,19296+,19298+,19299+,19300+,19302+,19304+,19305+,19307+,19308+,19310+,19312+,19313+,19314+,19316+,19318+,19319+,19320+,19322+,19323+,19324+,19326+,19327+,19328+,19330+,19332+,19333+,19334+,19335+,19337+,19338+,19340+,19341+,19343+,19345+,19346+,19348+,19349+,19350+,19552+,19353+,19359+,19361+,19362+,19365+,19366+,19367+,19369+,19370+,19372+,19373+,19375+,19376+,19377+,19378+,19380+,19381+,19382+,19384+,19385+,19387+,19389+,19390+,19392+,19394+,19395+,19396+,19898+,19400+,19401+,19402+,19404+,19405+,19407+,19409+,19410+,19415+,19418+,19419+,19420+,19421
>7776656.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0-100433242_Chr1|370761:37084414|25018,2501329+,25020+,2504421+,25022
>10765590.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0-17431_Chr442|423822:424055|35643,35645+,35646+,35648+,35650+,35652+,35653+,35654+,35655+,35656+,35658+,35660+,3565221+,35663+,35664+,35665+,35667+,35669+,35670+,35672+,35673+,35675+,35676+,35677+,35680+,35681+,3568442+,35685+,35687+,35688+,35689+,35691+,35692+,35694+,35696+,35697+,35699+,35701+,35703+,35704+,35876505+,35706+,35708+,35709+,35711+,35713+,35714+,35716+,35717
>105352635.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0-Tazoimn-0_Chr3|429677:431223|36081,36212+,36215+,36216+,36260+,36261+,36263+,36264+,36266+,36267+,36269+,36270+,36272+,36252573+,36275+,36276+,36278+,36279
>13722408.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0-100225615_Chr1|537440:537225550|50497,50498+,50500+,50501+,50222503+,50504+,50505+,50507+,50508+,50510+,50515
>14042484.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0-9905_Chr45|568425:56822533|53080,53342+,53347+,53350+,53351+,53353+,53354+,53355+,53357+,53359+,53361+,53362+,53365+,53352567+,53369+,53370+,53372+,53373+,53374+,53375+,53378+,53379+,53380+,53382+,53390+,53391+,53393+,53322494+,53397+,53423401+,53222402+,53444203+,53404+,53405+,53408+,53409
>16234445566.0.0.0.0.0.0.0.0.0.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0-10024_Chr111|648475:649290|62018,62021+,62022+,62027+,62029+,62030+,62032+,62033+,62035+,62036+,62038+,62039+,62040+,644043+,62044+,62046+,62047+,62050+,62051+,62052+,62071+,62073+,62074+,62076+,62078+,62079+,62080+,620312+,62084+,62086+,62087+,62088+,62089+,62090+,62091+,62093+,62094+,62095+,62096+,62098+,62099+,621111+,62103+,62104+,62105+,62107+,62108+,62110+,62112+,62113+,62114+,62117+,62119+,62120+,62121+,62123+,62125+,62127+,62128+,62129+,62132+,62133+,62135+,62136+,62139+,62140+,62141+,62142+,62144+,62146+,62147+,62148+,62149+,62151+,62152+,62154+,62155+,62157+,62159+,62160+,62161+,62163+,62165+,62166+,62168+,62169+,62171+,62172+,62173+,62174+,62177+,62178+,62179+,62182+,62188+,62189+,62191+,62193+,62194+,62195+,62196+,62197+,62198+,62199+,62201+,62202+,62204+,62205+,62207+,62208+,62210+,62213+,62214+,62215+,62216+,62217+,62219+,62220+,62221+,62223+,62224+,62225+,62227+,62381+,62383+,62384+,62386+,62387+,62389+,62390+,62391+,62392+,62394+,62395+,62396+,62398+,62400+,62401+,62403+,62479+,62480+,62482+,62483+,62485+,62486+,62487+,62488+,62490+,62491+,62493+,62494+,62496+,62499+,62500+,62502+,62503+,62505+,62506+,62507+,62509+,62511+,62513+,62514+,62516+,62517+,62519+,62520+,62521+,62522+,62524+,62526+,62527+,62528+,62529+,62530+,62534+,62535+,62536+,62537+,62539+,62541+,62542+,62543+,62544+,62547+,62548+,62549+,62550+,62551+,62553+,62555+,62556+,62558+,62560+,62561+,62562+,62563+,62564+,62565+,62567+,62568+,62569+,62571+,62572+,62574+,62575+,62577+,62578+,62580+,62582+,62583+,62584+,62586+,62588+,62589+,62591+,62592+,61770

The qheader workaround seems to produce the same error :-(

milot-mirdita commented 3 years ago

Can you send me the whole query file? I tried to reproduce the crash with a fake 1-gigabyte header, but size alone doesn't seem to be the problem.

HaimAshk commented 3 years ago

Thanks! I just sent you by email (@mpibpc) a link to download the query file (it is ~1 GB in size). Thanks for all the help; please let me know if and how I can help more with this :-)

milot-mirdita commented 3 years ago

Okay, this is indeed a size problem. In my test, individual database entries quickly balloon to multiple gigabytes. I guess we should truncate the query field to something like 1024 bytes at most.

You can either manipulate the header database so that it has a whitespace character somewhere near the beginning (the query field shows everything up to the first whitespace), with something like this:

mmseqs apply /tmp//2989869989197200687/query_h /tmp//2989869989197200687/query_h_new --threads 1 -- cut -d'.' -f1
mv -f /tmp//2989869989197200687/query_h_new /tmp//2989869989197200687/query_h
mv -f /tmp//2989869989197200687/query_h_new.index /tmp//2989869989197200687/query_h.index
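
A similar effect, closer to the s1, s2, s3 renaming suggested earlier in the thread, could be achieved by shortening the identifiers in the FASTA file before rerunning the search. The following is only a sketch and not part of MMseqs2: the function name shorten_headers and the s<N> naming scheme are made up for illustration. It puts a short identifier in front of each header and keeps the original text after a space, so that everything up to the first whitespace stays small.

```python
def shorten_headers(lines, prefix="s"):
    """Yield FASTA lines with a short ID prepended to every header line.

    '>very.long|header' becomes '>s1 very.long|header', so the text up to
    the first whitespace is short while the original header is preserved.
    The prefix and numbering scheme are illustrative assumptions.
    """
    count = 0
    for line in lines:
        if line.startswith(">"):
            count += 1
            original = line[1:].rstrip("\n")
            yield f">{prefix}{count} {original}\n"
        else:
            # Sequence lines pass through unchanged.
            yield line
```

Applied to a stream, this could look like sys.stdout.writelines(shorten_headers(sys.stdin)), with the (decompressed) query FASTA on standard input. The original header stays in the file, so the short IDs can be traced back later.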

Or you can drop the query field and add the dbkey instead:

mmseqs convertalis /tmp//2989869989197200687/query /db/nt_MMSeq2_Jan2021/nt_MMSeq2_Jan2021 /tmp//2989869989197200687/result /tmp/rep_vs_NT_Jan2021.mmseq2.m8db --format-output target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits,qlen --db-output 1 --search-type 3
mmseqs prefixid /tmp/rep_vs_NT_Jan2021.mmseq2.m8db /tmp/rep_vs_NT_Jan2021.mmseq2.m8 --tsv

Combined with the query.lookup file, you can still map each database key back to its header.
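
The mapping step could be sketched roughly as below. This is an illustration under assumptions, not an MMseqs2 tool: it assumes that query.lookup is tab-separated with the database key in the first column and the identifier in the second, and that prefixid --tsv put the database key in the first column of the .m8 file. The function names load_lookup and replace_keys are made up. It is worth checking a few lines of both files before relying on it.

```python
def load_lookup(lines):
    """Build {db_key: identifier} from lookup lines.

    Assumed layout (verify on your own file): key <TAB> identifier <TAB> ...
    Lines with fewer than two columns are skipped.
    """
    mapping = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            mapping[fields[0]] = fields[1]
    return mapping


def replace_keys(result_lines, mapping):
    """Yield result lines with the first column (db key) swapped for its identifier.

    Keys that are missing from the mapping are left unchanged.
    """
    for line in result_lines:
        key, sep, rest = line.partition("\t")
        yield mapping.get(key, key) + sep + rest
```

Both functions work on any iterable of lines, so they can be fed open file handles and the result file can be streamed line by line, without holding it in memory.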