Open hwy7 opened 1 year ago
First: neither the sensitivity parameter (-s) nor the iteration parameter (--num-iterations) does anything for nucleotide MMseqs2 searches. Sensitivity adjusts the length of the similar k-mer lists, which are not generated for nucleotides (all substitutions have the same score, so you can't generate similar k-mers).
Profile searches are also not implemented for nucleotides.
However, the error is still very surprising and should not happen. Could you share the sequences with us?
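The point above about similar k-mers can be illustrated with a small sketch. It is not MMseqs2 code: the match and mismatch scores are hypothetical placeholders, and `kmer_score` and `neighbor_score_levels` are names made up for this example. It only shows that, when every substitution costs the same, a candidate k-mer's score depends solely on how many positions differ, so no substitution is "more similar" than another.

```python
from itertools import product

# Hypothetical scores, chosen only for illustration; they are not taken
# from MMseqs2's nucleotide matrix.
MATCH = 2
MISMATCH = -3


def kmer_score(a: str, b: str) -> int:
    """Score two equal-length k-mers position by position."""
    return sum(MATCH if x == y else MISMATCH for x, y in zip(a, b))


def neighbor_score_levels(kmer: str, alphabet: str = "ACGT") -> dict:
    """Count how many candidate k-mers fall on each score level."""
    levels = {}
    for cand in product(alphabet, repeat=len(kmer)):
        score = kmer_score(kmer, "".join(cand))
        levels[score] = levels.get(score, 0) + 1
    return levels


levels = neighbor_score_levels("ACGT")
# One score level per number of mismatches (0 to 4), nothing in between.
for score in sorted(levels, reverse=True):
    print(score, levels[score])
```

With a protein matrix such as BLOSUM62, different substitutions score differently, which is what lets a score threshold select a graded list of similar k-mers. Under the uniform scoring assumed here, that grading does not exist.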
Thank you for your reply. My target sequences are CDS sequences downloaded from NCBI, and my query sequences are 300 bp sequence fragments. Here are some partial sequences from the target and query files: https://gist.github.com/hwy7/cd5486d2a61c3b6bfe990a3ada669318 Please let me know if you need any more information or if there are specific analyses you would like to perform with this data. Thanks
Expected Behavior
The search should successfully create a resultDB. It does when I run
mmseqs search query/queryDB target/tragetDB search/resultDB -s 7.5 --search-type 3
but it fails when I run
mmseqs search query/queryDB target/tragetDB search/resultDB -s 7.5 --search-type 3 --num-iterations 2
Current Behavior
Error: Alignment died
Error: Search step died
Steps to Reproduce (for bugs)
MMseqs Output (for bugs)
splitsequence sub/subDB tmp/7935334228278574252/target_seqs_split --max-seq-len 10000 --sequence-overlap 0 --sequence-split-mode 1 --headers-split-mode 0 --create-lookup 0 --threads 96 --compressed 0 -v 3
[=================================================================] 100.00% 365.60K 1s 853ms
Time for merging to target_seqs_split_h: 0h 0m 0s 83ms
Time for merging to target_seqs_split: 0h 0m 0s 97ms
Time for processing: 0h 0m 2s 329ms
extractframes querydata/queryDB tmp/7935334228278574252/query_seqs --forward-frames 1 --reverse-frames 1 --create-lookup 0 --threads 96 --compressed 0 -v 3
[=================================================================] 100.00% 2.00K 0s 18ms
Time for merging to query_seqs_h: 0h 0m 0s 62ms
Time for merging to query_seqs: 0h 0m 0s 6ms
Time for processing: 0h 0m 0s 213ms
splitsequence tmp/7935334228278574252/query_seqs tmp/7935334228278574252/query_seqs_split --max-seq-len 10000 --sequence-overlap 0 --sequence-split-mode 1 --headers-split-mode 0 --create-lookup 0 --threads 96 --compressed 0 -v 3
Time for processing: 0h 0m 0s 0ms
prefilter tmp/7935334228278574252/query_seqs_split tmp/7935334228278574252/target_seqs_split tmp/7935334228278574252/search/pref_0 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --seed-sub-mat 'aa:VTML80.out,nucl:nucleotide.out' -s 7.5 -k 15 --target-search-mode 0 --k-score seq:2147483647,prof:2147483647 --alph-size aa:21,nucl:5 --max-seq-len 10000 --max-seqs 300 --split 0 --split-mode 2 --split-memory-limit 0 -c 0 --cov-mode 2 --comp-bias-corr 1 --comp-bias-corr-scale 1 --diag-score 1 --exact-kmer-matching 1 --mask 1 --mask-prob 0.9 --mask-lower-case 0 --min-ungapped-score 15 --add-self-matches 0 --spaced-kmer-mode 1 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --threads 96 --compressed 0 -v 3
Query database size: 4000 type: Nucleotide
Estimated memory consumption: 11G
Target database size: 365688 type: Nucleotide
Index table k-mer threshold: 0 at k-mer size 15
Index table: counting k-mers
[=================================================================] 100.00% 365.69K 16s 177ms
Index table: Masked residues: 1079896
Index table: fill
[=================================================================] 100.00% 365.69K 12s 498ms
Index statistics
Entries: 297952985
DB size: 9896 MB
Avg k-mer size: 0.277490
Top 10 k-mers
GGCGCAGCGCGGTGC 366
TCCGGGCCGCACGGT 330
GTCGCGGCAGCGCCG 209
CAGACGCGCGTGCCG 204
CGCGCGCGTCGCGCG 167
CGCGCGCGTGGCGCG 157
GCTGCGCGCGGCGCG 151
CGCGGGCGTGGCGCG 149
CGTGCGCGTGGCGCG 147
CGCGCGCCCGGCGCG 133
Time for index table init: 0h 0m 39s 203ms
Process prefiltering step 1 of 1
k-mer similarity threshold: 0
Starting prefiltering scores calculation (step 1 of 1)
Query db start 1 to 4000
Target db start 1 to 365688
[=================================================================] 100.00% 4.00K 0s 74ms
[================================================================>] 99.72% 3.99K eta 0s
0.926667 k-mers per position
434 DB matches per sequence
0 overflows
4 sequences passed prefiltering per query sequence
1 median result list length
1762 sequences with 0 size result lists
Time for merging to pref_0: 0h 0m 0s 5ms
Time for processing: 0h 0m 40s 369ms
align tmp/7935334228278574252/query_seqs_split tmp/7935334228278574252/target_seqs_split tmp/7935334228278574252/search/pref_0 tmp/7935334228278574252/search/aln_0 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -a 1 --alignment-mode 3 --alignment-output-mode 0 --wrapped-scoring 0 -e 0.001 --min-seq-id 0 --min-aln-len 0 --seq-id-mode 0 --alt-ali 0 -c 0 --cov-mode 2 --max-seq-len 10000 --comp-bias-corr 1 --comp-bias-corr-scale 1 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --score-bias 0 --realign 1 --realign-score-bias -0.2 --realign-max-seqs 2147483647 --corr-score-weight 0 --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --zdrop 40 --threads 96 --compressed 0 -v 3
Compute score only
Query database size: 4000 type: Nucleotide
Target database size: 365688 type: Nucleotide
Calculation of alignments
Query sequence 236 has a result with no diagonal information. Please check your database.
Error: Alignment died
Error: Search step died
Your Environment
df77d9e6cf640fe8990f247441ab44d4f4ad9cdc
Ubuntu 20.04.4 LTS (GNU/Linux 5.4.0-121-generic x86_64)