ms-gx closed this issue 3 years ago
Problem solved. I got more disk space and no longer use a compressed database. And voilà -- it works!
So the compression caused the problem in my case.
Compression is a good hint for where to look for the problem. I'll see if I can track down what's wrong in the next few days. Thanks.
I'm getting a similar segmentation fault with a tblastn-style search against a taxonomy-annotated target database derived from BLAST NT. Interestingly, the prefilter step estimates memory consumption at 60T but jumps right into prefiltering instead of splitting the database to stay within the ~620G memory limit. I also used the --compressed flag, and will check whether removing that flag fixes the problem for me too.
@milot-mirdita It may be worth re-opening this issue.
search query_db/db target_db/db result_db/db /fsx/scratch/mmseqs/mmseqs-nf/d3d8e6be-a51b-4707-b105-d650f029c7be/MMSEQS/BLAST_DB_SEARCH/mmseqs_search -s 6 -a --num-iterations 1 --use-all-table-starts 1 --compressed 1 --split-memory-limit 618475290624 --threads 96
MMseqs Version: 45111b641859ed0ddd875b94d6fd1aef1a675b7e
Substitution matrix nucl:nucleotide.out,aa:blosum62.out
Add backtrace true
Alignment mode 2
Alignment mode 0
Allow wrapped scoring false
E-value threshold 0.001
Seq. id. threshold 0
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Coverage threshold 0
Coverage mode 0
Max sequence length 65535
Compositional bias 1
Max reject 2147483647
Max accept 2147483647
Include identical seq. id. false
Preload mode 0
Pseudo count a 1
Pseudo count b 1.5
Score bias 0
Realign hits false
Realign score bias -0.2
Realign max seqs 2147483647
Gap open cost nucl:5,aa:11
Gap extension cost nucl:2,aa:1
Zdrop 40
Threads 96
Compressed 1
Verbosity 3
Seed substitution matrix nucl:nucleotide.out,aa:VTML80.out
Sensitivity 6
k-mer length 0
k-score 2147483647
Alphabet size nucl:5,aa:21
Max results per query 300
Split database 0
Split mode 2
Split memory limit 589824T
Diagonal scoring true
Exact k-mer matching 0
Mask residues 1
Mask lower case residues 0
Minimum diagonal score 15
Spaced k-mers 1
Spaced k-mer pattern
Local temporary path
Rescore mode 0
Remove hits by seq. id. and coverage false
Sort results 0
Mask profile 1
Profile E-value threshold 0.1
Global sequence weighting false
Allow deletions false
Filter MSA 1
Maximum seq. id. threshold 0.9
Minimum seq. id. 0
Minimum score per column -20
Minimum coverage 0
Select N most diverse seqs 1000
Min codons in orf 30
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 1
Forward frames 1,2,3
Reverse frames 1,2,3
Translation table 1
Translate orf 0
Use all table starts true
Offset of numeric ids 0
Create lookup 0
Add orf stop false
Overlap between sequences 0
Sequence split mode 1
Header split mode 0
Chain overlapping alignments 0
Merge query 1
Search type 0
Search iterations 1
Start sensitivity 4
Search steps 1
Exhaustive search mode false
Filter results during exhaustive search 0
Strand selection 1
LCA search mode false
Disk space limit 0
MPI runner
Force restart with latest tmp false
Remove temporary files false
prefilter query_db/db /fsx/scratch/mmseqs/mmseqs-nf/d3d8e6be-a51b-4707-b105-d650f029c7be/MMSEQS/BLAST_DB_SEARCH/mmseqs_search/340477856621524793/t_orfs_aa /fsx/scratch/mmseqs/mmseqs-nf/d3d8e6be-a51b-4707-b105-d650f029c7be/MMSEQS/BLAST_DB_SEARCH/mmseqs_search/340477856621524793/search/pref_0 --sub-mat nucl:nucleotide.out,aa:blosum62.out --seed-sub-mat nucl:nucleotide.out,aa:VTML80.out -k 0 --k-score 2147483647 --alph-size nucl:5,aa:21 --max-seq-len 65535 --max-seqs 300 --split 0 --split-mode 2 --split-memory-limit 589824T -c 0 --cov-mode 0 --comp-bias-corr 1 --diag-score 1 --exact-kmer-matching 0 --mask 1 --mask-lower-case 0 --min-ungapped-score 15 --add-self-matches 0 --spaced-kmer-mode 1 --db-load-mode 0 --pca 1 --pcb 1.5 --threads 96 --compressed 1 -v 3 -s 6.0
Query database size: 727664 type: Aminoacid
Estimated memory consumption: 60T
Target database size: 13319670203 type: Aminoacid
Index table k-mer threshold: 118 at k-mer size 7
Index table: counting k-mers
Error: Prefilter died
Error: Search step died
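A quick check of the numbers in the log above may explain why no split happened. The command passes --split-memory-limit 618475290624 (presumably meant as bytes), yet the parameter dump shows "Split memory limit 589824T". That figure is exactly what you get if the unit-less value is read as mebibytes. This is an inference from the log, not confirmed MMseqs2 behavior, and the helper names below are made up for illustration. Under that reading, the estimated 60T is far below the limit, so the prefilter would see no reason to split.

```shell
#!/bin/sh
# Assumption (inferred from the log, not from MMseqs2 documentation):
# a --split-memory-limit value without a unit suffix is read as mebibytes.

# Hypothetical helper: convert a count of MiB to whole TiB, the unit the log prints.
mib_to_tib() {
    echo "$(( $1 / 1024 / 1024 ))T"
}

# Hypothetical helper: convert a byte count to whole GiB for an explicit G suffix.
bytes_to_gib() {
    echo "$(( $1 / 1024 / 1024 / 1024 ))G"
}

limit=618475290624        # value passed to --split-memory-limit in the command above

mib_to_tib "$limit"       # prints 589824T, matching "Split memory limit" in the log
bytes_to_gib "$limit"     # prints 576G, the limit that was presumably intended
```

If this reading is right, writing the limit with an explicit suffix (for example --split-memory-limit 576G) would be one way to test whether the database then gets split. It would not by itself rule out the compression issue discussed earlier in the thread.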
Expected Behavior
I would like to query a transcriptome against the NT db and retrieve taxonomy. I generated the NT db according to your docs (with compression enabled). Then I converted my transcriptome to an MMseqs2 db and tried to query via:
mmseqs taxonomy --search-type 3 Transcripts_mmseqs2 nt.fnaDB MyTaxonomyResult tmp
But I get a segfault...
UPDATE: I also get a segfault when executing search or taxonomy against a pre-compiled database downloaded via databases.
UPDATE 2: Also happens with the latest Docker image.
UPDATE 3: Tried a very small toy FASTA. Also segfaults.
Current Behavior
Execution of mmseqs taxonomy fails with a segfault. I tried several versions of the mmseqs2 binary:
-> All fail
Steps to Reproduce (for bugs)
Create DB for query:
mmseqs createdb ../transcripts.fasta Transcripts_mmseqs2
Get taxonomy:
mmseqs taxonomy --search-type 3 Transcripts_mmseqs2 nt.fnaDB MyTaxonomyResult tmp
These are the files I generated from NT as the target database (does anything look off?):
MMseqs Output (some paths & filenames redacted)
Environment