soedinglab / MMseqs2

MMseqs2: ultra fast and sensitive search and clustering suite
https://mmseqs.com
GNU General Public License v3.0

Better handling of wrong input #41

Status: Closed (closed by unode 7 years ago)

unode commented 7 years ago

Expected Behavior

Shouldn't segfault. Ideally it should present the user with an error and some guidance on what to do.
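As a rough illustration of the kind of guidance meant here, a pre-flight check could look something like the sketch below. It is not MMseqs2 code: the function name `check_query_db` is hypothetical, and it assumes that an MMseqs2 database consists of a data file plus a companion `.index` file with one line per entry, and that a plain FASTA file is converted with `mmseqs createdb`.

```python
import os


def check_query_db(db_path):
    """Return None if db_path looks like a non-empty database, else an error message.

    Hypothetical helper for illustration; assumes the '<db>' + '<db>.index' layout.
    """
    index_path = db_path + ".index"
    if not os.path.isfile(db_path):
        return f"Query database '{db_path}' does not exist."
    if not os.path.isfile(index_path):
        # A raw FASTA file has no companion index file.
        return (
            f"'{db_path}' has no index file ('{index_path}'). "
            "It may be a plain FASTA file; convert it first, e.g. "
            f"'mmseqs createdb {db_path} queryDB'."
        )
    # Each non-blank line of the index file describes one database entry.
    with open(index_path) as handle:
        n_entries = sum(1 for line in handle if line.strip())
    if n_entries == 0:
        return f"Query database '{db_path}' contains no entries (size=0)."
    return None
```

A wrapper could call `check_query_db("/share/input.fasta")` before starting the search and print the returned message instead of launching the prefilter.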

Current Behavior

Segfaults during the prefilter stage, apparently because the query database is reported as empty:

Query database: /share/input.fasta(size=0)

Steps to Reproduce (for bugs)

MMseqs Output (for bugs)

Program call:
input.fasta db test tmp 

MMseqs Version:                     7947b0035eef9ba41b64b0c752b0432465aaeb7c
Sub Matrix                          blosum62.out
Add backtrace                       false
Alignment mode                      0
E-value threshold                   0.001
Seq. Id Threshold                   0
Coverage threshold                  0
Coverage Mode                       0
Max. sequence length                32000
Max. results per query              300
Compositional bias                  1
Query queryProfile                  false
Realign hit                         false
Max Reject                          2147483647
Max Accept                          2147483647
Include identical Seq. Id.          false
No preload                          false
Early exit                          false
Threads                             40
Verbosity                           3
Sensitivity                         5.7
K-mer size                          0
K-score                             2147483647
Alphabet size                       21
Target queryProfile                 false
Offset result                       0
Split DB                            0
Split mode                          2
Diagonal Scoring                    1
Mask Residues                       1
Minimum Diagonal score              15
Spaced Kmer                         1
Profile e-value threshold           0.001
Use global sequence weighting       false
Filter MSA                          1
Maximum sequence identity threshold 0.9
Minimum seq. id.                    0
Minimum score per column            -20
Minimum coverage                    0
Select n most diverse seqs          1000
Pseudo count a                      1
Pseudo count b                      1.5
Omit Consensus                      false
Number search iterations            1
Start sensitivity                   4
sensitivity step size               1
Sets the MPI runner                 
Remove Temporary Files              false

Program call:
/share/input.fasta db /share/tmp/pref_5 --sub-mat blosum62.out -k 0 --k-score 2147483647 --alph-size 21 --max-seq-len 32000 --max-seqs 300 --offset-result 0 --split 0 --split-mode 2 -c 0 --cov-mode 0 --comp-bias-corr 1 --diag-score 1 --mask 1 --min-ungapped-score 15 --spaced-kmer-mode 1 --threads 40 -v 3 -s 5 

MMseqs Version:             7947b0035eef9ba41b64b0c752b0432465aaeb7c
Sub Matrix                  blosum62.out
Sensitivity                 5
K-mer size                  0
K-score                     2147483647
Alphabet size               21
Max. sequence length        32000
Query queryProfile          false
Target queryProfile         false
Max. results per query      300
Offset result               0
Split DB                    0
Split mode                  2
Coverage threshold          0
Coverage Mode               0
Compositional bias          1
Diagonal Scoring            1
Mask Residues               1
Minimum Diagonal score      15
Include identical Seq. Id.  false
Spaced Kmer                 1
No preload                  false
Early exit                  false
Threads                     40
Verbosity                   3

Initialising data structures...
Using 40 threads.
Could not find precomputed index. Compute index.
Use kmer size 6 and split 1 using Target split mode.
Needed memory (1374076390 byte) of total memory (270920568832 byte)
Target database: db(Size: 1)
Substitution matrices...
Time for init: 0 h 0 m 0s

Query database: /share/input.fasta(size=0)
Process prefiltering step 1 of 1

Index table: counting k-mers...

Index table: Masked residues: 188
Index table: fill...
Index table: removing duplicate entries...
Index table init done.

DB statistic
Entries:         16342
DB Size:         686227020 (byte)
Avg Kmer Size:   0.000190541
Top 10 Kmers
    RHCCAA      1
    QCICAA      1
    WSQFAA      1
    WQPHAA      1
    HPKLAA      1
    GHLLAA      1
    WRPNAA      1
    PHCQAA      1
    HRCQAA      1
    FHNQAA      1
Min Kmer Size:   0
Empty list: 85749779

Time for index table init: 0 h 0 m 1s

k-mer similarity threshold: 95
k-mer match probability: 0

tmp/blastp.sh: line 77: 32467 Segmentation fault      $RUNNER $MMSEQS prefilter "$INPUT" "$TARGET_DB_PREF" "$TMP_PATH/pref_$SENS" $PREFILTER_PAR -s $SENS
Error: Prefilter died
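The last two lines of the log show the workflow only reporting that the prefilter died. A minimal sketch of how a wrapper could turn such an exit status into a more helpful message is shown below. It assumes the step is launched through Python's `subprocess` module, where a negative return code `-N` means the child was terminated by signal `N` on POSIX systems; `describe_exit` and its wording are hypothetical and not part of MMseqs2.

```python
import signal


def describe_exit(returncode, step="prefilter"):
    """Translate a subprocess return code into a human-readable message.

    Hypothetical helper; follows the POSIX convention used by subprocess,
    where a negative value -N means the child was killed by signal N.
    """
    if returncode == 0:
        return f"{step} finished successfully."
    if returncode < 0:
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {-returncode}"
        return (
            f"{step} was killed by {name}. "
            "Check that the query and target are valid, non-empty databases."
        )
    return f"{step} exited with status {returncode}."
```

For example, `describe_exit(subprocess.run(cmd).returncode)` would cover a crash like the one logged above, with `cmd` standing in for the actual prefilter command line.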

Issue

This was my own fault for not reading the manual, but in general a segfault is not a good way to say goodbye :)

The source of the problem may have been:

Query database: /share/input.fasta(size=0)
                                        ^
martin-steinegger commented 7 years ago

Thanks a lot for reporting this bug. As of commit https://github.com/soedinglab/MMseqs2/commit/138ebea099c33a2fbc3c282235b1c839bae1fb43, MMseqs2 should print an error message in this case instead of segfaulting.

unode commented 7 years ago

Thanks