soedinglab / MMseqs2

MMseqs2: ultra fast and sensitive search and clustering suite
https://mmseqs.com
GNU General Public License v3.0

blastp.sh: line 86: Segmentation fault -> "PreFilter died" #55

Closed Kouroshb26 closed 6 years ago

Kouroshb26 commented 6 years ago

Expected Behavior

The search should run through the prefiltering step. As expected, the example data set works correctly with the program.

Current Behavior

There seems to be a problem with running the prefiltering step. The issue seems very similar to the one in #52.

Steps to Reproduce (for bugs)

Please make sure to execute the reproduction steps with newly recreated and empty tmp folders.

mmseqs createdb database.fasta DB
mmseqs createdb query.fasta QUERY
mmseqs search QUERY DB RESULT tmp
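The steps above, including the freshly recreated tmp folder, can be sketched as a small script. This is only a sketch: it assumes the two FASTA files sit in the current directory under the names used above, and the helper `run_or_print` is a made-up name for illustration. It only prints the command when `mmseqs` is not on the PATH or the input is missing.

```shell
#!/bin/sh
# Sketch: rerun the reproduction steps from a clean tmp folder.
# Assumes database.fasta and query.fasta are in the current directory.

# Hypothetical helper: run the command if mmseqs is installed and the
# first input ($3, e.g. database.fasta or QUERY) exists; otherwise only
# print what would have been run.
run_or_print() {
    if command -v mmseqs >/dev/null 2>&1 && [ -e "$3" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Start with a newly created, empty tmp folder, as the template asks.
rm -rf tmp
mkdir tmp

run_or_print mmseqs createdb database.fasta DB
run_or_print mmseqs createdb query.fasta QUERY
run_or_print mmseqs search QUERY DB RESULT tmp
```

Note that `rm -rf tmp` deletes any previous intermediate results in that folder, which is the point of the template's request.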

files: Note that the extension of these files has been changed so they can be uploaded to GitHub

database.txt query.txt

MMseqs Output (for bugs)

Please make sure to also post the complete output of MMseqs. You can use gist.github.com for large output.

-bash-4.2$ mmseqs createdb database.fasta DB
Program call: database.fasta DB

MMseqs Version: 2c5dcabb805a4bd6d2db64095782a211aa1153fe
Max. sequence length 32000
Split Seq. by len true
Offset of numeric ids 0
Verbosity 3

Time for merging files: 0 h 0 m 0 s
Time for merging files: 0 h 0 m 0 s
-bash-4.2$ mmseqs createdb query.fasta QUERY
Program call: query.fasta QUERY

MMseqs Version: 2c5dcabb805a4bd6d2db64095782a211aa1153fe
Max. sequence length 32000
Split Seq. by len true
Offset of numeric ids 0
Verbosity 3

....Time for merging files: 0 h 0 m 0 s
Time for merging files: 0 h 0 m 0 s
-bash-4.2$ mmseqs search QUERY DB RESULT tmp
Program call: QUERY DB RESULT tmp

MMseqs Version: 2c5dcabb805a4bd6d2db64095782a211aa1153fe
Sub Matrix blosum62.out
Add backtrace false
Alignment mode 2
E-value threshold 0.001
Seq. Id Threshold 0
Coverage threshold 0
Coverage Mode 0
Max. sequence length 32000
Max. results per query 300
Compositional bias 1
Realign hit false
Max Reject 2147483647
Max Accept 2147483647
Include identical Seq. Id. false
No preload false
Early exit false
Pseudo count a 1
Pseudo count b 1.5
Threads 32
Verbosity 3
Sensitivity 5.7
K-mer size 0
K-score 2147483647
Alphabet size 21
Offset result 0
Split DB 0
Split mode 2
Diagonal Scoring 1
Mask Residues 1
Minimum Diagonal score 15
Spaced Kmer 1
Profile e-value threshold 0.001
Use global sequence weighting false
Filter MSA 1
Maximum sequence identity threshold 0.9
Minimum seq. id. 0
Minimum score per column -20
Minimum coverage 0
Select n most diverse seqs 1000
Omit Consensus false
Min codons in orf 30
Max codons in length 98202
Max orf gaps 2147483647
Skip incomplete orfs false
Find longest orf true
Extend short orfs false
Forward Frames 1,2,3
Reverse Frames 1,2,3
Offset of numeric ids 0
Translation Table 1
Number search iterations 1
Start sensitivity 4
Search steps 1
Sets the MPI runner
Remove Temporary Files false

Tmp tmp folder does not exist or is not a directory.
Created dir tmp
Program call: /home/banaeiak/thesis/Laing/MMseqs2ResultsCelegans/QUERY /home/banaeiak/thesis/Laing/MMseqs2ResultsCelegans/DB /home/banaeiak/thesis/Laing/MMseqs2ResultsCelegans/tmp/5905259317257326532/pref_5.7 --sub-mat blosum62.out -k 0 --k-score 2147483647 --alph-size 21 --max-seq-len 32000 --max-seqs 300 --offset-result 0 --split 0 --split-mode 2 -c 0 --cov-mode 0 --comp-bias-corr 1 --diag-score 1 --mask 1 --min-ungapped-score 15 --spaced-kmer-mode 1 --pca 1 --pcb 1.5 --threads 32 -v 3 -s 5.7

MMseqs Version: 2c5dcabb805a4bd6d2db64095782a211aa1153fe
Sub Matrix blosum62.out
Sensitivity 5.7
K-mer size 0
K-score 2147483647
Alphabet size 21
Max. sequence length 32000
Max. results per query 300
Offset result 0
Split DB 0
Split mode 2
Coverage threshold 0
Coverage Mode 0
Compositional bias 1
Diagonal Scoring 1
Mask Residues 1
Minimum Diagonal score 15
Include identical Seq. Id. false
Spaced Kmer 1
No preload false
Early exit false
Pseudo count a 1
Pseudo count b 1.5
Threads 32
Verbosity 3

Initialising data structures...
Using 32 threads.
Could not find precomputed index. Compute index.
Use kmer size 6 and split 1 using Target split mode.
Needed memory (5736197 byte) of total memory (67263340544 byte)
Target database: /home/banaeiak/thesis/Laing/MMseqs2ResultsCelegans/DB(Size: 982)
Query database type: Nucleotide
Target database type: Aminoacid
Time for init: 0 h 0 m 0s

Query database: /home/banaeiak/thesis/Laing/MMseqs2ResultsCelegans/QUERY(size=49535)
Process prefiltering step 1 of 1

Index table: counting k-mers...

Index table: Masked residues: 0
Index table: fill...
Index table: removing duplicate entries...
Index table init done.

DB statistic
Entries: 195
DB Size: 33938 (byte)
Avg Kmer Size: 0.0476074
Top 10 Kmers
GGGGGG 13
AAAAGA 4
TTATTT 4
GGGAGA 3
GGAGGA 3
GAGGGG 3
GAGAAT 3
ATAGGA 2
GGGGGA 2
AAAAAG 2
Min Kmer Size: 0
Empty list: 3950

Time for index table init: 0 h 0 m 0s

k-mer similarity threshold: 88
k-mer match probability: 0

Starting prefiltering scores calculation (step 1 of 1)
Query db start 1 to 49535
Target db start 1 to 982
tmp/5905259317257326532/blastp.sh: line 86: 186247 Segmentation fault (core dumped) $RUNNER $MMSEQS prefilter "$INPUT" "$TARGET" "$TMP_PATH/pref_$SENS" $PREFILTER_PAR -s $SENS
Error: Prefilter died

Context

Providing context helps us come up with a solution and improve our documentation for the future.

I suspect there is a problem with my FASTA files, since the example files provided with the code work fine.
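A quick way to test that suspicion is to run a few basic checks on the FASTA files before calling `createdb`. The following is a minimal sketch, not something MMseqs2 provides: the function name `fasta_stats` is invented here, and the checks (record count, records without sequence, text before the first header) are only the obvious ones.

```shell
#!/bin/sh
# Sketch: basic sanity checks on a FASTA file.
# Prints three numbers: <records> <records_without_sequence> <lines_before_first_header>
fasta_stats() {
    awk '
        /^>/ {                               # a header line starts a new record
            if (in_record && len == 0) empty++
            records++; in_record = 1; len = 0
            next
        }
        !in_record && NF { stray++; next }   # non-blank text before the first header
        {
            gsub(/[ \t\r]/, "")              # ignore whitespace and CR line endings
            len += length($0)
        }
        END {
            if (in_record && len == 0) empty++
            print records + 0, empty + 0, stray + 0
        }
    ' "$1"
}

for f in database.fasta query.fasta; do
    if [ -r "$f" ]; then
        echo "$f: $(fasta_stats "$f")"
    else
        echo "$f: not found, skipping"
    fi
done
```

Non-zero values in the second or third column would point at a malformed file rather than at the search itself.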

Lastly, I was wondering whether there are any parameters that can be used to limit the number of cores used.

Your Environment

Include as many relevant details about the environment you experienced the bug in.

MMseqs2 Version: 2c5dcabb805a4bd6d2db64095782a211aa1153fe
The MMseqs version that was used was the self-compiled one
Cmake version is 3.10.0
Support both AVX2/SSE
Running Linux

milot-mirdita commented 6 years ago

Thank you for your bug report. I reproduced the issue and also know what's going wrong. We will discuss it later today and fix it.

Kouroshb26 commented 6 years ago

Hi,

I have not heard from you guys for two days now. Is there any sort of update?

Thanks

martin-steinegger commented 6 years ago

@milot-mirdita can you push your changes?

milot-mirdita commented 6 years ago

Should be fixed in ce3e98e. Please try again. Sorry, the weekend got in the way of me finishing the fix.

genomewalker commented 6 years ago

Hi, I think the following error is related to this issue as well. When searching against Uniref90 on 31e1fddc9b9368570bb39be3051232d4e64f7ae9 I am getting the following error:

*** Error in `*** Error in `mmseqs*** 
Error in `mmseqs': free(): invalid pointer: 0x0000000000ccb6d8 ***
*** Error in `tmp8/4822298374491924264/blastp.sh: line 86:  6866 Aborted                 
$RUNNER $MMSEQS prefilter "$INPUT" "$TARGET" "$TMP_PATH/pref_$SENS" $PREFILTER_PAR -s $SENS
Error: Prefilter died

Many thanks, Antonio

martin-steinegger commented 6 years ago

@genomewalker the bug reported by @Kouroshb26 was caused by an error in the translated (blastx) search. Did you also run a translated search? If not, could you please open a new ticket for this? Could you give us some more information please?
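Whether a run is a translated search depends on the input types; the log in the original report shows a nucleotide query against an amino acid target. To check your own input files, a rough heuristic can be sketched as follows. This is not the detection logic MMseqs2 uses: the function name `guess_fasta_type` and the 90% cutoff are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: guess whether a FASTA file holds nucleotide or protein sequences.
# NOT how MMseqs2 decides; the 0.9 cutoff is an arbitrary assumption.
guess_fasta_type() {
    awk '
        /^>/ { next }                        # skip header lines
        {
            seq = toupper($0)
            gsub(/[^A-Z]/, "", seq)          # keep letters only
            total += length(seq)
            nuc = seq
            gsub(/[^ACGTUN]/, "", nuc)       # keep typical nucleotide codes
            nucleotide += length(nuc)
        }
        END {
            if (total == 0)                     print "empty"
            else if (nucleotide / total >= 0.9) print "nucleotide"
            else                                print "protein"
        }
    ' "$1"
}

for f in query.fasta database.fasta; do
    if [ -r "$f" ]; then
        echo "$f: $(guess_fasta_type "$f")"
    else
        echo "$f: not found, skipping"
    fi
done
```

If the query comes out as nucleotide and the target as protein, the search is a translated one and the fix discussed here applies; otherwise a separate ticket is probably the right place.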

Kouroshb26 commented 6 years ago

Thank you for your support. It works like a charm now.

My last question is how I can set the number of cores MMseqs runs on if I don't have MPI installed. I believe your documentation describes MPI as the way to control how many cores MMseqs runs on.

Thanks, Kourosh

milot-mirdita commented 6 years ago

For parallelization on one compute node, use the --threads parameter. By default, it will use as many threads as the system has cores.
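As a sketch of how that could look in practice, the snippet below caps a requested thread count at the number of online cores and passes it to the search from the original report. The helper `pick_threads` and the value 8 are illustrative only, and the database names are the ones used earlier in this thread.

```shell
#!/bin/sh
# Sketch: limit MMseqs2 to a chosen number of threads on one machine.

# Hypothetical helper: return the requested thread count, capped at the
# number of online cores (falls back to 1 if getconf cannot report it).
pick_threads() {
    requested=$1
    available=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    if [ "$requested" -gt "$available" ]; then
        echo "$available"
    else
        echo "$requested"
    fi
}

THREADS=$(pick_threads 8)   # 8 is an example value

if command -v mmseqs >/dev/null 2>&1 && [ -e QUERY ] && [ -e DB ]; then
    mmseqs search QUERY DB RESULT tmp --threads "$THREADS"
else
    echo "would run: mmseqs search QUERY DB RESULT tmp --threads $THREADS"
fi
```

Passing the flag directly, for example `--threads 4`, is all that is strictly needed; the cap only avoids requesting more threads than the machine has cores.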

MPI is only used to coordinate multiple compute nodes. If you use MPI, make sure to pass the --npernode 1 parameter to mpirun, so that you do not start more than one MPI process per node.
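A multi-node run along those lines might be set up as sketched below. This assumes Open MPI's `mpirun`, an MMseqs2 binary built with MPI support, and that the launcher is handed over through the `RUNNER` environment variable, which is what the `$RUNNER $MMSEQS prefilter ...` line in the log above suggests. The thread count is a placeholder, and host or scheduler options for `mpirun` are left out.

```shell
#!/bin/sh
# Sketch: one MPI process per node, several threads inside each process.
# Assumes Open MPI's mpirun and an MPI-enabled MMseqs2 build.
THREADS_PER_NODE=16   # placeholder, set to the core count of one node

# The workflow script in the log calls "$RUNNER $MMSEQS prefilter ...",
# so the launcher is assumed to be passed in via RUNNER.
RUNNER="mpirun --npernode 1"
export RUNNER

if command -v mpirun >/dev/null 2>&1 && command -v mmseqs >/dev/null 2>&1 && [ -e QUERY ]; then
    mmseqs search QUERY DB RESULT tmp --threads "$THREADS_PER_NODE"
else
    echo "would run: RUNNER=\"$RUNNER\" mmseqs search QUERY DB RESULT tmp --threads $THREADS_PER_NODE"
fi
```

With this split, MPI only distributes work across nodes while `--threads` handles the cores within each node.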

Kouroshb26 commented 6 years ago

Thank you for the help.