soedinglab / MMseqs2

MMseqs2: ultra fast and sensitive search and clustering suite
https://mmseqs.com
GNU General Public License v3.0

Unintuitive behaviour of spaced seeds in MMseqs2 search #266

Closed tischulz1 closed 4 years ago

tischulz1 commented 4 years ago

Dear MMseqs2 team,

I got some weird results that I could not explain myself. I hope you can help me with them.

Expected Behavior

I was expecting MMseqs2 to be more sensitive when using the default options (spaced-kmer-mode enabled and kmer-matching disabled).

Current Behavior

Using MMseqs2 search with default options (spaced-kmer-mode enabled and kmer-matching disabled), the program found fewer results than with spaced-kmer-mode disabled and kmer-matching enabled.

Context

I thought that MMseqs2 uses spaced seeds and no exact k-mer matching to increase sensitivity during search. I was curious to see how many alignments MMseqs2 finds exclusively because of this. Therefore, I performed two searches with MMseqs2 search: one using spaced seeds without exact k-mer matching, and one with the opposite settings. Surprisingly, it looks like using no spaced seeds together with exact k-mer matching increases the program's sensitivity, as more results are found.

Do you have an explanation for these results?
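
For readers unfamiliar with the two seeding modes compared above, here is a minimal sketch of the difference between contiguous and spaced seeds. The mask `1101011` and the toy sequences are made up for illustration; this is not MMseqs2's actual seed pattern or matching code, and it ignores the similar-k-mer matching that MMseqs2 performs when exact matching is disabled.

```python
def seeds(seq, mask):
    """Return the set of seeds of `seq` under `mask`.

    `mask` is a string of '1' (position is part of the seed) and
    '0' (position is ignored). A mask of all '1's gives ordinary k-mers.
    """
    span = len(mask)
    keep = [i for i, m in enumerate(mask) if m == "1"]
    return {
        "".join(seq[start + i] for i in keep)
        for start in range(len(seq) - span + 1)
    }


def shared_seeds(a, b, mask):
    """Seeds that occur in both sequences under the same mask."""
    return seeds(a, mask) & seeds(b, mask)


CONTIGUOUS = "11111"   # plain 5-mers
SPACED = "1101011"     # hypothetical mask: 5 informative positions over a span of 7

query = "MKVLAAGIT"
target = "MKVLCAGIT"   # single substitution at index 4 (A -> C)

print(shared_seeds(query, target, CONTIGUOUS))  # no shared 5-mers
print(shared_seeds(query, target, SPACED))      # two shared spaced seeds
```

The example is constructed so that the substitution falls on an ignored position for two of the spaced windows; a substitution under a `1` position breaks a spaced seed just as it breaks a contiguous one.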

milot-mirdita commented 4 years ago

The raw number of results is usually not a very good indicator of sensitivity. Both of these parameters affect the false positive rate.

We took a lot of care to control for false positives. Controlling for FPs is especially important to us since we also do iterative profile searches and building profiles with false positives included heavily degrades sensitivity.

If you want to find all exact matches, you could try the map workflow, which disables all FP-controlling parameters.
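
To make the distinction between raw hit counts and sensitivity concrete, the following sketch scores two result sets against a reference set of known true pairs. All names and data here are hypothetical and are not taken from the searches in this issue; it assumes you have some benchmark that tells you which query/target pairs are real.

```python
def evaluate(hits, true_pairs):
    """Summarise a set of (query, target) hits against known true pairs."""
    tp = len(hits & true_pairs)   # reported and correct
    fp = len(hits - true_pairs)   # reported but wrong
    sensitivity = tp / len(true_pairs) if true_pairs else 0.0
    precision = tp / len(hits) if hits else 0.0
    return {
        "hits": len(hits),
        "tp": tp,
        "fp": fp,
        "sensitivity": sensitivity,
        "precision": precision,
    }


# Hypothetical ground truth and two hypothetical search runs.
true_pairs = {("q1", "t1"), ("q1", "t2"), ("q2", "t3"), ("q3", "t4")}
run_a = {("q1", "t1"), ("q1", "t2"), ("q2", "t3")}
run_b = {("q1", "t1"), ("q2", "t3"), ("q2", "t9"), ("q3", "t8"), ("q3", "t7")}

print(evaluate(run_a, true_pairs))
print(evaluate(run_b, true_pairs))
```

In this toy data, run B reports more hits than run A (5 versus 3) but recovers fewer of the true pairs, so its sensitivity is lower even though its raw result count is higher.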

tischulz1 commented 4 years ago

Thanks for the clarification. I will have a look at the map workflow.

milot-mirdita commented 4 years ago

No problem, please reopen the issue if any questions remain.

If you want a set of stickers (see https://twitter.com/thesteinegger/status/1201076220957315074), send me your address to milot at mirdita de.