soedinglab / MMseqs2

MMseqs2: ultra fast and sensitive search and clustering suite
https://mmseqs.com
MIT License

Choice of linclust parameters (documentation) #245

Closed by Greblica 5 years ago

Greblica commented 5 years ago

I have a question regarding the choice of linclust parameters in the Steinegger et al. 2019 paper.

These are the parameters:

--kmer-per-seq 80 --cluster-mode 2 --cov-mode 1 -c 0.9 --min-seq-id from 50% to 90%

My question concerns the coverage settings (--cov-mode 1 -c 0.9):

If I understand correctly, it means that the query must cover at least 90% of the target for it to be listed in the cluster. Is that right?

If that is the case, could you explain the rationale behind this? To me it seems more intuitive that the query should fit into the target. Even if it is much shorter but still similar, I would expect it to be in the same cluster (or shouldn't it be?).

Thanks a lot for your clarification,

G

Greblica commented 5 years ago

OK, I realized this does exactly what I think it should :). I was just confused by the use of the words "target" and "query": if the target is a cluster member and the query is the cluster representative, it makes perfect sense.

martin-steinegger commented 5 years ago

Yes, this is exactly the definition. I should state this more clearly in the documentation.
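
For readers who land on this thread, here is a minimal Python sketch of how the coverage criterion discussed above can be read. It is an illustration only, not MMseqs2's actual implementation: the function names `coverage` and `passes_coverage` are hypothetical, and the mode meanings (0 = query and target, 1 = target only, 2 = query only) are taken from the MMseqs2 documentation as I understand it. As clarified above, the query is the cluster representative and the target is the candidate cluster member.

```python
def coverage(aln_start: int, aln_end: int, seq_len: int) -> float:
    """Fraction of a sequence covered by an alignment.

    Assumes 0-based, inclusive alignment coordinates (an assumption
    made for this sketch).
    """
    return (aln_end - aln_start + 1) / seq_len


def passes_coverage(q_cov: float, t_cov: float, cov_mode: int, c: float) -> bool:
    """Hypothetical helper mirroring the documented --cov-mode semantics."""
    if cov_mode == 0:   # both query and target must be covered
        return q_cov >= c and t_cov >= c
    if cov_mode == 1:   # only the target (cluster member) must be covered
        return t_cov >= c
    if cov_mode == 2:   # only the query (representative) must be covered
        return q_cov >= c
    raise ValueError(f"unsupported cov_mode in this sketch: {cov_mode}")


# A long representative (query, 300 residues) and a short member
# (target, 100 residues) aligned over 95 residues of each.
q_cov = coverage(0, 94, 300)
t_cov = coverage(0, 94, 100)

# With --cov-mode 1 -c 0.9 the short member joins the cluster,
# because 95% of the member is covered by the alignment.
short_member_ok = passes_coverage(q_cov, t_cov, cov_mode=1, c=0.9)

# Under bidirectional coverage (mode 0) the same pair would be rejected,
# because only about a third of the representative is covered.
bidirectional_ok = passes_coverage(q_cov, t_cov, cov_mode=0, c=0.9)
```

Under this reading, a short but similar sequence ends up in the cluster of a longer representative, which is the behaviour the original question expected.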