steineggerlab / foldseek

Foldseek enables fast and sensitive comparisons of large structure sets.
https://foldseek.com
GNU General Public License v3.0

Filter criteria for output results #167

Open dabaoniu opened 11 months ago

dabaoniu commented 11 months ago

Greetings team, we used Foldseek to structurally compare a batch of data, with each structure compared against many others, but we ran into difficulties when screening for homologous structures. We are not sure whether we should filter on the "prob" value. If so, what threshold should we choose? If TM-score is used as the criterion for homology instead, which of "alntmscore", "qtmscore" and "ttmscore" should we trust most, and what threshold should be applied? Also, I wanted to express my gratitude for offering such an excellent tool. Thank you.

milot-mirdita commented 11 months ago

I would recommend using our E-values. Everything below an E-value threshold of 0.01 is likely homologous. You can also use a more stringent E-value cut-off of 0.001. The prob value we compute was calibrated on SCOP to give a better estimate for high E-value hits (usually between 0.01 and 10).
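For readers who want to apply this cut-off to an existing result file, here is a minimal Python sketch. It assumes the results were written in Foldseek's default 12-column, tab-separated format (query, target, fident, alnlen, mismatch, gapopen, qstart, qend, tstart, tend, evalue, bits). If a custom `--format-output` was used, the column list must be adjusted to match. The function name `filter_hits` and the file name in the usage note are hypothetical.

```python
import csv

# Assumed column order of Foldseek's default tab-separated output.
# Adjust this list if the search was run with a custom --format-output.
DEFAULT_COLUMNS = [
    "query", "target", "fident", "alnlen", "mismatch", "gapopen",
    "qstart", "qend", "tstart", "tend", "evalue", "bits",
]


def filter_hits(lines, max_evalue=0.01, columns=DEFAULT_COLUMNS):
    """Return hits whose E-value is below max_evalue.

    `lines` is any iterable of tab-separated result lines,
    for example an open file handle.
    """
    kept = []
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank or truncated lines rather than failing on them.
        if len(row) < len(columns):
            continue
        hit = dict(zip(columns, row))
        if float(hit["evalue"]) < max_evalue:
            kept.append(hit)
    return kept
```

It could be used as `with open("aln.m8") as fh: hits = filter_hits(fh)`, passing `max_evalue=0.001` for the more stringent cut-off.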

I would recommend using the various TM-score normalizations to assess individual hits, not to filter long lists of potential homologs.