steineggerlab / foldseek

Foldseek enables fast and sensitive comparisons of large structure sets.
https://foldseek.com
GNU General Public License v3.0

Why are the E-values of TM-align with Foldseek all so large? #323

Open TJiangBio opened 2 months ago

TJiangBio commented 2 months ago

Expected Behavior

The E-value thresholds used in publications are usually between 10^-4 and 10^-16, including in the Foldseek study on clustering the protein universe (https://www.nature.com/articles/s41586-023-06510-w).

Problem

However, why are the E-values that foldseek easy-search reports in TM-align mode so large? All of mine fall in the range (0, 1), which is really confusing.

Looking forward to your reply, thanks.

milot-mirdita commented 2 months ago

The output in TM-align mode is just plain confusing, sorry about that.

In TM-align mode, we use the E-value column to report the TM-score of the alignment. We don't report E-values in that mode.
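
To make the point above concrete, here is a minimal sketch of reading TM-align-mode results so that the misleadingly named column is relabeled. It assumes the search was run with Foldseek's default tabular columns (query, target, fident, alnlen, mismatch, gapopen, qstart, qend, tstart, tend, evalue, bits) and no custom output format. The helper name `read_tm_mode_hits`, the sample rows, and the 0.5 cutoff (a commonly used TM-score threshold for "same fold") are illustrative assumptions, not part of Foldseek.

```python
import csv
import io

# Assumed: Foldseek's default tabular column order, with no custom output format.
DEFAULT_COLUMNS = [
    "query", "target", "fident", "alnlen", "mismatch", "gapopen",
    "qstart", "qend", "tstart", "tend", "evalue", "bits",
]


def read_tm_mode_hits(handle):
    """Parse tab-separated hits and relabel the 'evalue' field as 'tmscore'.

    Hypothetical helper: in TM-align mode the E-value column holds the
    TM-score, so it is stored under a name that says what it contains.
    """
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        record = dict(zip(DEFAULT_COLUMNS, row))
        record["tmscore"] = float(record.pop("evalue"))
        hits.append(record)
    return hits


# Made-up rows for illustration only; real input would be the result file.
SAMPLE = (
    "q1\tt1\t0.450\t120\t60\t2\t1\t120\t3\t125\t0.8312\t83\n"
    "q1\tt2\t0.210\t98\t70\t4\t5\t102\t1\t99\t0.4127\t41\n"
)

hits = read_tm_mode_hits(io.StringIO(SAMPLE))

# Filter on TM-score (higher is better), not on an E-value-style cutoff.
confident = [h for h in hits if h["tmscore"] >= 0.5]
for h in confident:
    print(h["query"], h["target"], h["tmscore"])
```

Because a TM-score grows with similarity while an E-value shrinks, a filter such as "evalue < 1e-4" applied to this column would discard essentially every hit; the comparison has to be a lower bound instead.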

TJiangBio commented 2 months ago

Thank you for your prompt response!

For TMscore, should I use the "evalue" or the "alntmscore" column?

Additionally, how can I obtain the true E-values in the TM-align mode? Is it possible to get them using Foldseek, or do I need to use a different tool? Could you please provide more suggestions?

Looking forward to your reply.