interpretation of 'fident' output

steineggerlab / foldseek

Foldseek enables fast and sensitive comparisons of large structure sets.

https://foldseek.com

GNU General Public License v3.0


interpretation of 'fident' output #300

Open LLehner opened 3 days ago

LLehner commented 3 days ago

Hello, thank you for this great tool.

We are currently trying to reproduce the HFSP curve using foldseek instead of mmseqs2. On identical data, we noticed a downward shift in % sequence identity with foldseek compared to mmseqs2.

My question is: Is 'fident' reported by foldseek based on the AA residues in protein sequences (like mmseqs2) or is it based on the new structural alphabet of 3Di states?

Edit: we noticed a downwards shift in foldseek, not upwards, in terms of % sequence identity

martin-steinegger commented 3 days ago

It is the amino acid sequence identity, computed from the structural alignment.
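To illustrate the distinction, here is a minimal sketch of how a fractional identity can be computed from a pair of aligned amino acid strings, such as the `qaln` and `taln` output columns. The function name `fraction_identity` is hypothetical, and dividing by the number of alignment columns (gaps included) is an assumption made for illustration; it is not confirmed here as the exact denominator foldseek uses.

```python
def fraction_identity(qaln: str, taln: str) -> float:
    """Fraction of alignment columns holding the same amino acid.

    qaln and taln are the aligned query and target strings, with '-'
    marking gaps. Assumption: the denominator is the total number of
    alignment columns, gaps included.
    """
    if len(qaln) != len(taln):
        raise ValueError("aligned strings must have equal length")
    if not qaln:
        return 0.0
    # Count columns where both residues are identical and not a gap.
    matches = sum(1 for q, t in zip(qaln, taln) if q == t and q != "-")
    return matches / len(qaln)


# Three of four columns match (A, C, D); the last column differs.
print(fraction_identity("ACDE", "ACDF"))  # 0.75
```

The residues compared are amino acids, but which residues end up in the same column is decided by the structural alignment, so the value can differ from the identity of a purely sequence-based alignment of the same pair.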

LLehner commented 3 days ago

Thank you for your quick response and clarification!

Just one more question:

With foldseek we get many protein pairs below 25% sequence identity, while with mmseqs2 there are barely any. Since we use the exact same dataset (and identical parameters where possible), could this mean foldseek is better at detecting pairs of evolutionarily distant (diverged) proteins, where only some structurally relevant domains are conserved? The proteins in question have ungapped alignment lengths of ~50-500.
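One way to put a number on this observation is to count the pairs below the 25% threshold in each tool's result file. The sketch below assumes a tab-separated output whose third column is `fident` expressed as a fraction between 0 and 1; that column layout and the helper name `count_below` are assumptions for illustration, so adjust `fident_col` to match the output format actually requested.

```python
import csv
import io


def count_below(tsv_text: str, threshold: float = 0.25, fident_col: int = 2):
    """Return (pairs below threshold, total pairs) for a tab-separated table.

    Assumption: column `fident_col` (0-based) holds fident as a fraction.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    total = 0
    below = 0
    for row in reader:
        if not row:  # skip blank lines
            continue
        total += 1
        if float(row[fident_col]) < threshold:
            below += 1
    return below, total


# Made-up rows for illustration only: query, target, fident, alnlen.
example = "q1\tt1\t0.20\t120\nq1\tt2\t0.45\t300\nq2\tt3\t0.10\t80\n"
print(count_below(example))  # (2, 3)
```

Running the same count on the foldseek and mmseqs2 results for the identical dataset gives two fractions that can be compared directly.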