steineggerlab / foldseek

Foldseek enables fast and sensitive comparisons of large structure sets.
https://foldseek.com
GNU General Public License v3.0

negative bits score #166

Open twaksman001 opened 1 year ago

twaksman001 commented 1 year ago

I have observed negative bit scores for some pairs of protein structures in an all-against-all comparison of a collection of structures. I don't understand how this is possible because, as far as I know, Smith-Waterman scores, TM-scores, and LDDT values are never negative. Could you explain how this can happen?

martin-steinegger commented 1 year ago

We correct scores by subtracting a reverse score (the alignment score computed with the inverted, i.e. reversed, query) as a compositional bias correction. This can result in negative scores.
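To illustrate why subtracting a reverse score can push the result below zero, here is a minimal sketch. It is not Foldseek's implementation: the toy Smith-Waterman with match/mismatch/linear-gap scoring, the function names `smith_waterman` and `corrected_score`, and the letter strings are all assumptions made for the example. Foldseek's actual substitution matrices and scoring are described in the paper.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (toy scoring, linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            # Local alignment: a cell never drops below zero.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def corrected_score(query, target):
    """Forward score minus the score of the reversed query (hypothetical helper)."""
    forward = smith_waterman(query, target)
    reverse = smith_waterman(query[::-1], target)
    return forward - reverse


# Each raw Smith-Waterman score is >= 0, but their difference need not be.
print(corrected_score("ABCDE", "ABCDE"))  # forward 10, reverse 2
print(corrected_score("ABCDE", "EDCBA"))  # forward 2, reverse 10
```

In the second call the reversed query matches the target better than the query itself, so the corrected score is negative even though both underlying alignment scores are non-negative.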

twaksman001 commented 1 year ago

Thanks. I am not sure whether GitHub issues is the right place for this question, but I would like to know whether the E-value means something different in an all-against-all comparison within a collection of structures than it does when searching a database. In that case, is the database size the size of the collection?

martin-steinegger commented 1 year ago

We always use the size of the database or collection when computing the E-value.
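As a rough illustration of how an E-value scales with the size of the searched set, the sketch below uses the textbook Karlin-Altschul relation for normalized bit scores, E = m * n * 2^(-S). This is an assumption chosen for the example and not necessarily the formula Foldseek uses; the function name `evalue_from_bits` and the numbers are hypothetical.

```python
def evalue_from_bits(bits, query_len, db_residues):
    """Textbook relation E = m * n * 2**(-S) for a normalized bit score S.

    query_len (m) and db_residues (n) are the search-space dimensions.
    Illustrative only; Foldseek's own E-value computation may differ.
    """
    return query_len * db_residues * 2.0 ** (-bits)


# Same bit score, database twice as large: the E-value doubles.
small = evalue_from_bits(bits=10, query_len=100, db_residues=1024)
large = evalue_from_bits(bits=10, query_len=100, db_residues=2048)
print(small, large)
```

Under this relation, the same pairwise hit is less significant in a larger collection, which is why the collection size enters the calculation in an all-against-all run just as the database size does in a search.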

twaksman001 commented 1 year ago

In this all-against-all comparison, I see that for the majority of protein pairs the E-value differs depending on the order of the two proteins in the comparison. Why does that happen?

milot-mirdita commented 1 year ago

Foldseek's bit-score is not symmetric. There is a reverse bit-score correction (see "Pairwise local structural alignments" in the paper's methods section). The E-value is also affected by that.
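To see how often the two directions disagree in an all-against-all result, one can compare the bit score of each (query, target) pair with that of the swapped pair. The sketch below assumes a tab-separated result with the query in column 1, the target in column 2, and the bit score in column 12 (a BLAST-tab-like layout); adjust `bits_col` if your output format differs. The function name `asymmetric_pairs` and the example rows are invented for illustration.

```python
import csv
import io


def asymmetric_pairs(tsv_text, bits_col=11, tol=0.0):
    """Return (a, b, bits_ab, bits_ba) for pairs whose two directions differ."""
    scores = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue
        key = (row[0], row[1])
        bits = float(row[bits_col])
        # Keep the best-scoring hit per ordered pair.
        scores[key] = max(bits, scores.get(key, float("-inf")))
    out = []
    for (q, t), s in scores.items():
        # Visit each unordered pair once and require both directions.
        if q < t and (t, q) in scores and abs(s - scores[(t, q)]) > tol:
            out.append((q, t, s, scores[(t, q)]))
    return sorted(out)


# Invented example rows: query, target, nine other columns, bit score.
rows = [
    ("protA", "protB", "0.50", "100", "50", "0", "1", "100", "1", "100", "1e-10", "150"),
    ("protB", "protA", "0.50", "100", "50", "0", "1", "100", "1", "100", "3e-9", "142"),
    ("protA", "protC", "0.80", "120", "24", "0", "1", "120", "1", "120", "1e-20", "200"),
    ("protC", "protA", "0.80", "120", "24", "0", "1", "120", "1", "120", "1e-20", "200"),
]
example_tsv = "\n".join("\t".join(r) for r in rows)
print(asymmetric_pairs(example_tsv))
```

If a symmetric measure is needed downstream, a common workaround is to combine the two directions yourself (for example, by taking the minimum or the mean of the two bit scores); that choice is up to the analysis and is not something Foldseek prescribes.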