steineggerlab / foldseek

Foldseek enables fast and sensitive comparisons of large structure sets.
https://foldseek.com
GNU General Public License v3.0

Why are easy-cluster alignment results slightly different from DaliLite.v5 z scores? [not bug] #365

Open Huilin-Li opened 3 days ago

Huilin-Li commented 3 days ago

Expected Behavior

After running foldseek easy-cluster, I also calculated DaliLite.v5 z scores for the members of each cluster. My understanding is that the proteins within a cluster are already highly similar to each other, so their z scores against a query protein should also be very close.

However, when I plot the z scores for each group, I find outliers.

Current Behavior

For example, take three clusters A, B, and C, generated by foldseek easy-cluster with default settings. Group C contains many outliers. These outliers suggest that, although the proteins are in the same cluster and are already highly similar to each other at the structural level, they still behave differently when aligned against the same query protein.

image
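
To make the comparison above concrete, here is a minimal sketch of how the per-cluster outliers could be flagged. It assumes the clustering is available as a two-column, tab-separated file (representative, member), which is how I understand the `_cluster.tsv` output of easy-cluster to be laid out, and that the DaliLite z scores have already been parsed into a plain mapping from member ID to z score. The function names, the IDs, and the 1.5 × IQR rule are illustrative choices, not part of Foldseek or DaliLite.

```python
import csv
import io
from collections import defaultdict
from statistics import quantiles


def load_clusters(tsv_text):
    """Map each member ID to its cluster representative.

    Assumes two tab-separated columns per line: representative, member.
    """
    member_to_rep = {}
    for rep, member in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        member_to_rep[member] = rep
    return member_to_rep


def flag_outliers(member_to_rep, zscores, k=1.5):
    """Return {representative: [member IDs whose z score is an outlier]}.

    zscores is a hypothetical {member ID: z score against one query} mapping.
    A value is flagged if it lies more than k * IQR outside the quartiles.
    Clusters with fewer than 4 scored members are skipped.
    """
    groups = defaultdict(list)
    for member, z in zscores.items():
        rep = member_to_rep.get(member)
        if rep is not None:
            groups[rep].append((member, z))

    outliers = {}
    for rep, items in groups.items():
        values = [z for _, z in items]
        if len(values) < 4:
            continue
        q1, _, q3 = quantiles(values, n=4)
        low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
        outliers[rep] = [m for m, z in items if z < low or z > high]
    return outliers


# Made-up example data: one cluster of six members, one cluster of two.
clusters = load_clusters(
    "repC\tm1\nrepC\tm2\nrepC\tm3\nrepC\tm4\nrepC\tm5\nrepC\tm6\n"
    "repA\ta1\nrepA\ta2\n"
)
scores = {"m1": 10.0, "m2": 10.5, "m3": 11.0, "m4": 10.2, "m5": 10.8,
          "m6": 2.0, "a1": 20.0, "a2": 5.0}
print(flag_outliers(clusters, scores))
```

Note that a member is assigned to a cluster based on its alignment to the representative, so the grouping by representative in this sketch mirrors how the clusters are defined rather than any pairwise similarity among members.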

martin-steinegger commented 3 days ago

Please see the answer here: https://github.com/steineggerlab/foldseek/issues/364