Expected Behavior
After running foldseek easy-cluster, I calculated the DaliLite.v5 Z-score of every member of each cluster against a query protein. My understanding is that the proteins within a cluster are already highly similar to each other, so their Z-scores against the same query protein should also be very close.
However, when I plot the Z-scores for each cluster, I find outliers.
Current Behavior
For example, take three clusters, A, B, and C, generated by foldseek easy-cluster with default settings. Group C contains many outliers. These outliers suggest that, although the proteins belong to the same cluster and are already highly similar to each other at the structural level, they still score differently when aligned against the same query protein.
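To make "outlier" concrete, here is a minimal sketch of one way to flag them per cluster. It is not necessarily how the plot was produced. It assumes the cluster assignments are available as (representative, member) pairs, as in the two-column TSV written by easy-cluster, and that the DALI Z-scores against the query have already been collected into a dictionary. The function names, the Tukey fence rule with k=1.5, and the Z-score values below are all illustrative assumptions, not real results.

```python
from collections import defaultdict
from statistics import quantiles


def group_members(cluster_rows):
    """Group (representative, member) pairs into {representative: [members]}."""
    groups = defaultdict(list)
    for rep, member in cluster_rows:
        groups[rep].append(member)
    return dict(groups)


def flag_outliers(groups, zscores, k=1.5):
    """Return, per cluster, the members whose Z-score lies outside the
    Tukey fences [Q1 - k*IQR, Q3 + k*IQR] computed within that cluster."""
    flagged = {}
    for rep, members in groups.items():
        scored = [(m, zscores[m]) for m in members if m in zscores]
        if len(scored) < 4:
            # Too few points for meaningful quartiles; flag nothing.
            flagged[rep] = []
            continue
        q1, _, q3 = quantiles([z for _, z in scored], n=4)
        iqr = q3 - q1
        low, high = q1 - k * iqr, q3 + k * iqr
        flagged[rep] = [m for m, z in scored if z < low or z > high]
    return flagged


# Hypothetical cluster assignments (representative, member).
rows = [
    ("a1", "a1"), ("a1", "a2"), ("a1", "a3"), ("a1", "a4"),
    ("c1", "c1"), ("c1", "c2"), ("c1", "c3"),
    ("c1", "c4"), ("c1", "c5"), ("c1", "c6"),
]

# Made-up Z-scores against one query protein, for illustration only.
z = {
    "a1": 15.0, "a2": 15.2, "a3": 14.9, "a4": 15.1,
    "c1": 20.1, "c2": 19.8, "c3": 20.5, "c4": 20.3, "c5": 19.9, "c6": 6.2,
}

flagged = flag_outliers(group_members(rows), z)
print(flagged)
```

With these made-up numbers, only c6 is flagged. Listing the flagged members for Group C this way would make it easier to check whether the outliers share a feature such as length or domain composition.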
Steps to Reproduce (for bugs)
Please make sure to execute the reproduction steps with newly recreated and empty tmp folders.
Foldseek Output (for bugs)
Please make sure to also post the complete output of Foldseek. You can use gist.github.com for large output.
Context
Providing context helps us come up with a solution and improve our documentation for the future.
Your Environment
Include as many relevant details about the environment you experienced the bug in.
Git commit used (The string after "MMseqs Version:" when you execute foldseek without any parameters):
Which foldseek version was used (Statically-compiled, self-compiled, Conda, etc.):
For self-compiled and Homebrew: Compiler and CMake versions used and their invocation:
Server specifications (especially CPU support for AVX2/SSE and amount of system memory):