steineggerlab / foldseek

Foldseek enables fast and sensitive comparisons of large structure sets.
https://foldseek.com
GNU General Public License v3.0

How to conduct clusters based on global structures instead of local substructures? #265

Open krinoz opened 2 months ago

krinoz commented 2 months ago

Thanks for your work. However, when I tried to cluster protein structures, the program always returned clusters based on the proteins' substructures. How can I get clusters based on global structures?

--alignment-type 1 seems to give global results based on TM-align. Here is the command that I used:

./foldseek/bin/foldseek easy-cluster ./protein res tmp --lddt-threshold 0.5 --c 1 --alignment-type 1

However, the results I get look like:

1ai6.pdb_B 1ai6.pdb_B
1ai6.pdb_B 1ai7.pdb_B
1ai6.pdb_B 1ajn.pdb_B
1ai6.pdb_B 1ajp.pdb_B
1ai6.pdb_B 1ajq.pdb_B
1ai7.pdb_A 1ai7.pdb_A
1ai7.pdb_A 1ai6.pdb_A
1ai7.pdb_A 1ajn.pdb_A
1ai7.pdb_A 1ajp.pdb_A
1ai7.pdb_A 1ajq.pdb_A
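
The output above is a series of representative/member pairs, one per chain. For inspecting such results, here is a minimal Python sketch that groups the pairs by entry and chain. It assumes the chain ID is whatever follows the last underscore in each name. The helper names (`parse_pairs`, `split_chain`, `group_by_entry`) are hypothetical and not part of Foldseek. This only summarizes chain-level clusters; it does not perform global (multimer) clustering.

```python
from collections import defaultdict


def parse_pairs(text):
    """Read whitespace-separated representative/member pairs."""
    tokens = text.split()
    return list(zip(tokens[0::2], tokens[1::2]))


def split_chain(name):
    """Split 'xxxx.pdb_B' into ('xxxx.pdb', 'B').

    Assumption: the chain ID follows the last underscore.
    Names without an underscore get an empty chain ID.
    """
    entry, sep, chain = name.rpartition("_")
    return (entry, chain) if sep else (name, "")


def group_by_entry(pairs):
    """Map representative entry -> chain -> sorted member entries."""
    groups = defaultdict(lambda: defaultdict(set))
    for rep, member in pairs:
        rep_entry, rep_chain = split_chain(rep)
        member_entry, _ = split_chain(member)
        groups[rep_entry][rep_chain].add(member_entry)
    return {
        entry: {chain: sorted(members) for chain, members in chains.items()}
        for entry, chains in groups.items()
    }


if __name__ == "__main__":
    sample = """
    1ai6.pdb_B 1ai6.pdb_B
    1ai6.pdb_B 1ai7.pdb_B
    1ai7.pdb_A 1ai7.pdb_A
    1ai7.pdb_A 1ai6.pdb_A
    """
    for entry, chains in group_by_entry(parse_pairs(sample)).items():
        print(entry, chains)
```

Applied to the full listing above, this would show that chain B of all five entries falls into one cluster and chain A into another, so the two representatives differ by chain and not by overall structure.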

I would like to get the clusters in a format similar to:

1ai6.pdb 1ai6.pdb
1ai6.pdb 1ajn.pdb
1ai6.pdb 1ajq.pdb
1ai7.pdb 1ai7.pdb
1ai7.pdb 1ajp.pdb

Thanks.

milot-mirdita commented 1 month ago

We are working on a new method that covers multimer clustering. It's not quite ready yet, but we should have something out soon.

sirius777coder commented 1 week ago

Can we just select the number of chains to increase the cluster efficiency?