steineggerlab / foldseek

Foldseek enables fast and sensitive comparisons of large structure sets.
https://foldseek.com
GNU General Public License v3.0

Minimum number of members in a cluster #278

Open jmoojun opened 1 month ago

jmoojun commented 1 month ago

Hi, thank you for making such a simple tool to use.

I have a question about Foldseek clustering. I ran the command `foldseek cluster db clu tmp`. It generates clusters, but some of them have only one member. I would like every cluster to have at least 10 members. Is it possible to control the minimum size of each cluster? If not, is it possible to merge clusters based on distance (e.g., merging the 10 closest singleton clusters into one 10-member cluster)?
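The merging idea described above can be sketched as a post-processing step outside Foldseek. This is only an illustration, not a Foldseek feature: `merge_small_clusters` is a hypothetical helper, and the `dist` callback is an assumed user-supplied distance between cluster representatives (for example, one derived from a structural alignment score). Each undersized cluster is folded into the cluster with the nearest representative until every cluster reaches the minimum size.

```python
def merge_small_clusters(clusters, dist, min_size=10):
    """Greedily merge undersized clusters into their nearest neighbour.

    clusters: dict mapping representative ID -> list of member IDs
    dist:     hypothetical callable (rep_a, rep_b) -> float, supplied by the user
    """
    # Copy so the caller's dict and lists are not modified.
    clusters = {rep: list(members) for rep, members in clusters.items()}

    while len(clusters) > 1:
        small = [rep for rep, members in clusters.items() if len(members) < min_size]
        if not small:
            break
        # Take the smallest cluster first; ties are broken by ID for determinism.
        src = min(small, key=lambda r: (len(clusters[r]), r))
        # Find the other cluster whose representative is closest to src.
        dst = min((r for r in clusters if r != src), key=lambda r: (dist(src, r), r))
        # Fold src into dst; dst's representative is kept.
        clusters[dst].extend(clusters.pop(src))

    return clusters
```

Clusters merged this way are no longer guaranteed to satisfy the criteria used for the original clustering, since members are joined by representative distance alone.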

Thank you in advance for your support.

martin-steinegger commented 1 month ago

I think you are looking for a different type of clustering. You might want to use something like k-means clustering. We produce clusters that should fulfill the given clustering criteria, rather than clusters of a fixed size.
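To see how many clusters fall below a target size, the cluster assignments can be tabulated. The following is a minimal sketch, assuming the clustering has been exported to a two-column tab-separated file of representative and member IDs (for example with `foldseek createtsv db db clu clu.tsv`). The helper names `read_clusters` and `clusters_below` are hypothetical.

```python
import csv
import io
from collections import defaultdict


def read_clusters(handle):
    """Read representative<TAB>member rows into a dict of member lists."""
    clusters = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        rep, member = row[0], row[1]
        clusters[rep].append(member)
    return dict(clusters)


def clusters_below(clusters, min_size=10):
    """Return {representative: size} for clusters smaller than min_size."""
    return {rep: len(members) for rep, members in clusters.items()
            if len(members) < min_size}


# Small in-memory example standing in for a real clu.tsv file.
example = io.StringIO("r1\tr1\nr1\tm2\nr2\tr2\n")
sizes = clusters_below(read_clusters(example), min_size=2)
print(sizes)
```

For a real file, `open("clu.tsv", newline="")` can be passed in place of the `io.StringIO` object.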