Closed richardshuai closed 1 year ago
Could you please provide more specific details regarding the application you have in mind for clustering?
One thing that comes to mind is to use the TM-score. We have added a feature that allows clustering based on TM-score. You might want to explore the --tmscore-threshold option for your needs.
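For intuition about what a TM-score cutoff measures, here is a minimal sketch of the standard TM-score formula, computed from per-residue distances of an already superposed alignment. The function names `tm_d0` and `tm_score` are hypothetical helpers, not part of Foldseek, and the clamp for short chains is a simplification. How Foldseek normalizes the score and applies --tmscore-threshold internally is not shown here.

```python
def tm_d0(target_length: int) -> float:
    """Distance scale d0 of the standard TM-score formula.

    Assumption: very short chains are clamped to 0.5 Angstrom, a
    simplification of what alignment tools do in practice.
    """
    if target_length <= 15:
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, target_length: int) -> float:
    """TM-score from distances (in Angstrom) between aligned residue pairs.

    `distances` holds one value per aligned pair after superposition.
    Residues of the target that are not aligned contribute nothing,
    so the sum is divided by the full target length.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


if __name__ == "__main__":
    # Hypothetical 100-residue chain: 90 residues superpose well,
    # 10 loop residues deviate by several Angstrom.
    example = [0.5] * 90 + [6.0] * 10
    print(round(tm_score(example, 100), 3))
```

Each aligned pair contributes between 0 and 1, so a short stretch of divergent residues, such as a loop, can only lower the global score by its share of the chain length. That may be why highly similar structures still score close to 1 against each other.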
Thank you! It does seem like clustering based on TM-score gives me results closer to what I want. I'm trying to cluster antibody structures (so they will be highly similar except in their hypervariable CDRs), so I wanted a way to cluster such that antibodies with similar CDR orientations will be placed in the same cluster.
Is there an easy interpretation for the tmscore-threshold as far as what it means for each individual cluster? Also, are the other options such as --min-seq-id / -c being used in the Foldseek-TM mode?
Interesting, I never tried to cluster highly similar structures. Yes, you can combine cluster criteria. Increasing the coverage does make sense for your use-case.
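As a rough sketch of combining criteria, the helper below assembles a `foldseek cluster` command line that uses only the options mentioned in this thread. The function name `build_cluster_command`, the database names, and the threshold values are hypothetical placeholders. The sketch only builds the argument list and does not check that the chosen values suit your data.

```python
import shlex


def build_cluster_command(input_db, output_db, tmp_dir,
                          tmscore_threshold=None, coverage=None,
                          min_seq_id=None, evalue=None):
    """Build an argument list for `foldseek cluster` (hypothetical helper).

    Only flags named in this thread are used. Options left as None
    are omitted so that Foldseek falls back to its defaults.
    """
    cmd = ["foldseek", "cluster", input_db, output_db, tmp_dir]
    if tmscore_threshold is not None:
        cmd += ["--tmscore-threshold", str(tmscore_threshold)]
    if coverage is not None:
        cmd += ["-c", str(coverage)]
    if min_seq_id is not None:
        cmd += ["--min-seq-id", str(min_seq_id)]
    if evalue is not None:
        cmd += ["-e", str(evalue)]
    return cmd


if __name__ == "__main__":
    # Placeholder names and values, chosen only for illustration.
    cmd = build_cluster_command("antibodyDB", "clusterDB", "tmp",
                                tmscore_threshold=0.9, coverage=0.95)
    print(shlex.join(cmd))
    # To actually run it (requires Foldseek on PATH and an existing database):
    # import subprocess; subprocess.run(cmd, check=True)
```

Returning a list rather than a single string lets the command be passed to `subprocess.run` without shell quoting issues.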
I'm trying to cluster a dataset of thousands of highly similar protein structures using foldseek cluster, but I'm finding that foldseek gives me very few clusters (maybe 30-40) even with very strict structural alignment cutoffs (-c 0.999, -e 0.001). However, the structures within a given cluster still look different when viewed in PyMOL. How would I go about increasing cluster stringency beyond these cutoffs?