steineggerlab / foldseek

Foldseek enables fast and sensitive comparisons of large structure sets.
https://foldseek.com
GNU General Public License v3.0

clustering of multimers #127

Open adrienchaton opened 1 year ago

adrienchaton commented 1 year ago

Hello,

I am trying to cluster PDB files that contain several chains; in this case they are all dimers (antibodies).

I could not find out whether Foldseek supports this. Since the latest Foldseek version runs fine on a folder of monomers, I guess the issue I am encountering now comes from the algorithm being limited to single chains?

Running foldseek createdb ./pdbs DB works fine, but the second step, foldseek search, fails with this error:

k-mer similarity threshold: 78
Starting prefiltering scores calculation (step 1 of 1)
Query db start 1 to 2531172
Target db start 1 to 2531172
Segmentation fault (core dumped)                                  ] 0.00% 1 eta -
Error: Kmer matching step died

Is there a procedure to cluster dimers with Foldseek, please? Since I will be working with antibodies, it is fine for me to assume that every PDB file has exactly 2 chains.

Thanks!

martin-steinegger commented 1 year ago

Currently this is not supported. We are working on this.

adrienchaton commented 1 year ago

Thanks for your reply and the good work! That will be a great addition to this tool.

adrienchaton commented 1 year ago

One thought I had: could something like the glycine-linker trick, as used e.g. with AF multimer, be applicable to Foldseek?

edumenezes77 commented 1 year ago

I'm working on the same problem.

I tried to cluster antibodies based on the orientation of the VH/VL domains.

To do that, I had to give both chains the same chain ID (I relabeled the H and L chains as A) and renumber the residues so that the residue numbers do not overlap.
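A minimal sketch of that relabeling workaround, assuming standard fixed-column PDB files. The function name `merge_chains` is hypothetical, and dropping `TER` records so the two chains read as one is my own assumption, not something Foldseek is known to require:

```python
def merge_chains(pdb_text: str, new_chain: str = "A") -> str:
    """Relabel every chain as `new_chain` and renumber residues 1..N.

    Only ATOM/HETATM records are rewritten. Assumes the fixed-column
    PDB layout: chain ID in column 22, residue number in columns 23-26,
    insertion code in column 27.
    """
    out = []
    prev_key = None   # (chain, resSeq, iCode) of the previous atom
    new_resseq = 0    # running residue counter across all chains
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            line = line.ljust(27)  # guard against truncated records
            key = (line[21], line[22:26], line[26])
            if key != prev_key:    # first atom of a new residue
                new_resseq += 1
                prev_key = key
            # Overwrite chain ID and residue number, blank the insertion code.
            line = line[:21] + new_chain + f"{new_resseq:>4}" + " " + line[27:]
            out.append(line)
        elif line.startswith("TER"):
            # Assumption: drop chain terminators so the merged chain is continuous.
            continue
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

Usage would look like `open("merged.pdb", "w").write(merge_chains(open("antibody.pdb").read()))`, with both file names as placeholders. The sketch does not handle structures with more than 9999 residues, which should not matter for antibodies.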

sirius777coder commented 2 months ago

Hi, have you tried this linker approach or other tools for clustering dimer structures? I have run into the same issue.

adrienchaton commented 5 days ago

@martin-steinegger I saw that you released some multimer methods for Foldseek, this is great!

Are there plans to release a multimer workflow equivalent to easy-cluster, please?

Otherwise, I was thinking of running easy-multimersearch on one database against itself to get pairwise TM-scores that could then be used to create clusters. Does that sound correct, or would there be a better way to do multimer clustering?
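A minimal sketch of that idea, assuming the pairwise scores have already been parsed into `(query, target, tm_score)` tuples. The parsing step is left out because the output column layout is not discussed here. The function name `cluster_by_tm` and the 0.5 cutoff are illustrative assumptions, and single-linkage grouping is just one possible choice, not Foldseek's own clustering method:

```python
def cluster_by_tm(pairs, threshold=0.5):
    """Single-linkage clustering of structures from pairwise TM-scores.

    pairs: iterable of (query, target, tm_score) tuples.
    Two structures end up in the same cluster if they are connected by a
    chain of pairs whose TM-score is >= threshold.
    Returns a sorted list of sorted clusters.
    """
    parent = {}  # union-find forest: member -> parent

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for query, target, tm in pairs:
        root_q, root_t = find(query), find(target)  # registers both members
        if tm >= threshold and root_q != root_t:
            parent[root_q] = root_t  # merge the two clusters

    clusters = {}
    for member in list(parent):
        clusters.setdefault(find(member), []).append(member)
    return sorted(sorted(c) for c in clusters.values())


example = [("a", "b", 0.9), ("b", "c", 0.8), ("c", "d", 0.3), ("d", "d", 1.0)]
print(cluster_by_tm(example))  # [['a', 'b', 'c'], ['d']]
```

Single linkage can chain loosely related structures into one cluster, so a stricter threshold or a greedy representative-based scheme may suit antibody sets better.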