steineggerlab / foldseek

Foldseek enables fast and sensitive comparisons of large structure sets.
https://foldseek.com
GNU General Public License v3.0

How can I perform domain clustering with Foldseek? #231

Open WongQh opened 10 months ago

WongQh commented 10 months ago

It might be a silly question, but I still don't understand how you perform domain clustering with Foldseek. Your Nature article "Clustering predicted structures at the scale of the known protein universe" states: "Clustering of the start and end positions for Foldseek hits of one protein against all others was used to define potential domain boundary positions. Each predicted domain region was linked to the others sharing structural similarities and graph-based clustering was used to define domain families and interdomain similarity." Is this done by Foldseek itself or by another algorithm? How can I do it with my own data?
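For readers trying to picture the first step in the quoted passage, here is a minimal sketch of one possible reading. It is not the procedure used in the paper. It assumes you already have (start, end) query coordinates for each hit, and it groups nearby positions by simple single-linkage on a 1D axis. The function name and the `max_gap` and `min_support` parameters are hypothetical.

```python
def candidate_boundaries(hits, max_gap=5, min_support=3):
    """Group hit start/end positions that lie close together.

    hits: iterable of (start, end) query coordinates, one pair per hit.
    Positions separated by at most max_gap residues are chained into one
    group (single linkage). Groups with at least min_support positions
    yield one candidate boundary each (the middle position of the group).
    All parameter values are illustrative assumptions.
    """
    positions = sorted(pos for hit in hits for pos in hit)
    groups = []
    for pos in positions:
        if groups and pos - groups[-1][-1] <= max_gap:
            groups[-1].append(pos)  # extend the current group
        else:
            groups.append([pos])    # start a new group
    return [g[len(g) // 2] for g in groups if len(g) >= min_support]


# Toy example: two sets of hits covering residues ~1-100 and ~100-200.
example_hits = [(1, 100), (3, 98), (2, 102), (105, 200), (103, 198), (101, 201)]
boundaries = candidate_boundaries(example_hits)
```

The graph-based clustering of the resulting domain regions is a separate step and is not covered by this sketch.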

code4luck commented 2 months ago

Hello, do you have any ideas?

WongQh commented 3 weeks ago

> Hello, do you have any ideas?

I didn't find a solution within Foldseek, but I have another idea: split the PDB files into domain segments with Merizo (the segmented output files will be named like your_prot_id_01.dom.pdb/your_prot_id_0_02.dom.pdb), then run easy-cluster on these segments. It might be feasible, but I haven't tried it, because I expect it would be troublesome to annotate the segments with Pfam domains (maybe we could run a structural alignment together with the Pfam seed files).
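A rough, untested sketch of the second half of this idea, assuming the Merizo segments are already collected in one directory. It builds a `foldseek easy-cluster` command and parses the resulting `<prefix>_cluster.tsv` (two tab-separated columns: representative, member). The helper names, directory names, and the `-c 0.9` coverage value are illustrative assumptions, not recommendations from the Foldseek authors.

```python
import csv
from collections import defaultdict
from pathlib import Path


def collect_domain_files(segment_dir, suffix=".dom.pdb"):
    """List segment files; the suffix follows the naming mentioned above."""
    return sorted(Path(segment_dir).rglob(f"*{suffix}"))


def easy_cluster_command(segment_dir, out_prefix="domains", tmp_dir="tmp",
                         coverage=0.9):
    """Build the argument list for: foldseek easy-cluster <dir> <prefix> <tmp> -c <cov>."""
    return ["foldseek", "easy-cluster", str(segment_dir), out_prefix, tmp_dir,
            "-c", str(coverage)]


def read_clusters(tsv_path):
    """Parse <prefix>_cluster.tsv into {representative: [members]}."""
    clusters = defaultdict(list)
    with open(tsv_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) >= 2:  # skip blank or malformed lines
                clusters[row[0]].append(row[1])
    return dict(clusters)
```

To run it, pass the command to `subprocess.run(easy_cluster_command("segments"), check=True)` with Foldseek on your PATH, then call `read_clusters("domains_cluster.tsv")`.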

martin-steinegger commented 3 weeks ago

That should be a much better solution overall :) Thanks for sharing.

WongQh commented 3 weeks ago

> That should be a much better solution overall :) Thanks for sharing.

It would be good to have a toolkit like HMMER that builds HMMs from 3Di features rather than protein sequences. 3Di strings should be compatible with HMMER-style search/scan.

martin-steinegger commented 3 weeks ago

We support profile searches of structures, so you can build models over your domains. Check out result2profile.
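A hedged sketch of how such a workflow might be chained. It assumes `foldseek result2profile` takes the same positional arguments as the MMseqs2 module of that name (query DB, target DB, result DB, output profile DB) and that the resulting profile DB can be used as the query of a later search. Please check `foldseek result2profile -h` before relying on this. All directory and database names are hypothetical.

```python
import shlex
from pathlib import Path


def profile_workflow(domain_dir, target_dir, work_dir="work"):
    """Return the Foldseek commands as argument lists (nothing is executed)."""
    work = Path(work_dir)
    domain_db = str(work / "domainDB")
    target_db = str(work / "targetDB")
    aln_db = str(work / "aln")
    profile_db = str(work / "domainProfileDB")
    hits_db = str(work / "profileHits")
    tmp = str(work / "tmp")
    return [
        ["foldseek", "createdb", str(domain_dir), domain_db],
        ["foldseek", "createdb", str(target_dir), target_db],
        # Align domains against the targets to collect homologs.
        ["foldseek", "search", domain_db, target_db, aln_db, tmp],
        # Assumed argument order: queryDB targetDB resultDB profileDB.
        ["foldseek", "result2profile", domain_db, target_db, aln_db, profile_db],
        # Assumption: the profile DB is accepted as a search query.
        ["foldseek", "search", profile_db, target_db, hits_db, tmp],
    ]


for command in profile_workflow("domains", "targets"):
    print(shlex.join(command))
```

Printing the commands first makes it easy to inspect them, or paste them into a shell, before executing anything.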

WongQh commented 3 weeks ago

> We support profile searches of structures, so you can build models over your domains. Check out result2profile.

I'll try it. Thank you :D