steineggerlab / foldseek

Foldseek enables fast and sensitive comparisons of large structure sets.
https://foldseek.com
GNU General Public License v3.0

Cluster purity analysis with structurealign? #111

Open BinhongLiu opened 1 year ago

BinhongLiu commented 1 year ago

Expected Behavior

Hi, I found the cluster purity analysis using structurealign here (https://www.biorxiv.org/content/10.1101/2023.03.09.531927v1). The representative structure was aligned to the cluster members using the "structurealign -e INF -a" module in Foldseek to calculate the average LDDT and average TM-score per cluster. Could you provide a more detailed guide about this? I'm not sure if I need to complete the analysis with a loop script.

martin-steinegger commented 1 year ago

We calculated the TM-scores and LDDT scores using 3Di/AA structural alignments (structurealign). To obtain the TM-score and average LDDT score for the alignments, we used the convertalis module.
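For readers wondering how the per-cluster averages could be derived once the alignments are exported, here is a minimal sketch, not the authors' script. It assumes a tab-separated file with four columns in this order: representative (query), member (target), TM-score, LDDT. Such a file could, for example, be written by convertalis with a matching `--format-output` selection; that column layout is an assumption. The function name `per_cluster_means` and the `skip_self` option are hypothetical, introduced only for illustration.

```python
import csv
import io
from collections import defaultdict


def per_cluster_means(tsv_text, skip_self=True):
    """Return {representative: (mean_tm, mean_lddt, n_members)}.

    Assumed input: tab-separated rows of
    representative, member, TM-score, LDDT (this column order is an assumption).
    skip_self drops the representative-vs-itself row; whether the original
    analysis did this is not stated in the thread.
    """
    # Per representative: [sum of TM-scores, sum of LDDT, row count]
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 4:
            continue  # ignore blank or malformed lines
        rep, member = row[0], row[1]
        if skip_self and rep == member:
            continue
        acc = sums[rep]
        acc[0] += float(row[2])
        acc[1] += float(row[3])
        acc[2] += 1
    return {
        rep: (tm_sum / n, lddt_sum / n, n)
        for rep, (tm_sum, lddt_sum, n) in sums.items()
    }


if __name__ == "__main__":
    # Made-up rows, purely to show the call; not real Foldseek output.
    example = "repA\tm1\t0.50\t0.75\nrepA\tm2\t0.75\t0.25\nrepB\tm3\t0.50\t0.50\n"
    for rep, (tm, lddt, n) in per_cluster_means(example).items():
        print(rep, n, round(tm, 3), round(lddt, 3))
```

Because every row carries its representative, a single pass over one alignment file groups the clusters; under these assumptions no per-cluster loop is needed.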

BinhongLiu commented 1 year ago

Should this analysis be run cluster by cluster using a loop script?

PawelSzczerbiak commented 7 months ago

Could you please provide more details on that topic? E.g., how did you generate the prefilterdb comprising all query-target alignments, which is a required input for structurealign?

EDIT: Couldn't all of those steps be done with a single easy-search command using --exhaustive-search 1?

martin-steinegger commented 7 months ago

The scripts we used to compute the purity per cluster are available here: https://github.com/steineggerlab/afdb-clusters-analysis