steineggerlab / foldseek

Foldseek enables fast and sensitive comparisons of large structure sets.
https://foldseek.com
GNU General Public License v3.0

Getting E-values for Foldseek Cluster #282

Open Loganz97 opened 3 weeks ago

Loganz97 commented 3 weeks ago

Is it possible to get e-values out of foldseek cluster like what is seen in the AFDB cluster all-vs-all dataset? From easy-cluster I get:

A0A0H5AIG4 A0A0H5AIG4
A0A0H5AIG4 A0A1I5TD38
A0A0H5AIG4 A0A060LX85
A0A0H5AIG4 A0A2K9EJ19
A0A0H5AIG4 D6Z3A1

and I want:

A0A6V7DUT0 A0A2P5K2A9 0.0006679
A0A6V7DUT0 A0A1I3RKG2 0.001279
A0A6V7DUT0 A0A7J0AR87 0.002053
A0A6V7DUT0 A0A7W1LZ17 0.002451
A0A6V7DUT0 A0A353GUE2 0.004426

From what I see in the docs, there appears to be no --format-output option for easy-cluster.

Thanks in advance for your help!

Best, Logan

milot-mirdita commented 3 weeks ago

easy-cluster doesn't support this.

You have to call cluster and then align manually:

foldseek createdb folder-to-pdbs/ input_db
foldseek cluster input_db clu tmp
foldseek align input_db input_db clu aln -a
foldseek convertalis input_db input_db aln aln.m8
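
To get from aln.m8 to the three-column layout asked for above, something like the following sketch may help. It assumes aln.m8 uses the default tab-separated, BLAST-style column order, in which the E-value is the 11th column; if your output format differs, adjust the column number. The helper name m8_to_evalues and the output file name are hypothetical, chosen only for illustration.

```shell
# Hypothetical helper: print query, target and E-value from an m8 file.
# Assumption: default 12-column BLAST-tab layout, with the E-value in column 11.
m8_to_evalues() {
    # Skip lines with fewer than 11 tab-separated fields rather than print blanks.
    awk -F'\t' 'BEGIN { OFS = "\t" } NF >= 11 { print $1, $2, $11 }' "$1"
}

# Example usage, where aln.m8 is the file written by convertalis above:
# m8_to_evalues aln.m8 > clu_evalues.tsv
```

convertalis also has a --format-output option that may let you select these columns directly; check foldseek convertalis -h for the fields your version supports.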