westlake-repl / SaProt

[ICLR'24 spotlight] Saprot: Protein Language Model with Structural Alphabet
MIT License

the results of DMS fitness in the ProteinGym benchmark #57

Open jiaolifengmi opened 3 weeks ago

jiaolifengmi commented 3 weeks ago

Thank you very much for your wonderful work. I have some questions about the verification of DMS fitness in the ProteinGym benchmark. I noticed that the "substitution DMS benchmark" in ProteinGym contains 217 groups of proteins, but your experimental verification uses only 63 groups, and only 46 of those have the same names as in the original. Could you explain how your ProteinGym dataset was constructed and why it differs so much from the original ProteinGym?

LTEnjoy commented 3 weeks ago

Hi, thank you for your interest in our work!

The benchmark you are referring to is ProteinGym version 1.0. The ProteinGym dataset used in our paper is version 0.1, which was described in https://arxiv.org/abs/2205.13760. Version 1.0 had not yet been published when our paper was submitted, so we only tested on the older version.
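
For anyone who wants to check the overlap between the two versions themselves, here is a minimal sketch of a name comparison. It assumes you have already collected the assay names of each version into plain Python lists; the function name `compare_assay_names` and the names in the example are hypothetical placeholders, not real ProteinGym identifiers or part of the SaProt code.

```python
def compare_assay_names(old_names, new_names):
    """Return (shared, only_old, only_new) as sorted lists of assay names."""
    old_set, new_set = set(old_names), set(new_names)
    shared = sorted(old_set & new_set)      # names present in both versions
    only_old = sorted(old_set - new_set)    # names found only in the older version
    only_new = sorted(new_set - old_set)    # names found only in the newer version
    return shared, only_old, only_new


# Placeholder names for illustration only (assumption: not real assay IDs).
v01_names = ["ASSAY_A", "ASSAY_B", "ASSAY_C"]
v10_names = ["ASSAY_B", "ASSAY_C", "ASSAY_D", "ASSAY_E"]

shared, only_old, only_new = compare_assay_names(v01_names, v10_names)
print(len(shared), "shared;", len(only_old), "only in v0.1;", len(only_new), "only in v1.0")
```

Exact string matching is the simplest choice here. If an assay was renamed between versions, it will show up in both the "only" lists, so a small count of shared names does not by itself mean the underlying data differs.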