microsoft / Graphormer

Graphormer is a general-purpose deep learning backbone for molecular modeling.
MIT License

Feature representations for new Proteins in DiG #184

Open · sai-advaith opened this issue 2 months ago

sai-advaith commented 2 months ago

Hi,

This is regarding protein generation in DiG.

I wanted to know how you obtained the features stored in the protein pickle files. According to Appendix B.1 of the paper, the single and pair representations are simply the outputs of a pre-trained AlphaFold Evoformer, given the protein's FASTA sequence and its MSAs.
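Before comparing numbers, it can help to confirm that two sets of features even have the same layout. The sketch below assumes the AlphaFold monomer Evoformer widths (384 channels for the single representation, 128 for the pair representation) and unbatched arrays; `check_representations` is a hypothetical helper, not part of DiG or OpenFold.

```python
import numpy as np

# Channel widths of the AlphaFold (monomer) Evoformer outputs.
C_S = 384  # single representation: [N_res, C_S]
C_Z = 128  # pair representation:   [N_res, N_res, C_Z]


def check_representations(single: np.ndarray, pair: np.ndarray) -> int:
    """Validate Evoformer-style shapes and return the number of residues."""
    if single.ndim != 2 or single.shape[1] != C_S:
        raise ValueError(f"single must be [N_res, {C_S}], got {single.shape}")
    n_res = single.shape[0]
    if pair.shape != (n_res, n_res, C_Z):
        raise ValueError(f"pair must be [{n_res}, {n_res}, {C_Z}], got {pair.shape}")
    return n_res
```

For example, a 50-residue protein should pass with `single` of shape `(50, 384)` and `pair` of shape `(50, 50, 128)`.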

I set up OpenFold on our systems and saved the Evoformer representations for the corresponding protein to a pickle file, using the "single" and "pair" keys of the output dictionary in this link. To obtain the MSAs for the FASTA sequence, I queried the ColabFold server.
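The extraction step described above can be sketched as follows, assuming the model's output dictionary exposes "single" and "pair" entries that have already been converted to NumPy arrays (for torch tensors, e.g. via `tensor.detach().cpu().numpy()`). `save_representations` is hypothetical, and the key names it writes are not confirmed to match those in DiG's own pickle files.

```python
import pickle

import numpy as np

# Expected rank of each representation once any batch dimension is removed.
EXPECTED_NDIM = {"single": 2, "pair": 3}  # [N_res, C_s] and [N_res, N_res, C_z]


def save_representations(outputs: dict, path: str) -> None:
    """Pickle only the "single" and "pair" entries of a model output dict.

    Hypothetical helper: DiG's pickle files may use different key names.
    """
    reps = {}
    for key, ndim in EXPECTED_NDIM.items():
        arr = np.asarray(outputs[key], dtype=np.float32)
        # Squeeze a leading batch dimension of size 1, if present.
        if arr.ndim == ndim + 1 and arr.shape[0] == 1:
            arr = arr[0]
        reps[key] = arr
    with open(path, "wb") as f:
        pickle.dump(reps, f)
```

Squeezing only when the rank is exactly one higher than expected avoids mangling a genuinely one-residue input.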

Unfortunately, the representations I obtained from OpenFold's Evoformer were quite different from those in the dataset's pickle file.
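One way to make "quite different" concrete is to report a shape check, the largest absolute difference, and the mean cosine similarity between corresponding feature vectors (per residue for the single representation, per residue pair for the pair representation). This is a generic NumPy sketch; `compare_representations` is a hypothetical name.

```python
import numpy as np


def compare_representations(a: np.ndarray, b: np.ndarray) -> dict:
    """Summarize how far two same-layout representations are from each other."""
    if a.shape != b.shape:
        return {"same_shape": False}
    # Flatten to (num_vectors, channels): one row per residue or residue pair.
    a = a.astype(np.float64).reshape(-1, a.shape[-1])
    b = b.astype(np.float64).reshape(-1, b.shape[-1])
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    cosine = np.sum(a * b, axis=1) / (norms + 1e-12)  # small eps avoids 0/0
    return {
        "same_shape": True,
        "max_abs_diff": float(np.max(np.abs(a - b))),
        "mean_cosine": float(np.mean(cosine)),
    }
```

A mean cosine near 1 with a small maximum difference suggests the same model and inputs; values far from 1 point to a different model, weights, or MSA.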

Can you please let me know the exact method you used to obtain the single and pair representations from each protein's FASTA sequence?

zhengsx commented 1 month ago

Please use AlphaFold's representations.
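If the AlphaFold run keeps its intermediate representations, they can be read back from its result pickle. The sketch assumes the pickled dictionary has a "representations" entry holding "single" and "pair" arrays. Whether AlphaFold writes that entry by default depends on the version and run configuration, so treat the layout as an assumption. `load_alphafold_representations` is hypothetical.

```python
import pickle

import numpy as np


def load_alphafold_representations(path: str):
    """Return (single, pair) arrays from an AlphaFold result pickle.

    Assumed layout: {"representations": {"single": ..., "pair": ...}, ...}.
    """
    with open(path, "rb") as f:
        result = pickle.load(f)
    reps = result.get("representations")
    if reps is None:
        raise KeyError("no 'representations' entry; the run may not have kept them")
    return np.asarray(reps["single"]), np.asarray(reps["pair"])
```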