mheinzinger / ProstT5

Bilingual Language Model for Protein Sequence and Structure
MIT License

Back Translate 3Di Tokens to PDB Format #7

Closed mahdip72 closed 5 months ago

mahdip72 commented 5 months ago

Hello, I have a few questions which might be very basic to others.

  1. Can we convert the 3Di tokens produced by Foldseek back into an actual 3D structure format, such as a PDB file?

  2. How can I compare two sequences of predicted and true 3Di tokens? Can I use something like the TM Score for that? I am working on a model to predict 3Di tokens from amino acids and am searching for the best metric to evaluate the model.

I would be very grateful to anyone who can answer my questions.

mheinzinger commented 5 months ago

Hi,

1.) No, as far as I know this is not possible yet.

2.) TM-score requires 3D structures for the 3D alignment and distance computation, so it cannot be computed if you only have predicted 3Di. For comparing true and predicted 3Di, I have usually used simple classification accuracy. If you want something a bit more elaborate, you can also align the true and predicted 3Di strings using the 3Di substitution matrix provided by Foldseek.
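As a rough illustration of the two options for comparing 3Di strings, here is a minimal Python sketch. It is not code from ProstT5 or Foldseek. `per_residue_accuracy` assumes the true and predicted strings have the same length (one 3Di state per residue). `global_alignment_score` is a plain Needleman-Wunsch score with a linear gap penalty; the `toy_score` function and the gap value are placeholders I made up, and for a real evaluation you would swap in a lookup into the 3Di substitution matrix that ships with Foldseek.

```python
from typing import Callable


def per_residue_accuracy(true_3di: str, pred_3di: str) -> float:
    """Fraction of positions where the predicted 3Di state equals the true one."""
    if len(true_3di) != len(pred_3di):
        raise ValueError("sequences must have the same length")
    if not true_3di:
        raise ValueError("sequences must be non-empty")
    matches = sum(t == p for t, p in zip(true_3di, pred_3di))
    return matches / len(true_3di)


def toy_score(a: str, b: str) -> int:
    # Placeholder scoring, NOT Foldseek's matrix: +2 for a match, -1 otherwise.
    # Replace with a lookup into the real 3Di substitution matrix.
    return 2 if a == b else -1


def global_alignment_score(
    seq_a: str,
    seq_b: str,
    score: Callable[[str, str], int] = toy_score,
    gap: int = -2,  # hypothetical linear gap penalty
) -> int:
    """Needleman-Wunsch global alignment score, keeping only two DP rows."""
    prev = [j * gap for j in range(len(seq_b) + 1)]
    for i, a in enumerate(seq_a, start=1):
        curr = [i * gap]
        for j, b in enumerate(seq_b, start=1):
            curr.append(max(
                prev[j - 1] + score(a, b),  # align a with b
                prev[j] + gap,              # gap in seq_b
                curr[j - 1] + gap,          # gap in seq_a
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    true_3di = "DVVLV"  # made-up example strings
    pred_3di = "DVALV"
    print(per_residue_accuracy(true_3di, pred_3di))
    print(global_alignment_score(true_3di, pred_3di))
```

Accuracy treats every wrong state as equally bad, while a substitution-matrix score gives partial credit when the predicted state is structurally similar to the true one. Dividing the alignment score by the score of the true string aligned to itself is one way to make values comparable across proteins of different lengths.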