timkucera / proteogan

Protein generative model conditioned on Gene Ontology terms
MIT License

ESM log_likelihood protein stability equivalent in ProteoGAN? #1

Open avilella opened 2 years ago

avilella commented 2 years ago

Hi, is there a way to calculate an equivalent of ESM's log_likelihood protein stability score in ProteoGAN? Thanks in advance.

timkucera commented 2 years ago

Hi Albert. Likelihoods are not easily accessible in GANs, and to my knowledge no GAN-based equivalent has been linked to stability yet. I could imagine that the discriminator output correlates with stability. In principle, though, you can of course compute ESM's likelihoods on the sequences generated by ProteoGAN.
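For readers who want to try the last suggestion, here is a minimal sketch of a pseudo-log-likelihood, one common way to score a sequence with a masked language model: mask each position in turn and sum the log-probability assigned to the true residue. This is not necessarily what ESM's log_likelihood script computes. The names `Scorer`, `uniform_scorer` and `MASK` are hypothetical and not part of ESM or ProteoGAN. The toy scorer only stands in for a real model, which you would wrap in a function with the same signature.

```python
import math
from typing import Callable, Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "#"  # placeholder mask symbol (assumption; real models use their own mask token)

# A scorer receives the masked sequence and the masked position, and returns
# a log-probability for every amino acid at that position.
Scorer = Callable[[str, int], Dict[str, float]]


def uniform_scorer(masked_seq: str, pos: int) -> Dict[str, float]:
    """Toy stand-in for a masked language model: all residues equally likely."""
    logp = -math.log(len(AMINO_ACIDS))
    return {aa: logp for aa in AMINO_ACIDS}


def pseudo_log_likelihood(seq: str, scorer: Scorer) -> float:
    """Sum over positions of log p(true residue | sequence with that position masked)."""
    total = 0.0
    for i, residue in enumerate(seq):
        masked = seq[:i] + MASK + seq[i + 1:]
        total += scorer(masked, i)[residue]
    return total


def mean_pseudo_log_likelihood(seq: str, scorer: Scorer) -> float:
    """Length-normalised score, so sequences of different lengths are comparable."""
    if not seq:
        raise ValueError("sequence must not be empty")
    return pseudo_log_likelihood(seq, scorer) / len(seq)
```

Calling `mean_pseudo_log_likelihood(seq, scorer)` on each generated sequence gives one score per sequence. Whether that score tracks stability would still have to be checked against measurements.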

avilella commented 2 years ago

I've searched for "discriminator" in the code, and I wonder whether this repo could include a script that works like ESM's log_likelihood script, e.g.:

python3 proteogan/calc_discriminator.py <protein_sequence>

or similar. Would that be possible?

timkucera commented 2 years ago

I've added a script here. Note that this uses the pretrained model weights from ProteoGAN, so the scores depend heavily on the GO term labels. I highly recommend retraining the model if you have different labels or none at all.

I should stress again that a correlation between stability and discriminator output has in no way been tested or verified. Nevertheless, I'm curious, so let me know if you find something interesting.
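One way to probe this untested correlation is a rank correlation between discriminator scores and measured stabilities for the same sequences. The sketch below is a plain-Python Spearman coefficient. The function names are illustrative and not part of the ProteoGAN repo, and it assumes you already have two equal-length lists: one of discriminator outputs and one of experimental stability values.

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def spearman(scores: Sequence[float], stabilities: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(scores) != len(stabilities) or len(scores) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(ranks(scores), ranks(stabilities))
```

A coefficient near zero would suggest the discriminator output carries little stability signal. Because the scores depend on the GO term labels, any comparison should use the same labels for every sequence.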