Unsupervised ECNet - Githubissues

luoyunan / ECNet

An evolutionary context-integrated deep learning framework for protein engineering

BSD 3-Clause "New" or "Revised" License

Unsupervised ECNet #2

Open eric-jm-lang opened 3 years ago

eric-jm-lang commented 3 years ago

Hello. In the ECNet paper, you built an unsupervised ECNet model that does not require DMS data for training and uses the predicted probability of an amino acid at a position as a proxy for fitness. Is there specific code for this unsupervised model? Or can the current ECNet code produce one by passing a different input to --train? Could you please provide more details on how to build such a model? Many thanks in advance.
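To make the question concrete, here is a minimal sketch of what "predicted probability as a proxy for fitness" could look like. It is not taken from the ECNet code or confirmed by the paper: the log-odds form, the function names `parse_mutation` and `log_odds_score`, and the `probs` structure (one dict of amino acid probabilities per position) are all assumptions made for illustration.

```python
import math
import re

# One site looks like "F12F": wild-type letter, 1-based position, mutant letter.
SITE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(mutation):
    """Split a string such as 'F12F;L30L' into (wt, position, mutant) tuples."""
    sites = []
    for token in mutation.split(";"):
        match = SITE.match(token.strip())
        if match is None:
            raise ValueError(f"cannot parse mutation site: {token!r}")
        sites.append((match.group(1), int(match.group(2)), match.group(3)))
    return sites


def log_odds_score(mutation, probs):
    """Hypothetical unsupervised score: sum of log p(mutant) - log p(wild type).

    `probs` is an assumed input: probs[i] maps each amino acid to the
    model's predicted probability at position i + 1.
    """
    score = 0.0
    for wt, pos, mut in parse_mutation(mutation):
        site_probs = probs[pos - 1]
        score += math.log(site_probs[mut]) - math.log(site_probs[wt])
    return score


# Toy two-position example with made-up probabilities.
toy_probs = [{"M": 0.8, "L": 0.2}, {"K": 0.5, "R": 0.5}]
print(log_odds_score("M1L;K2R", toy_probs))
```

Under this assumed form, a synonymous entry such as M1M always scores exactly 0, and a substitution scores below 0 whenever the model considers the mutant residue less likely than the wild type.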

NOforgetQY commented 2 years ago

TernencezzZ commented 2 years ago

I have the same confusion!

meehljd commented 2 years ago

Same here!

meehljd commented 2 years ago

Maybe one could hack a training file with a bunch of neutral mutations

mutation    score
M1M         1.0
F12F;L30L   1.0
G89G        1.0
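A file like this could be generated rather than written by hand. The sketch below is only an illustration of the hack described above: the helper names `neutral_mutation_rows` and `to_tsv` are hypothetical, and the tab-separated layout with a `mutation`/`score` header is assumed from the example rows, not checked against the ECNet loader.

```python
import random


def neutral_mutation_rows(sequence, n_rows, max_sites=2, seed=0):
    """Build (mutation, score) rows where every site keeps its wild-type residue.

    Positions are 1-based, matching entries such as 'F12F;L30L'.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        n_sites = rng.randint(1, max_sites)
        positions = sorted(rng.sample(range(len(sequence)), n_sites))
        mutation = ";".join(
            f"{sequence[p]}{p + 1}{sequence[p]}" for p in positions
        )
        rows.append((mutation, 1.0))
    return rows


def to_tsv(rows):
    """Render rows as tab-separated text with an assumed 'mutation/score' header."""
    lines = ["mutation\tscore"]
    lines += [f"{mutation}\t{score:.1f}" for mutation, score in rows]
    return "\n".join(lines) + "\n"


# Toy sequence for illustration only; substitute the real wild-type sequence.
print(to_tsv(neutral_mutation_rows("MKTAYIAKQRQISFVK", n_rows=3)))
```

Because every row carries the same score of 1.0, a model trained on such a file has no variation in the target to learn from.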

meehljd commented 2 years ago

Looks like it worked. I tested it with a separate test file containing random mutations. I still need to validate against experimental data.

Prediction from Training File with neutral mutations:

mutation      score        prediction
D36D;G142G    1.00000000   1.04933691
E145E;S128S   1.00000000   1.04933691
L19L;N152N    1.00000000   1.04933691
E237E;P12P    1.00000000   1.04933691

Prediction from Test File with random mutations:

mutation       score        prediction
A9D;T27L       1.00000000    0.51061106
A124A;I3T      1.00000000    0.98425829
V258L;A211L    1.00000000   -0.28957328
A276R;K252E    1.00000000    1.15801334
E175E;F14A     1.00000000    1.18147123
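For the validation against experimental data mentioned above, a rank correlation between predictions and measured fitness is one reasonable check. The sketch below is a plain-Python Spearman correlation; choosing Spearman is an assumption on my part, and the function names are hypothetical. Note that the score column in the test file holds the placeholder 1.0 on every row, so any correlation against it is undefined.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation; returns NaN if either input is constant."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return math.nan
    return cov / math.sqrt(var_x * var_y)


# Placeholder scores from the test file against the reported predictions.
placeholder_scores = [1.0, 1.0, 1.0, 1.0, 1.0]
predictions = [0.51061106, 0.98425829, -0.28957328, 1.15801334, 1.18147123]
print(spearman(placeholder_scores, predictions))
```

With real measurements in place of the placeholder scores, the same call would give a value between -1 and 1.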

meehljd commented 2 years ago

Ignore my previous naive attempt. I re-read the paper and recalled that @luoyunan used homologous sequences to train a bidirectional model on masked amino acid residues. I reviewed the ECNet and Dataset classes. The provided model can only process mutation-feature paired TSV files for training, so training on homologous sequences must happen in a different code base.

eric-jm-lang commented 2 years ago

Thank you for your input @meehljd. Hope @luoyunan can provide more information on how to do this.