idmjky / EvolvePro

PLM-based active learning model for protein engineering

What computational resources does the model require? #1

Open ysy-scu opened 1 month ago

ysy-scu commented 1 month ago

Hi, your work is really great. I would like to use your model to engineer my own enzyme, but I am concerned about computational resources. What hardware is required to run it? Best wishes, YSY

idmjky commented 1 month ago

Hi, the most computationally intensive step in using this model is generating the ESM2 embeddings, which requires a high-performance GPU such as an A100 to load the 15B-parameter model and run inference. In our hands, one A100 GPU is sufficient to generate the embeddings for all single mutants of a protein in 4-8 hours. Hope that helps. Best, Kaiyi
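
To give a feel for the size of the embedding job, here is a minimal sketch (not code from the repo) that enumerates every single amino-acid substitution of a sequence and estimates the memory needed just to hold the model weights. The function names, the `A1C`-style mutant naming, the example sequence, and the 2-bytes-per-parameter (half precision) figure are assumptions made for illustration; activations and framework overhead come on top of the weight memory.

```python
# Illustrative sketch only; names and conventions here are assumptions,
# not the EvolvePro API.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def single_mutants(wt_seq):
    """Yield (name, sequence) for every single substitution of wt_seq.

    Names use an assumed convention: wild-type residue, 1-based position,
    mutant residue (e.g. "M1A").
    """
    for pos, wt_aa in enumerate(wt_seq, start=1):
        for aa in AMINO_ACIDS:
            if aa == wt_aa:
                continue  # skip the wild-type residue itself
            yield f"{wt_aa}{pos}{aa}", wt_seq[:pos - 1] + aa + wt_seq[pos:]


def weight_memory_gb(n_params, bytes_per_param=2):
    """Memory for the weights alone, in GB (10**9 bytes)."""
    return n_params * bytes_per_param / 1e9


wt = "MKTAYIAK"  # hypothetical 8-residue sequence
mutants = list(single_mutants(wt))
print(len(mutants))            # 19 substitutions per position
print(mutants[0])              # first mutant name and sequence
print(weight_memory_gb(15e9))  # 15B parameters at 2 bytes each
```

For a protein of length L this produces 19 × L sequences to embed, which is why the run time scales with protein length.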

ysy-scu commented 1 month ago

Thank you very much for your reply; it helped me a lot. So far I have successfully completed the concatenate.sh step. However, I am confused about toplayer.sh: do I need to provide experimental data for the first round? Taking hc.fasta as an example, do I have to test all 2217 mutations in the wet lab to obtain the input data for the first round? My understanding of the model may not be accurate, so I would greatly appreciate your help. Best, YSY

idmjky commented 1 month ago

This depends on the experimental setup you would like to use. I have updated the repo with an example Excel file containing 15 mutants and their experimentally measured fitness. Typically, to start the evolution, you randomly pick 10-16 mutants from the full set of single mutants and measure their activity experimentally. This wet-lab data is then used as the training input for the random_forest regressor. Hope that helps.
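
The first-round selection described above can be sketched as follows. This is a hypothetical helper, not part of the repo: the function name, the default of 16 mutants, the fixed seed, and the placeholder mutant names are all assumptions for illustration.

```python
import random


def pick_first_round(mutant_names, n=16, seed=0):
    """Randomly pick n distinct mutants to measure in the first round.

    A fixed seed makes the selection reproducible, so the same list can be
    regenerated when the measured values are entered later.
    """
    names = list(mutant_names)
    if n > len(names):
        raise ValueError("cannot pick more mutants than are available")
    rng = random.Random(seed)
    return sorted(rng.sample(names, n))


# Placeholder names standing in for the full single-mutant library.
library = [f"mutant_{i}" for i in range(1, 201)]
first_round = pick_first_round(library, n=12, seed=42)
print(len(first_round))
```

The measured activity of the selected mutants, paired with their names, would then form the small labeled set that the regressor is trained on.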