luo-group / MODIFY

ML-optimized library design with improved fitness and diversity for protein engineering
MIT License

esm script issues #2

Open willfinnigan opened 1 week ago

willfinnigan commented 1 week ago

Hi, great work!

I ran into a couple of problems running your esm script.

First, I was getting a circular import error. Renaming esm.py to esm_modify.py fixed this.

Secondly, I couldn't see a way to pass in positions (other than the hard-coded GB1 positions). I added a positions argument.

And finally, I was running out of GPU memory on the larger ESM model; setting batch_size to 1 fixed this. I added an argument for this as well.
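A minimal sketch of how a positions argument and a batch size argument could be exposed with argparse. The flag names `--positions` and `--batch_size`, the helper names, and the example values are hypothetical and are not taken from the repository or the fork.

```python
import argparse


def parse_positions(text: str) -> list[int]:
    """Parse a comma-separated string such as '39,40,41,54' into integers."""
    return [int(tok) for tok in text.split(",") if tok.strip()]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Zero-shot scoring (sketch)")
    # Hypothetical flags; the real script may name or parse these differently.
    parser.add_argument("--positions", type=parse_positions, required=True,
                        help="comma-separated residue positions to mutate")
    parser.add_argument("--batch_size", type=int, default=1,
                        help="sequences per forward pass; lower it if the GPU "
                             "runs out of memory")
    return parser


def batches(items: list, batch_size: int):
    """Yield consecutive chunks of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


if __name__ == "__main__":
    args = build_parser().parse_args(["--positions", "39,40,41,54",
                                      "--batch_size", "2"])
    print(args.positions, args.batch_size)
```

Smaller batches trade throughput for a lower peak memory footprint, which is why a batch size of 1 is the safest default for the larger models.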

Then it all ran great 😊

I put my changes at https://github.com/willfinnigan/MODIFY - can make a PR if you like.

Kind regards, Will

willfinnigan commented 1 week ago

Couple of other notes.

The EVcouplings server doesn't seem to be providing .model files currently - obviously nothing you guys can do about that.

The MSA Transformer example had the same issue with the positions argument. In addition, it's not really clear how the _filtered_0.a2m to _filtered_4.a2m files are made. I imagine the answer lies somewhere in the EVE package, but it's not obvious and I've run out of time for this. Could this be something to add to your zero.ipynb notebook example?

kerrding commented 1 week ago

Hi Will,

Thank you for your comments!

We would first like to thank you for pointing out the current limitations of esm.py and msa.py for making ESM-1v/ESM-2/MSA Transformer predictions. We are now revising the scripts to make them more broadly usable, so that users can make their own predictions without having to edit the code inside the scripts. This revision also adds another argument ('--offset'), since the numbering of the input sequence does not always start at 0. We greatly appreciate your insightful suggestions and will take them into account in our revision.
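As an illustration of what an offset argument can do, here is a minimal sketch that maps a residue number onto a 0-based index into the sequence string. It assumes the offset is the number assigned to the first residue of the input sequence; the planned '--offset' argument may be defined differently, and the function name and example sequence are hypothetical.

```python
def residue_at(sequence: str, position: int, offset: int = 0) -> str:
    """Return the residue at `position`, where the first residue is numbered `offset`.

    Assumption: offset is the number of the first residue in `sequence`.
    """
    index = position - offset  # 0-based index into the string
    if not 0 <= index < len(sequence):
        raise ValueError(
            f"position {position} is outside the sequence for offset {offset}"
        )
    return sequence[index]


if __name__ == "__main__":
    seq = "MKTAYIAK"  # made-up example sequence
    print(residue_at(seq, 3, offset=1))  # numbering starts at 1
    print(residue_at(seq, 2, offset=0))  # numbering starts at 0
```

Failing loudly on an out-of-range position helps catch a wrong offset before any model is run.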

For EVcouplings, the EVcouplings server does provide .model files for download. The model file can be found under the Downloads tab, named [EVcouplings model parameters]. Furthermore, if you would like to rerun EVmutation with your own settings, you can download the MSA from the EVcouplings server and run EVcouplings (https://github.com/debbiemarkslab/EVcouplings) locally.

For the MSA processing for MSA Transformer, we followed the MSA Transformer paper (https://proceedings.mlr.press/v139/rao21a.html), which involves calculating sequence weights using EVE and subsampling the MSA. We will provide more details in the zero.ipynb file, following your suggestions.
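Until those details are added, the general idea of reweighting and weighted subsampling can be sketched as below. This is not the authors' pipeline and it does not call EVE: the weighting is a simplified version of the common "one over the number of near-identical neighbors" scheme, the threshold `theta=0.2`, the subsample size, and the use of one random seed per subsample are all assumptions, and the function names are hypothetical.

```python
import random


def sequence_weights(msa: list[str], theta: float = 0.2) -> list[float]:
    """Weight each aligned sequence by 1 / (number of sequences whose
    normalized Hamming distance to it is below theta, itself included)."""
    weights = []
    for a in msa:
        neighbors = 0
        for b in msa:
            mismatches = sum(x != y for x, y in zip(a, b))
            if mismatches / len(a) < theta:
                neighbors += 1
        weights.append(1.0 / neighbors)
    return weights


def weighted_subsample(msa: list[str], weights: list[float],
                       k: int, seed: int) -> list[str]:
    """Keep the query (first row) and draw k - 1 other rows without
    replacement, favoring higher weights (Efraimidis-Spirakis keys)."""
    rng = random.Random(seed)
    keyed = [(rng.random() ** (1.0 / w), i)
             for i, w in enumerate(weights) if i > 0]
    chosen = sorted(i for _, i in sorted(keyed, reverse=True)[:k - 1])
    return [msa[0]] + [msa[i] for i in chosen]


if __name__ == "__main__":
    msa = ["ACDE", "ACDE", "ACDF", "WYWY"]  # toy alignment, query first
    w = sequence_weights(msa)
    # Assumption: one subsample per seed, e.g. five seeds for five files.
    for seed in range(5):
        print(seed, weighted_subsample(msa, w, k=3, seed=seed))
```

The pairwise comparison is quadratic in the number of sequences, so a real alignment would need a vectorized implementation such as the one in the EVE package.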

We truly appreciate your comments on our code and the support for our project!

Best Regards, Kerr

willfinnigan commented 6 days ago

Hi kerrding, thanks for your response. More details in zero.ipynb sounds great!

For EVcouplings - the only download option I can find is this download button, which gives a zip file with all sorts of files, but no model. Am I missing some other option? Or is this a difference between the v1 server and v2?

[Screenshot of the download button, 2024-09-12]

Kind regards, Will

willfinnigan commented 6 days ago

One more thing - running the modify script has a couple of undocumented requirements. It needs a {protein_name}_zero.csv in the protein folder, and it needs a wt.fasta too.
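A small pre-flight check along these lines could report the missing inputs before the script starts. It is only a sketch: it assumes both files live in the same protein folder, which the thread does not confirm for wt.fasta, and the function name and example paths are hypothetical.

```python
from pathlib import Path


def missing_inputs(protein_dir: str, protein_name: str) -> list[str]:
    """Return the names of required input files that are not in protein_dir.

    Assumption: both files are expected directly inside protein_dir.
    """
    folder = Path(protein_dir)
    required = [f"{protein_name}_zero.csv", "wt.fasta"]
    return [name for name in required if not (folder / name).is_file()]


if __name__ == "__main__":
    # "data/GB1" and "GB1" are placeholder values for illustration.
    missing = missing_inputs("data/GB1", "GB1")
    if missing:
        print("Missing input files:", ", ".join(missing))
```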