pengzhangzhi / protein-sequence-diffusion-model

diffusion model for protein sequence generation
MIT License

Collaborate #2

Open Amelie-Schreiber opened 1 year ago

Amelie-Schreiber commented 1 year ago

I'm very interested in replicating your work and would like to train a diffusion model to generate protein binding partners similar to what RFDiffusion accomplishes, but I would like to use ESM-2 models as you have done. If you are open to collaborating, feel free to reach out if you have the time. Also, would you be able to create a tutorial similar to this?

pengzhangzhi commented 1 year ago

Hi there! I am open to collaborating on interesting work. Would you like to discuss your ideas and implementation details with me?

best, zhangzhi

Amelie-Schreiber commented 1 year ago

Hi, I am relatively new to training diffusion models. I have only fine-tuned ESM-2 models for sequence classification and for token classification. Are you using EsmForProteinFolding as the backbone in your diffusion model? If so, I don't believe I have access to a good enough GPU to train it. My GPUs are too small unless a smaller model can be used. I hope that I am wrong, or that another ESM-2 model can be used that is smaller. Otherwise I am stuck and unable to train. I am having trouble understanding your code also and was hoping we might work on writing a notebook similar to this: https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb

Thanks for responding! Amelie

pengzhangzhi commented 1 year ago

Hi, training is fairly cheap: I can fit the model on a GPU with 10 GB of memory. Regarding documentation, please follow the README to install the packages and train the model. Please let me know which parts confuse you.

best, Zhangzhi

Amelie-Schreiber commented 1 year ago

Could you find me on discord? Also, could I use Hugging Face's accelerator to do data parallelization to split training across two 8GB GPUs? If so, that might work...

EDIT: I've tried training on a P100 GPU (using a Colab instance) and it doesn't seem to work. My training script must not be set up correctly or something.
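For context on the two-GPU question, here is a minimal sketch of what data parallelism does, assuming the usual scheme in which every GPU holds a full copy of the model and only the batch is divided. The helper `shard_indices` is hypothetical and is not part of Accelerate or of this repository; it only illustrates the strided split that distributed samplers typically perform (real samplers also shuffle and pad).

```python
from typing import List


def shard_indices(num_samples: int, world_size: int, rank: int) -> List[int]:
    """Return the sample indices that one process (rank) would see.

    Hypothetical illustration only: each of the `world_size` processes
    takes every `world_size`-th sample, starting at its own rank.
    """
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    return list(range(rank, num_samples, world_size))


# Two GPUs, eight samples: each GPU processes half of the data,
# but each GPU still has to hold the whole model in its own memory.
shards = [shard_indices(8, world_size=2, rank=r) for r in range(2)]
```

Under this assumption, two 8 GB GPUs shorten each epoch but would not act like a single 16 GB GPU: if the model and one batch do not fit on one card, lowering the per-GPU batch size is likely the first thing to try.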

pengzhangzhi commented 1 year ago

Hi,

Amelie-Schreiber commented 1 year ago

Hi! I tried following the installation instructions and I am having some issues. First, there seems to be a mistake in the instructions. I believe you need

cd protein-sequence-diffusion-model

instead of

cd denoising_diffusion_protein_sequence

Also, once everything is installed, I get the following error:

(esm2d) C:\Users\OWO\Desktop\amelie_vscode\esmd\protein-sequence-diffusion-model\denoising_diffusion_pytorch>python pl_train.py --max_epochs 1 --fas_dpath seq_data/fas
C:\Users\OWO\anaconda3\envs\esm2d\lib\site-packages\Bio\pairwise2.py:278: BiopythonDeprecationWarning: Bio.pairwise2 has been deprecated, and we intend to remove it in a future release of Biopython. As an alternative, please consider using Bio.Align.PairwiseAligner as a replacement, and contact the Biopython developers if you still need the Bio.pairwise2 module.
  warnings.warn(
C:\Users\OWO\anaconda3\envs\esm2d\lib\site-packages\torchaudio\backend\utils.py:74: UserWarning: No audio backend is available.
  warnings.warn("No audio backend is available.")
seq_data/fas\seqs.a3m already exists.
Traceback (most recent call last):
  File "C:\Users\OWO\Desktop\amelie_vscode\esmd\protein-sequence-diffusion-model\denoising_diffusion_pytorch\pl_train.py", line 205, in <module>
    train(args)
  File "C:\Users\OWO\Desktop\amelie_vscode\esmd\protein-sequence-diffusion-model\denoising_diffusion_pytorch\pl_train.py", line 187, in train
    trainer = pl.Trainer(
  File "C:\Users\OWO\anaconda3\envs\esm2d\lib\site-packages\pytorch_lightning\utilities\argparse.py", line 70, in insert_env_defaults
    return fn(self, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'gpus'

pengzhangzhi commented 1 year ago

I guess the error occurs because PyTorch Lightning has been updated and Trainer no longer accepts gpus as an argument (it was removed in version 2.0). Please set accelerator="auto" instead; see https://lightning.ai/docs/pytorch/stable/common/trainer.html

Use trainer = pl.Trainer(max_epochs=20, accelerator="auto"). Ref: https://stackoverflow.com/a/76193000
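If the same script has to run under both old and new Lightning versions, one option is to rewrite the keyword arguments before they reach the Trainer. The sketch below assumes the legacy `gpus` value maps onto `accelerator` and `devices` as shown. `modernize_trainer_kwargs` is a hypothetical helper, not something from pl_train.py or from Lightning itself.

```python
def modernize_trainer_kwargs(kwargs: dict, lightning_version: str) -> dict:
    """Return a copy of `kwargs` with a legacy `gpus` entry rewritten.

    Hypothetical helper: for Lightning >= 2.0 it replaces `gpus` with
    `accelerator` (and `devices` when GPUs were requested). Older
    versions still accept `gpus`, so their kwargs are returned unchanged.
    """
    new_kwargs = dict(kwargs)  # never mutate the caller's dict
    major = int(lightning_version.split(".")[0])
    if major < 2 or "gpus" not in new_kwargs:
        return new_kwargs

    gpus = new_kwargs.pop("gpus")
    if not gpus:
        # gpus=None or gpus=0 used to mean "train on CPU"
        new_kwargs.setdefault("accelerator", "cpu")
    else:
        # e.g. gpus=1 or gpus=[0, 1]; assumed to carry over to `devices`
        new_kwargs.setdefault("accelerator", "gpu")
        new_kwargs.setdefault("devices", gpus)
    return new_kwargs


fixed = modernize_trainer_kwargs({"max_epochs": 1, "gpus": 1}, "2.0.9")
```

With Lightning installed, this could be used as pl.Trainer(**modernize_trainer_kwargs(old_kwargs, pl.__version__)), where old_kwargs stands for whatever the training script currently passes.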