Open Amelie-Schreiber opened 1 year ago
Hi there! I am open to collaborating on interesting work. Would you like to discuss your ideas and implementation details with me?
best, zhangzhi
Hi, I am relatively new to training diffusion models. I have only fine-tuned ESM-2 models for sequence classification and token classification. Are you using EsmForProteinFolding as the backbone in your diffusion model? If so, I don't believe I have access to a good enough GPU to train it; my GPUs are too small unless a smaller model can be used. I hope that I am wrong, or that a smaller ESM-2 model can be used instead. Otherwise I am stuck and unable to train. I am also having trouble understanding your code and was hoping we might work on writing a notebook similar to this:
https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb
Thanks for responding! Amelie
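For a rough sense of which ESM-2 checkpoints might fit on a small GPU, here is a back-of-the-envelope sketch. It assumes about 16 bytes per parameter for full fp32 training with Adam (weights, gradients and two optimizer moments) and ignores activation memory, which can dominate for long sequences. The parameter counts are the nominal sizes from the public checkpoint names, and the helper names are made up for illustration. Whether this repository can swap in a smaller backbone is not confirmed here.

```python
# Nominal parameter counts taken from the public ESM-2 checkpoint names.
ESM2_CHECKPOINTS = {
    "facebook/esm2_t6_8M_UR50D": 8_000_000,
    "facebook/esm2_t12_35M_UR50D": 35_000_000,
    "facebook/esm2_t30_150M_UR50D": 150_000_000,
    "facebook/esm2_t33_650M_UR50D": 650_000_000,
    "facebook/esm2_t36_3B_UR50D": 3_000_000_000,
    "facebook/esm2_t48_15B_UR50D": 15_000_000_000,
}

# Assumption: fp32 weights (4) + gradients (4) + two Adam moments (4 + 4).
BYTES_PER_PARAM = 16


def estimate_training_gib(num_params, bytes_per_param=BYTES_PER_PARAM):
    """Rough lower bound on training memory in GiB, excluding activations."""
    return num_params * bytes_per_param / 2**30


def checkpoints_within(budget_gib):
    """Names of checkpoints whose estimated footprint is below the budget."""
    return [
        name
        for name, n_params in ESM2_CHECKPOINTS.items()
        if estimate_training_gib(n_params) < budget_gib
    ]


if __name__ == "__main__":
    for name in checkpoints_within(8):
        print(name)
```

This is a lower bound only: a checkpoint that passes the check can still run out of memory once batch size and sequence length are taken into account.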
Hi, the training is pretty cheap. I can fit the model on a 10 GB GPU. Regarding the documentation, please follow the README to install the packages and train the model. Please let me know which parts confuse you.
best, Zhangzhi
Could you find me on Discord? Also, could I use Hugging Face's Accelerate library for data parallelism, to split training across two 8 GB GPUs? If so, that might work...
EDIT: I've tried training on a P100 GPU (using a Colab instance) and it doesn't seem to work. My training script must not be set up correctly or something.
Hi! I tried following the installation instructions and I am having some issues. First, there seems to be a mistake in the instructions. I believe you need
cd protein-sequence-diffusion-model
instead of
cd denoising_diffusion_protein_sequence
Also, once everything is installed, I get the following error:
(esm2d) C:\Users\OWO\Desktop\amelie_vscode\esmd\protein-sequence-diffusion-model\denoising_diffusion_pytorch>python pl_train.py --max_epochs 1 --fas_dpath seq_data/fas
C:\Users\OWO\anaconda3\envs\esm2d\lib\site-packages\Bio\pairwise2.py:278: BiopythonDeprecationWarning: Bio.pairwise2 has been deprecated, and we intend to remove it in a future release of Biopython. As an alternative, please consider using Bio.Align.PairwiseAligner as a replacement, and contact the Biopython developers if you still need the Bio.pairwise2 module.
warnings.warn(
C:\Users\OWO\anaconda3\envs\esm2d\lib\site-packages\torchaudio\backend\utils.py:74: UserWarning: No audio backend is available.
warnings.warn("No audio backend is available.")
seq_data/fas\seqs.a3m already exists.
Traceback (most recent call last):
File "C:\Users\OWO\Desktop\amelie_vscode\esmd\protein-sequence-diffusion-model\denoising_diffusion_pytorch\pl_train.py", line 205, in <module>
train(args)
File "C:\Users\OWO\Desktop\amelie_vscode\esmd\protein-sequence-diffusion-model\denoising_diffusion_pytorch\pl_train.py", line 187, in train
trainer = pl.Trainer(
File "C:\Users\OWO\anaconda3\envs\esm2d\lib\site-packages\pytorch_lightning\utilities\argparse.py", line 70, in insert_env_defaults
return fn(self, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'gpus'
I guess the error occurs because PyTorch Lightning has been updated and no longer accepts gpus as a Trainer argument.
please set accelerator="auto"
https://lightning.ai/docs/pytorch/stable/common/trainer.html
Use trainer = pl.Trainer(max_epochs=20, accelerator="auto"). Ref: https://stackoverflow.com/a/76193000
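If pl_train.py builds its Trainer arguments from a dict or from parsed CLI args, one way to apply the suggestion above without touching every call site is to translate the old gpus value first. The sketch below is an assumption about how that could look, not the repository's code: modernize_trainer_kwargs is a hypothetical helper, and it only handles integer and list values of gpus.

```python
def modernize_trainer_kwargs(kwargs):
    """Return a copy of kwargs with the removed `gpus` argument translated
    into the `accelerator` / `devices` pair used by current Lightning.

    Hypothetical helper for illustration; handles int and list values only.
    """
    new = dict(kwargs)  # do not mutate the caller's dict
    gpus = new.pop("gpus", None)
    if gpus is None:
        # Nothing was requested explicitly: let Lightning pick the hardware.
        new.setdefault("accelerator", "auto")
    elif gpus == 0 or gpus == []:
        # The old API treated gpus=0 as "run on CPU".
        new.setdefault("accelerator", "cpu")
    else:
        new.setdefault("accelerator", "gpu")
        new.setdefault("devices", gpus)
    return new


if __name__ == "__main__":
    old_kwargs = {"max_epochs": 1, "gpus": 1}
    print(modernize_trainer_kwargs(old_kwargs))
    # With Lightning installed, the result would be passed on like this:
    # trainer = pl.Trainer(**modernize_trainer_kwargs(old_kwargs))
```

Pinning an older pytorch-lightning release that still accepts gpus would also sidestep the error, at the cost of staying on an outdated version.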
I'm very interested in replicating your work and would like to train a diffusion model to generate protein binding partners, similar to what RFdiffusion accomplishes, but using ESM-2 models as you have done. If you are open to collaborating, feel free to reach out when you have the time. Also, would you be able to create a tutorial similar to this?