hugohrban / ProGen2-finetuning

Finetuning ProGen2 protein language model for generation of protein sequences from selected protein families.
BSD 3-Clause "New" or "Revised" License

HF Trainer compatibility #5

Open martinez-zacharya opened 3 months ago

martinez-zacharya commented 3 months ago

Quick question regarding the compatibility of ProGen2 with the HuggingFace Trainer: I would like to use distributed strategies, DeepSpeed, etc., and the Trainer class with accelerate handles much of that under the hood. Could this version of ProGen2 be used in the same way as other `AutoModelForCausalLM` models?

hugohrban commented 2 months ago

I have never experimented with this, but I assume it should be possible. The ProGen2 architecture is the same as GPT-J, which was quite popular on HF, so I would expect it to be supported. I originally removed the functions for parallelizing model layers across multiple GPUs to reduce complexity, but I have now added them back (commit 1dd8ed4181d8), so you can go ahead and try running it.
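For anyone landing here later, the basic wiring is the standard causal-LM recipe: load the checkpoint, build `TrainingArguments`, and hand both to `Trainer`. A minimal sketch is below; the checkpoint name `hugohrban/progen2-small`, the `trust_remote_code=True` flag, and all hyperparameter values are assumptions for illustration, not something confirmed in this thread.

```python
# Hedged sketch: finetuning a ProGen2 checkpoint with the HF Trainer.
# The checkpoint name and trust_remote_code usage are assumptions; adjust
# them to match how you actually load the model in this repo.
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)


def build_trainer(checkpoint="hugohrban/progen2-small", train_dataset=None):
    """Construct a Trainer around a ProGen2-style checkpoint (untested sketch)."""
    tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True)

    args = TrainingArguments(
        output_dir="progen2-finetuned",
        per_device_train_batch_size=4,  # placeholder value
        num_train_epochs=1,             # placeholder value
        fp16=True,                      # mixed precision handled under the hood
    )
    # train_dataset should yield tokenized examples with input_ids / labels.
    return Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        tokenizer=tokenizer,
    )
```

Since `Trainer` only needs a model that returns a loss from `input_ids`/`labels`, any `AutoModelForCausalLM`-compatible checkpoint should slot in here unchanged.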

martinez-zacharya commented 2 months ago

Ok, so I've been playing around with your models and I think I've figured out how to integrate them nicely with the HF Trainer. I've attached a Jupyter notebook with a minimal example of what I mean. With this, I've been able to use DeepSpeed offloading, fp16, etc. just by passing the args to TrainingArguments!

minimal_example.zip
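As a rough illustration of the "just pass the args" part: the HF integration accepts a DeepSpeed JSON config via `TrainingArguments(deepspeed="ds_config.json")`, with `"auto"` values filled in from the Trainer's own arguments. A minimal ZeRO stage-2 CPU-offload config might look like this (values are a sketch, not taken from the attached notebook):

```json
{
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "fp16": { "enabled": "auto" },
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": { "device": "cpu" }
  }
}
```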