huggingface / trl

Train transformer language models with reinforcement learning.
http://hf.co/docs/trl
Apache License 2.0

SFTTrainer with FSDP on a model that doesn't fit in GPU memory #1681

Open tambulkar opened 1 month ago

tambulkar commented 1 month ago

The multi-GPU training example in the SFTTrainer docs shows loading the model onto the GPU, but that doesn't work if the model doesn't fit into a single GPU's memory in the first place. Is there any guidance somewhere on how to use FSDP with SFTTrainer for models that don't fit on one GPU? Can we shard the model before loading it onto the GPU?
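For context, with Accelerate's FSDP integration the model is normally loaded on CPU and not moved to a GPU manually; FSDP shards the parameters across GPUs when training starts, so no single GPU ever has to hold the full model (the `fsdp_cpu_ram_efficient_loading` option in the Accelerate FSDP config can further limit host RAM usage). A minimal sketch of that setup, assuming a TRL version where SFTTrainer still accepts `dataset_text_field`/`max_seq_length` directly (newer versions move these to `SFTConfig`); the config file, model, and dataset names are placeholders:

```python
# Hypothetical sketch: launch with an FSDP-enabled Accelerate config, e.g.
#   accelerate launch --config_file fsdp_config.yaml sft_fsdp.py
# where fsdp_config.yaml was produced by `accelerate config` with FSDP enabled.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

model_name = "meta-llama/Meta-Llama-3-8B"  # placeholder model

# Load on CPU in reduced precision; do NOT call .to("cuda") yourself.
# The FSDP wrapper applied by Accelerate/Trainer shards the weights across GPUs.
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")  # placeholder dataset

training_args = TrainingArguments(
    output_dir="./sft-fsdp",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    gradient_checkpointing=True,  # reduces activation memory on top of sharding
    bf16=True,
    num_train_epochs=1,
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    args=training_args,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=1024,
)
trainer.train()
```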

younesbelkada commented 3 weeks ago

Hi @tambulkar, thanks for the issue! I am not super familiar with FSDP myself, but I can give you this pointer: https://github.com/huggingface/peft/blob/main/examples/sft/run_peft_qlora_fsdp.sh. It is the official PEFT script for running QLoRA + SFTTrainer with FSDP.
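The linked script combines 4-bit quantization (QLoRA) with FSDP. A hedged sketch of the QLoRA-specific pieces that approach relies on (values and model name are illustrative, not copied from the script) might look like this, with the result passed to an SFTTrainer as in the sketch above:

```python
# Hypothetical QLoRA-for-FSDP setup; the job would still be launched through
# an FSDP-enabled Accelerate config as in the earlier sketch.
import torch
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    # Storing the 4-bit weights in the same dtype as the rest of the model
    # is what lets FSDP shard the quantized parameters.
    bnb_4bit_quant_storage=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",  # placeholder model
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)
# trainer = SFTTrainer(model=model, peft_config=peft_config, ...)  # as in the sketch above
```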

github-actions[bot] commented 7 hours ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.