Open tambulkar opened 1 month ago
Hi @tambulkar, thanks for the issue! I am not super familiar with FSDP myself, but I can give you this pointer: https://github.com/huggingface/peft/blob/main/examples/sft/run_peft_qlora_fsdp.sh — the official PEFT script you can use to run QLoRA + SFTTrainer with FSDP.
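For reference, that script is driven by an `accelerate` FSDP config file. A minimal sketch of what such a config might look like is below — the key names follow the accelerate FSDP plugin, and `fsdp_cpu_ram_efficient_loading` is the option that lets each rank avoid materializing the full model in GPU memory before sharding (exact values here are illustrative, not copied from the official script):

```yaml
# Hypothetical accelerate config for FSDP + QLoRA (sketch, not the official one).
compute_environment: LOCAL_MACHINE
distributed_type: FSDP
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_sharding_strategy: FULL_SHARD
  # Load weights on CPU first and shard before moving to GPU,
  # so the full model never has to fit on a single device.
  fsdp_cpu_ram_efficient_loading: true
  fsdp_sync_module_states: true
  fsdp_use_orig_params: false
mixed_precision: bf16
num_processes: 2  # number of GPUs
```

You would then launch training with something like `accelerate launch --config_file fsdp_config.yaml train.py ...`, as the PEFT script does.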
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
The example of multi-GPU training in the SFTTrainer docs shows that I should load the model into GPU memory, but this doesn't work if the model doesn't fit into a single GPU's memory in the first place. Is there any guidance somewhere on how to use FSDP with SFTTrainer for models that don't fit on one GPU? Can we shard the model before loading it onto the GPU?