Hi @philschmid, thanks for your work here.

I am testing out the training/scripts/run_fsdp_qlora.py script.

My setup is 4x NVIDIA RTX 4090 GPUs with 24 GB of memory each.

I did change from Llama 3 to Llama 3 Instruct, but I don't think that makes a difference. I get the OOM error at the quantisation step, before training even starts. I kept the quantisation at the same 4-bit setup. It seems AnswerAI is able to do this with two 24 GB GPUs on a 70B model, whereas I have four GPUs in this case.
```
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 112.00 MiB.
```
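For reference, here is a minimal sketch of the 4-bit setup I'm referring to (the model id and dtypes are illustrative, assuming transformers' `BitsAndBytesConfig` as used in the script; my understanding is that `bnb_4bit_quant_storage` is what lets FSDP shard the quantized weights):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Sketch of the 4-bit quantisation setup (illustrative, not the exact script).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    # Storage dtype must match the compute dtype so FSDP can shard
    # the 4-bit weights instead of keeping a full copy per GPU.
    bnb_4bit_quant_storage=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-70B-Instruct",  # the Instruct variant I switched to
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,  # keep other params in bf16 so dtypes line up
)
```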