Hi @philschmid, thanks for your work here.

I am testing out the training/scripts/run_fsdp_qlora.py script.

My setup is 4x NVIDIA RTX 4090 GPUs with 24 GB of memory each.

I did change from Llama 3 to Llama 3 Instruct, but I don't think that makes a difference. I get the OOM error at the quantisation step, before training even starts. I kept the quantisation at the same 4-bit setup. It seems AnswerAI is able to do this with two 24 GB GPUs on a 70B model, whereas I have four GPUs in this case.
```
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 112.00 MiB.
```
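For reference, here is a minimal sketch of the 4-bit setup I'm referring to (the model id and dtypes are illustrative, assuming transformers' `BitsAndBytesConfig` as used in the script; my understanding is that `bnb_4bit_quant_storage` is what lets FSDP shard the quantized weights):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Sketch of the 4-bit quantisation setup (illustrative, not the exact script).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    # Storage dtype must match the compute dtype so FSDP can shard
    # the 4-bit weights instead of keeping a full copy per GPU.
    bnb_4bit_quant_storage=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-70B-Instruct",  # the Instruct variant I switched to
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,  # keep other params in bf16 so dtypes line up
)
```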