hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0

llama3.1 70B OOM for qlora+fsdp sft. #5169

Closed: mces89 closed this issue 1 month ago

mces89 commented 1 month ago

System Info

I'm using the latest LLaMA-Factory version.

Reproduction

Hi, I'm trying to fine-tune the Llama 3.1 70B model with QLoRA + FSDP on 8x A100 (640 GB total). My context length is large (32k). I'm using the FSDP + QLoRA setup (quantization_bit=4) with FlashAttention-2 (fa2), but I still get OOM after some iterations. Is there any other way to reduce memory usage further?
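
For reference, here is a minimal sketch (not LLaMA-Factory's own code) of what the underlying QLoRA + FlashAttention-2 setup roughly corresponds to in plain transformers/peft, with gradient checkpointing enabled as the usual extra activation-memory lever. The model id and LoRA hyperparameters are illustrative assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization with double quantization, the usual QLoRA recipe.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_storage=torch.bfloat16,  # lets FSDP shard the quantized weights uniformly
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-70B",          # assumed model id
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",  # fa2, as in the report above
    torch_dtype=torch.bfloat16,
)

# Gradient checkpointing trades compute for activation memory; at 32k context
# it is usually the activations, not the quantized weights, that overflow.
model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)

lora_config = LoraConfig(
    r=16,                                     # illustrative hyperparameters
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
```

With a setup like this, keeping the per-device batch size at 1 and relying on gradient accumulation is the other standard way to cap peak memory.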


mces89 commented 1 month ago

I also tried LoRA with DeepSpeed ZeRO-3 on 2 nodes (16x A100 80 GB) and got OOM as well.
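
One further lever on the ZeRO-3 side is CPU offload of parameters and optimizer states. Below is a rough sketch of such a config for the standard DeepSpeed / Hugging Face Trainer integration, expressed as a Python dict (the keys mirror a typical ZeRO-3 offload JSON; the exact example config shipped with LLaMA-Factory may differ). Note that offload only moves the sharded states, so it will not help if the 32k-token activations are what is overflowing:

```python
# Hypothetical ZeRO-3 config with CPU offload; "auto" values are filled in by
# the Hugging Face Trainer integration at launch time.
ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu", "pin_memory": True},
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "overlap_comm": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
}
```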

hiyouga commented 1 month ago

Try reducing the context length.
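
That advice tracks with how the memory scales: FSDP and ZeRO-3 shard weights, gradients, and optimizer states, but per-GPU activations are not sharded and grow linearly with sequence length. A back-of-envelope estimate (hedged; the constants are only approximate) of the checkpointed activations per GPU:

```python
# Rough per-GPU activation footprint for Llama 3.1 70B in bf16.
hidden_size = 8192
num_layers = 80
bytes_per_elem = 2  # bf16

def checkpointed_activations_gib(seq_len: int, micro_batch_size: int = 1) -> float:
    """Roughly one hidden-state tensor is kept per layer when gradient
    checkpointing is on. Ignores workspace inside each recomputed segment,
    the large seq_len x vocab logits tensor, and allocator fragmentation."""
    return micro_batch_size * seq_len * hidden_size * num_layers * bytes_per_elem / 2**30

print(checkpointed_activations_gib(32_768))  # ~40 GiB per GPU at 32k context
print(checkpointed_activations_gib(8_192))   # ~10 GiB per GPU at 8k context
```

So dropping from 32k to 8k context frees on the order of 30 GiB per GPU before any other changes.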