Meanwhile, when I run it on one GPU, it OOMs. I wonder if that is because I used zero3_init?
After I updated accelerate, peft, transformers, and bitsandbytes to the newest versions, the problem seemed to be solved.
I found that after updating those packages, the params load into CPU memory during `from_pretrained`. But when `trainer.train()` begins, the full set of params is loaded onto each GPU without any partitioning; only then are they partitioned and GPU memory drops. In short, the problem is still unsolved. @muellerzr
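A quick way to see whether ZeRO-3 actually sharded the weights at each stage is to inspect the per-parameter state DeepSpeed leaves behind. A minimal sketch, assuming DeepSpeed's usual bookkeeping (a partitioned parameter's local tensor is freed and it carries a `ds_numel` attribute); `report_partitioning` is a hypothetical helper name:

```python
import torch

def report_partitioning(model):
    """Summarize whether ZeRO-3 has partitioned the model's weights.

    Under zero3_init, a partitioned parameter's local tensor is freed
    (numel() == 0) and DeepSpeed attaches attrs such as ds_numel.
    """
    total = sharded = 0
    for _, p in model.named_parameters():
        total += 1
        if hasattr(p, "ds_numel") and p.numel() == 0:
            sharded += 1
    print(f"{sharded}/{total} params partitioned; "
          f"GPU mem allocated: {torch.cuda.memory_allocated() / 2**30:.2f} GiB")
```

Calling this once right after `from_pretrained` and again at the start of training should show exactly where the full per-GPU copy appears.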
When fine-tuning Llama 2 with DeepSpeed and QLoRA on one node with multiple GPUs, I used ZeRO-3 to partition the model parameters, but it always loads the whole set of params onto each GPU first and only partitions them just before training, instead of loading them already partitioned. After checking the Hugging Face documentation, I found that `TrainingArguments` needs to be constructed before `from_pretrained`. I did that, and zero3_init indeed worked, but then a confusing `NotImplementedError` arose. Here is my code:
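Roughly, it follows this pattern (a minimal sketch of the ordering the docs describe; the output dir, DeepSpeed config path, model id, and LoRA hyperparameters below are placeholders):

```python
import torch
from transformers import (
    AutoModelForCausalLM,
    BitsAndBytesConfig,
    TrainingArguments,
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# TrainingArguments first: parsing the DeepSpeed ZeRO-3 config here is
# what lets the later from_pretrained call use zero3_init.
training_args = TrainingArguments(
    output_dir="./out",                # placeholder
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    bf16=True,
    deepspeed="ds_zero3.json",         # placeholder path to the ZeRO-3 config
)

# 4-bit quantization for QLoRA.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",        # placeholder model id
    quantization_config=bnb_config,
)
model = prepare_model_for_kbit_training(model)
model = get_peft_model(
    model,
    LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM"),
)
```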
Here is my accelerate config:
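Something along these lines (a representative `accelerate` DeepSpeed ZeRO-3 config; the process count and config-file path are illustrative):

```yaml
compute_environment: LOCAL_MACHINE
distributed_type: DEEPSPEED
deepspeed_config:
  deepspeed_config_file: ds_zero3.json   # illustrative path
  zero3_init_flag: true                  # enables sharded init in from_pretrained
machine_rank: 0
num_machines: 1
num_processes: 4                         # illustrative GPU count
mixed_precision: bf16
use_cpu: false
```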
And here is the error: `NotImplementedError: Cannot copy out of meta tensor; no data!`
After I set `low_cpu_mem_usage=False` in `from_pretrained`, another error appears. I also tried setting `empty_init=False`, but that fails because `LlamaForCausalLM.from_pretrained` doesn't have that parameter. I would truly appreciate it if anyone could help me solve this! @kashif @srush @danieldk @akx @kumapo
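For reference, this is the shape of the first variant I tried (a sketch; the model id is a placeholder and `bnb_config` is the quantization config from the snippet above):

```python
# Debugging experiment: disable the low-memory, meta-device loading path
# that the "Cannot copy out of meta tensor" error points at.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",     # placeholder model id
    quantization_config=bnb_config,
    low_cpu_mem_usage=False,
)
```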