Hmm, hard to say and I can't easily try to reproduce this. Do you already see strange behavior after loading the model, before starting training? If you try without PEFT, do you see the same issue (in case of not having enough memory without PEFT, you could e.g. turn off autograd on most of the layers to "simulate" parameter efficient fine-tuning)? If yes, this could be an accelerate issue.
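As a reference point for the "turn off autograd on most of the layers" suggestion, here is a minimal sketch assuming a Hugging Face Llama-style causal LM; the model name and the choice of which layers stay trainable are placeholders, not taken from the original script:

```python
import torch
from transformers import AutoModelForCausalLM

# Assumed model for illustration, per the issue description.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Freeze everything first...
for param in model.parameters():
    param.requires_grad = False

# ...then re-enable gradients only for a small subset (here, the last two
# decoder blocks and the LM head) to roughly mimic a PEFT-sized trainable set.
for layer in model.model.layers[-2:]:
    for param in layer.parameters():
        param.requires_grad = True
for param in model.lm_head.parameters():
    param.requires_grad = True
```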
System Info
peft: 0.10.1.dev0
accelerate: 0.30.0
bitsandbytes: 0.43.1
transformers: 4.39.3
GPU: A6000 * 2 (96GB)
nvidia-driver version: 535.171.04
cuda: 11.8
Who can help?
No response
Information
Tasks
An officially supported task in the examples folder

Reproduction
I was training a Llama3-8B-IT model with QLoRA. The training succeeded, but GPU memory wasn't allocated evenly across the two GPUs. Is this a version issue with peft or transformers, or with the graphics driver? I have previously trained with memory allocated evenly on an A100*8 server, so I'm not sure what is causing the issue in this case.
This is my script.
Expected behavior
I want GPU memory to be allocated evenly across both GPUs.
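In case it helps, a minimal sketch of one way to push `device_map="auto"` toward a more even split is to cap per-GPU memory with `max_memory` when loading the quantized model. This assumes the uneven allocation comes from accelerate's automatic placement; the model name, quantization settings, and memory caps below are placeholders, not taken from the original script:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",  # assumed model
    quantization_config=bnb_config,
    device_map="auto",
    # Illustrative caps: limit each GPU so accelerate spreads the weights
    # instead of filling GPU 0 first.
    max_memory={0: "20GiB", 1: "20GiB"},
)
```

Note that `device_map="auto"` balances model weights, not training-time activations or optimizer state, so some imbalance during training can still be expected.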