ysj1173886760 closed this issue 2 months ago
Btw, I found the same error while trying to run the script in examples/sft/run_peft_qlora_deepspeed_stage3.sh.
Do I need to upgrade some package version, or is the error caused by the model (since I've changed the model from LLAMA-2-70B to codellama/CodeLlama-7b-Instruct-hf)?
Another update, after I inspected the code that triggers this error:
I found that the transformers lib will not set up Zero-3 when quantization is enabled. Let me try to upgrade the transformers lib.
Nope, that won't work, since the main branch of transformers also contains this code:
it only initializes DeepSpeed when quantization is not configured.
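The guard described above can be sketched as follows. This is a minimal illustration of the reported behavior, not the actual transformers source, and the function name is hypothetical:

```python
# Hypothetical sketch of the guard described above, NOT the actual
# transformers source: Zero-3 initialization is skipped whenever a
# quantization config is present, which is the wrong check reported here.
def should_init_deepspeed_zero3(zero3_enabled, quantization_config):
    # Buggy behavior: any quantization config disables Zero-3 init,
    # so a quantized (QLoRA) model is never wrapped for Zero-3.
    return zero3_enabled and quantization_config is None
```

Under this check, a quantized model configured for Zero-3 never gets initialized for it, which would produce exactly the `ValueError` reported below.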
I found that it was a wrong check introduced in the transformers lib. Reinstalling these dependencies with the newest releases solved my problem.
Glad that you found a solution. Could you be so kind as to explain further how you solved it, and which dependency needed to be updated?
> I found that was a wrong check introduced from transformer lib.
Do you mean it's a bug in transformers?
System Info
peft: 0.12.1.dev0
accelerate: 0.33.0.dev0
transformers: 4.45.0.dev0
platform: Ubuntu 22.04 LTS
python: 3.10.12
hardware: NVIDIA RTX 2080 Ti * 4
Who can help?
No response
Information

Tasks

An officially supported task in the examples folder

Reproduction
https://github.com/pacman100/LLM-Workshop/tree/main/personal_copilot/training
I've been following this article to fine-tune a model.
run_peft.sh works on my machine but only uses a single GPU, so I want to use accelerate + DeepSpeed to split the model across multiple GPUs and train a larger model.
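For reference, a multi-GPU accelerate + DeepSpeed run is typically started like this. The config file and training script names here are placeholders, not taken from the linked repo:

```shell
# Sketch of an accelerate + DeepSpeed multi-GPU launch; the config file
# (deepspeed_zero3.yaml) and the training script (train.py) are placeholder
# names standing in for your own files.
accelerate launch --config_file deepspeed_zero3.yaml train.py
```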
DeepSpeed with no quantization also works on my machine. But as soon as I enable quantization, it raises an error: `ValueError: Model was not initialized with Zero-3 despite being configured for DeepSpeed Zero-3. Please re-initialize your model via Model.from_pretrained(...) or Model.from_config(...) after creating your TrainingArguments!`

So my question is: does QLoRA support DeepSpeed now, and if so, what is the correct way to run it?
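One configuration detail worth checking for QLoRA + Zero-3: recent transformers releases accept a 4-bit quant storage dtype so quantized weights can be sharded across ranks. The values below are illustrative, a sketch rather than a verified recipe, assuming a transformers version new enough to support `bnb_4bit_quant_storage`:

```python
import torch
from transformers import BitsAndBytesConfig

# Illustrative 4-bit quantization config for QLoRA with DeepSpeed Zero-3;
# all values here are examples, not settings confirmed by this thread.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    # Storing the quantized weights in bf16 allows them to be sharded
    # (supported in newer transformers releases).
    bnb_4bit_quant_storage=torch.bfloat16,
)
```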
Expected behavior
Expect QLoRA + DeepSpeed to run on multiple GPUs without errors.