huggingface / peft

đŸ¤— PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
https://huggingface.co/docs/peft
Apache License 2.0

QLora with DeepSpeed support #2014

Closed ysj1173886760 closed 2 months ago

ysj1173886760 commented 2 months ago

System Info

peft: 0.12.1.dev0
accelerate: 0.33.0.dev0
transformers: 4.45.0.dev0
platform: Ubuntu 22.04 LTS
python: 3.10.12
hardware: NVIDIA RTX 2080 Ti * 4

Who can help?

No response

Information

Tasks

Reproduction

https://github.com/pacman100/LLM-Workshop/tree/main/personal_copilot/training

I've been following this article to fine-tune a model.

run_peft.sh works on my machine but only uses a single GPU, so I want to use accelerate + DeepSpeed to shard the model across multiple GPUs and train a larger model.

DeepSpeed without quantization also works on my machine. But as soon as I enable quantization, it raises an error: ValueError: Model was not initialized with Zero-3 despite being configured for DeepSpeed Zero-3. Please re-initialize your model via Model.from_pretrained(...) or Model.from_config(...) after creating your TrainingArguments!

So my question is: does QLoRA support DeepSpeed now, and if so, what is the correct way to run it?

Expected behavior

QLoRA + DeepSpeed should run on multiple GPUs without errors.
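For context, QLoRA + DeepSpeed ZeRO-3 is usually launched through an accelerate config file. A minimal sketch follows; the values (stage, precision, process count) are assumptions for a 4-GPU setup, not verified against this exact environment:

```yaml
# Sketch of an accelerate config for DeepSpeed ZeRO-3 (illustrative values;
# adapt zero_stage, mixed_precision, and num_processes to your hardware)
compute_environment: LOCAL_MACHINE
distributed_type: DEEPSPEED
deepspeed_config:
  zero_stage: 3
  zero3_init_flag: true
mixed_precision: bf16
num_machines: 1
num_processes: 4
```

A config like this would then be passed via `accelerate launch --config_file <path> ...` together with the QLoRA training script.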

ysj1173886760 commented 2 months ago

By the way, I hit the same error when running the script in examples/sft/run_peft_qlora_deepspeed_stage3.sh.

Do I need to upgrade some package versions, or is the error caused by the model (since I've changed the model from LLAMA-2-70B to codellama/CodeLlama-7b-Instruct-hf)?

ysj1173886760 commented 2 months ago

Another update: I inspected the code that triggers this error:

[screenshot of the check in transformers]

I found that the transformers library will not set up ZeRO-3 when quantization is enabled. Let me try upgrading transformers.

ysj1173886760 commented 2 months ago

No, that won't work: the main branch of transformers also contains this code.

[screenshot of the same check on the transformers main branch]

It only initializes DeepSpeed when quantization is not configured.
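The guard described above can be paraphrased as a small sketch (the function name and signature are illustrative, not the actual transformers source):

```python
# Hedged sketch of the check described in the comments above: older
# transformers builds skipped the ZeRO-3 initialization path whenever
# the model was quantized, which later raised the
# "Model was not initialized with Zero-3 ..." ValueError.
def will_init_zero3(zero3_enabled: bool, is_quantized: bool) -> bool:
    # ZeRO-3 init only happens when quantization is NOT configured
    return zero3_enabled and not is_quantized

print(will_init_zero3(True, True))   # quantized model under ZeRO-3; prints False
```

This matches the symptom in the issue: with quantization enabled, the branch is skipped, so the model is never wrapped for ZeRO-3 and the Trainer's later sanity check fails.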

ysj1173886760 commented 2 months ago

It turned out to be an incorrect check introduced in the transformers library. Reinstalling these dependencies from the newest releases solved my problem.
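For anyone hitting the same error with the .dev0 builds listed in the system info, moving to the latest stable releases would look roughly like this (assuming pip; exact version pins not verified):

```shell
# replace the dev builds with the newest stable releases
pip install --upgrade transformers accelerate peft
```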

BenjaminBossan commented 2 months ago

> It turned out to be an incorrect check introduced in the transformers library. Reinstalling these dependencies from the newest releases solved my problem.

Glad that you found a solution. Could you be so kind as to explain further how you solved it, i.e. which dependency needed to be updated?

> It turned out to be an incorrect check introduced in the transformers library.

Do you mean it's a bug in transformers?