Closed: fancyerii closed this issue 6 months ago
I'm not an expert on DeepSpeed, so I'm not sure why this is happening.
@pacman100 did however recently add a comprehensive guide to our docs. Maybe you can find something there that could help you?
You cannot use DeepSpeed together with bitsandbytes quantization; the two are not compatible.
You should use either LoRA + DeepSpeed, or QLoRA on its own.
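For the LoRA + DeepSpeed route, a minimal sketch could look like the following. The model id, LoRA target modules, and hyperparameters are placeholders rather than the reporter's actual settings, and the script is assumed to be started with `accelerate launch --config_file <deepspeed yaml> train.py`:

```python
# Sketch: LoRA + DeepSpeed ZeRO-3, with bitsandbytes quantization removed.
# Assumed launch: accelerate launch --config_file ds_zero3.yaml train.py
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder model id

tokenizer = AutoTokenizer.from_pretrained(model_name)

# No quantization_config / load_in_4bit here: the bf16 weights are left
# unquantized so DeepSpeed ZeRO-3 can shard them across the GPUs.
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

lora_config = LoraConfig(
    r=16,                                 # illustrative LoRA rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # illustrative target modules
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# ... hand `model` to transformers.Trainer / trl.SFTTrainer as usual ...
```

The QLoRA-only route would be the reverse: keep the bitsandbytes quantization config when loading the model, but launch without the DeepSpeed config.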
System Info
transformers 4.37.2
accelerate 0.26.1
peft 0.8.2
bitsandbytes 0.43.0.dev0  # latest, built from source
trl 0.7.11.dev0  # latest, built from source
torch 2.2.0
python 3.9.18
Who can help?
No response
Information
Tasks
An officially supported task in the examples folder
Reproduction
error message:
I have 8 A100 40GB GPUs, which I think is enough for Llama-2-7B. I also checked the yaml config: it has "zero3_init_flag: true", so I expected that the whole model would not be loaded onto a single GPU/device; instead, each rank would load only its own shard of the parameters.
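For context, `zero3_init_flag: true` in an accelerate YAML corresponds to the `zero3_init_flag` argument of accelerate's `DeepSpeedPlugin`; when it is enabled, model construction is done directly in ZeRO-3 sharded form rather than materializing a full copy per device. A rough Python equivalent of that part of the config (values illustrative, not the reporter's actual file):

```python
# Rough Python equivalent of the relevant YAML settings (illustrative values).
# Assumes the script is started with `accelerate launch` on the 8-GPU node.
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

ds_plugin = DeepSpeedPlugin(
    zero_stage=3,          # ZeRO-3: parameters, gradients and optimizer state are sharded
    zero3_init_flag=True,  # build the model directly in sharded form (deepspeed.zero.Init)
)
accelerator = Accelerator(mixed_precision="bf16", deepspeed_plugin=ds_plugin)

# After this, from_pretrained() calls are expected to allocate only each
# rank's own parameter shard instead of the full model on every GPU.
```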
But in peft/utils/other.py
it seems PEFT needs to cast the bf16 parameters to fp32. When I ran the script, I saw 8 processes all running on GPU 0; GPU 0's memory was exhausted and the run failed. So my guess is that PEFT does not shard the parameters across the 8 GPUs but loads everything onto a single GPU.
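The cast the reporter is pointing at is, roughly, the parameter up-cast PEFT applies when preparing a quantized model for training. A simplified sketch of that pattern is below; it is an illustration of the idea, not the actual code in peft/utils/other.py:

```python
# Simplified sketch of the up-cast pattern referenced above
# (illustration only, not the actual peft/utils/other.py source).
import torch

def upcast_low_precision_params_to_fp32(model):
    """Cast bf16/fp16 parameters (e.g. layer norms) to fp32, which is
    commonly done for numerical stability when the base weights are
    quantized with bitsandbytes."""
    for param in model.parameters():
        if param.dtype in (torch.float16, torch.bfloat16):
            param.data = param.data.to(torch.float32)
    return model
```

With the LoRA + DeepSpeed route sketched earlier (no bitsandbytes), this k-bit preparation step should not be needed in the first place.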
Expected behavior
The script runs correctly, with the model sharded across the 8 GPUs instead of exhausting GPU 0's memory.