Open Di-Zayn opened 8 months ago
Hi @Di-Zayn, note that you will also need to modify the configuration used for DeepSpeed ZeRO-3, as the one they share is suited for a VM with 8 x A100 80GB GPUs, so to suit your needs you may need to add the flags required to load and train in lower precision.
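For reference, here is a rough sketch of what a downsized accelerate/DeepSpeed ZeRO-3 config could look like. The key names follow accelerate's config format, but the specific values (2 processes, bf16, no offload) are assumptions for illustration; compare against the actual recipes/accelerate_configs/deepspeed_zero3.yaml before using it:

```yaml
# Sketch of an accelerate config for DeepSpeed ZeRO-3 on a smaller machine.
# Assumes 2 GPUs and bf16 mixed precision; adjust to your hardware.
compute_environment: LOCAL_MACHINE
deepspeed_config:
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: true
  zero3_save_16bit_model: true
  zero_stage: 3
distributed_type: DEEPSPEED
mixed_precision: bf16
num_machines: 1
num_processes: 2
```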
Anyway, I'm not sure how to fine-tune using NF4, but maybe https://www.deepspeed.ai/tutorials/MoQ-tutorial/#deepspeed-configuration-file is worth checking?
I'm getting this issue as well (trying QLoRA with ZeRO-3 and 4 GPUs, same error message). @Di-Zayn, were you able to solve it?
I had similar problems, so I switched to the multi_gpu script, set the parameter to use just 2 GPUs, and everything worked fine: https://github.com/huggingface/alignment-handbook/blob/main/recipes/accelerate_configs/multi_gpu.yaml
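For anyone comparing the two setups, a plain multi-GPU (DDP, no sharding) accelerate config has roughly this shape. The field names are accelerate's standard config keys, but the values here are assumptions; the linked multi_gpu.yaml is the authoritative version:

```yaml
# Sketch of a plain data-parallel (DDP) accelerate config: each GPU holds
# a full model replica, unlike ZeRO-3, which shards model states.
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
mixed_precision: bf16
num_machines: 1
num_processes: 2
```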
However, with the ZeRO config the starting loss was around 1.7 instead of the 1.4 I saw with the multi_gpu script, both when using 1 or 2 GPUs.
I never experimented further with ZeRO, as I got the results I needed with the multi_gpu script.
I was keen on sharding the model across GPUs to make room for larger models.
As an aside, the latest FSDP + QLoRA examples are working for me; that covers my use case: https://github.com/huggingface/alignment-handbook/commit/606d2e954fd17999af40e6fb4f712055ca11b2f0
I added a 4-bit load to the command for LoRA training with ZeRO-3 on two or more GPUs, to combine QLoRA and ZeRO-3. But the program hit the following error: `RuntimeError: expected there to be only one unique element in <generator object Init._convert_to_deepspeed_param..all_gather_coalesced.. at 0x7f2ec8daf900>`
The command is:
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml --num_processes=2 scripts/run_sft.py recipes/zephyr-7b-beta/sft/config_lora.yaml --load_in_4bit=true
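Since --load_in_4bit=true is what triggers the 4-bit (NF4) path, here is a toy sketch of the blockwise absmax quantization idea behind it. This is purely illustrative and not from the handbook or bitsandbytes: it uses 16 evenly spaced levels, whereas real NF4 uses a fixed codebook of 16 normal-distribution quantiles.

```python
# Toy blockwise 4-bit absmax quantization: each block of weights is stored
# as 4-bit indices plus a single float scale (the block's absolute maximum).
# NOTE: illustrative only; real NF4 uses a fixed non-uniform codebook.

def quantize_block(block):
    """Map floats to indices 0..15 on an evenly spaced grid over [-absmax, absmax]."""
    absmax = max(abs(x) for x in block) or 1.0
    indices = [round((x / absmax + 1.0) * 7.5) for x in block]  # 0..15
    return indices, absmax

def dequantize_block(indices, absmax):
    """Invert the mapping; per-element error is bounded by absmax / 15."""
    return [(i / 7.5 - 1.0) * absmax for i in indices]

if __name__ == "__main__":
    weights = [0.31, -0.87, 0.02, 0.55, -0.11]
    idx, scale = quantize_block(weights)
    restored = dequantize_block(idx, scale)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"max round-trip error: {worst:.4f} (bound: {scale / 15:.4f})")
```

The memory win is the same in spirit as with load_in_4bit: 4 bits per weight plus one scale per block, instead of 16 or 32 bits per weight.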