Can QLoRA not be used during PPO training? - Githubissues

hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)

https://arxiv.org/abs/2403.13372

Apache License 2.0

33.96k stars, 4.18k forks

Can QLoRA not be used during PPO training? #1185

Closed. hzho2000 closed this issue 1 year ago

hzho2000 commented 1 year ago

Can QLoRA not be used during PPO training?

hiyouga commented 1 year ago

Yes, it can be used.

hzho2000 commented 1 year ago

Hello, these are the arguments I am using for PPO training. Everything works without QLoRA, but as soon as I add the line --quantization_bit 8 it fails with: ValueError: Quantized model cannot create new LoRA weight. Merge them first.

CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage ppo \
    --model_name_or_path C:\model\chatglm2-6b-32k \
    --do_train \
    --dataset trainset \
    --template chatglm2 \
    --finetuning_type lora \
    --lora_target query_key_value \
    --resume_lora_training False \
    --checkpoint_dir result/medical-lora-v1 \
    --reward_model result/medical-rm-v1 \
    --output_dir result/medical-ppo-v1 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 2 \
    --cutoff_len 2048 \
    --max_new_tokens 2048 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 1e-5 \
    --num_train_epochs 1.0 \
    --fp16 \
    --plot_loss

hiyouga commented 1 year ago

First use export_model.py to merge the checkpoint_dir adapter into the base model, then pass the exported model as the new --model_name_or_path.
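
For readers wondering what the merge step amounts to, here is a rough sketch using the PEFT library's `merge_and_unload`. It is not the code of export_model.py, which this thread does not show. The function name `merge_lora_adapter` and all paths are hypothetical, and it assumes `torch`, `transformers` and `peft` are installed and that the base model is loaded unquantized.

```python
def merge_lora_adapter(base_model_path: str, adapter_path: str, export_path: str) -> None:
    """Fold a LoRA adapter into an unquantized base model and save the result.

    Hypothetical helper for illustration; not part of LLaMA-Factory.
    """
    # Imported lazily so this file can be loaded without the heavy dependencies.
    import torch
    from peft import PeftModel
    from transformers import AutoModel, AutoTokenizer

    # Load the base model in half precision, NOT quantized: the LoRA deltas
    # are added to the dense weight matrices during the merge.
    base = AutoModel.from_pretrained(
        base_model_path, torch_dtype=torch.float16, trust_remote_code=True
    )
    tokenizer = AutoTokenizer.from_pretrained(base_model_path, trust_remote_code=True)

    # Attach the adapter, then fold it into the base weights and
    # remove the separate LoRA modules.
    merged = PeftModel.from_pretrained(base, adapter_path).merge_and_unload()

    # Save a standalone checkpoint that can be used as a new base model.
    merged.save_pretrained(export_path)
    tokenizer.save_pretrained(export_path)
```

A call might look like `merge_lora_adapter("path/to/base-model", "result/medical-lora-v1", "path/to/merged-model")`, after which the exported directory would be passed as `--model_name_or_path` and `--checkpoint_dir` dropped from the PPO command.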

hzho2000 commented 1 year ago

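Hello. Does this error mean that the RM stage and the PPO stage must use the same quantization level? For example, is it not allowed to train the RM without QLoRA and then use QLoRA for PPO training?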

I would also like to ask a follow-up question: can the weights produced by SFT with QLoRA be merged with the original, unquantized model?

RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

hiyouga commented 1 year ago

PPO only supports the 6B model; it does not support the 32k model.
Yes, they can.

Wolverhampton0 commented 1 year ago

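PPO only supports the 6B model; it does not support the 32k model.

Yes, they can.

May I ask why earlier versions did not require merging when doing PPO training with QLoRA? Why was this merge step added? Was there a problem with the previous approach?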

Wolverhampton0 commented 1 year ago

Also, is there any difference between this merge and loading the LoRA weights directly during training or inference?

lindsey-chang commented 1 year ago

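Also, is there any difference between this merge and loading the LoRA weights directly during training or inference?

Has this question been answered yet?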

hiyouga commented 1 year ago

There is no difference.
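
The answer above follows from how LoRA is defined: the adapter adds `scale * B @ A` to a frozen weight `W`, so applying it on the fly and folding it into `W` beforehand compute the same linear map. Below is a small pure-Python check on toy matrices. All values are made up for illustration, and `scale` stands for `lora_alpha / r`.

```python
import math


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(a, b):
    """Multiply matrix a by matrix b (both lists of rows)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


# Toy sizes: a 3x3 frozen weight W and a rank-1 adapter (B is 3x1, A is 1x3).
W = [[1.0, 2.0, 0.0], [0.0, 1.0, -1.0], [3.0, 0.0, 1.0]]
B = [[0.5], [-1.0], [2.0]]
A = [[1.0, 0.0, 2.0]]
scale = 2.0  # plays the role of lora_alpha / r
x = [1.0, -2.0, 3.0]

# Path 1: adapter kept separate, h = W x + scale * B (A x).
h_adapter = [
    w + scale * d for w, d in zip(matvec(W, x), matvec(B, matvec(A, x)))
]

# Path 2: adapter merged first, W' = W + scale * B A, then h = W' x.
BA = matmul(B, A)
W_merged = [
    [w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, BA)
]
h_merged = matvec(W_merged, x)

# Both paths give the same output vector.
same = all(math.isclose(p, q) for p, q in zip(h_adapter, h_merged))
print(h_adapter, h_merged, same)
```

The equality is exact here because the toy numbers are exactly representable. With real models in fp16, the two paths may differ by small rounding errors.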