shibing624 / MedicalGPT

MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Trains medical large language models, implementing continued pretraining (PT), supervised fine-tuning (SFT), RLHF, DPO, and ORPO.
Apache License 2.0
3.24k stars 492 forks

Error when merging the model after RL reinforcement-learning training; the earlier steps followed the pipeline commands exactly #251

Closed PICOPON closed 10 months ago

PICOPON commented 11 months ago

4-GPU machine; the earlier steps followed the pipeline commands exactly.

python merge_peft_adapter.py --model_type bloom \
    --base_model_name_or_path merged-sft \
    --peft_model_path outputs-rl-v1 \
    --output_dir merged-rl/

[screenshot: error output from the merge step]

shibing624 commented 11 months ago

Did the error occur during training? It is probably insufficient GPU memory: the model was sharded across 2 cards. Manually change device="auto" to device="cuda:0" to force it to run on a single GPU.
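The suggested fix can be sketched as follows. This is a minimal illustration (the helper name `device_map_for_merge` is hypothetical, not part of the repo): with Hugging Face `from_pretrained`, `device_map="auto"` lets accelerate shard the model's layers across all visible GPUs, while an explicit `"cuda:0"` keeps every weight on one card, which is what the reply recommends for the merge step.

```python
# Hypothetical helper illustrating the maintainer's suggestion: replace
# device_map="auto" (accelerate may split layers across GPUs, the suspected
# failure mode here) with an explicit "cuda:0" (whole model on one GPU).
def device_map_for_merge(force_single_gpu: bool = True) -> str:
    """Return the device_map value to pass to from_pretrained()."""
    # "auto"    -> shard across all visible GPUs
    # "cuda:0"  -> pin the entire model to GPU 0
    return "cuda:0" if force_single_gpu else "auto"

print(device_map_for_merge())       # cuda:0
print(device_map_for_merge(False))  # auto
```

An equivalent workaround, without editing the script, is to restrict visibility to one GPU when launching the merge, e.g. `CUDA_VISIBLE_DEVICES=0 python merge_peft_adapter.py ...`.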