shibing624 / MedicalGPT

MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Trains medical large language models, implementing continued pre-training (PT), supervised fine-tuning (SFT), RLHF, DPO, and ORPO.
Apache License 2.0

Datasets for the PPO and SFT stages #381

Closed: pangpang-xuan closed this issue 1 week ago

pangpang-xuan commented 2 weeks ago

Hello, I'd like to ask: do the PPO and SFT stages have to use the same dataset?

shibing624 commented 2 weeks ago

Preferably not. PPO requires a higher-quality dataset.

pangpang-xuan commented 1 week ago

OK, thank you.
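Following the advice that the PPO data should preferably differ from the SFT data, here is a minimal sketch that filters candidate PPO prompts so none repeat an SFT prompt. It assumes both datasets have already been reduced to plain lists of prompt strings; the function name `disjoint_ppo_prompts` and the whitespace/case normalization are hypothetical choices for illustration, not part of MedicalGPT. It does not address data quality, which still has to be judged separately.

```python
def disjoint_ppo_prompts(sft_prompts, candidate_ppo_prompts):
    """Return candidate PPO prompts that do not appear in the SFT set.

    Hypothetical helper: prompts are compared after collapsing whitespace
    and lowercasing, and duplicates within the PPO candidates are dropped too.
    """

    def norm(text):
        # Collapse runs of whitespace and ignore case when comparing prompts.
        return " ".join(text.split()).lower()

    seen = {norm(p) for p in sft_prompts}
    kept = []
    for prompt in candidate_ppo_prompts:
        key = norm(prompt)
        if key not in seen:
            kept.append(prompt)
            seen.add(key)  # also prevents repeats inside the PPO set
    return kept


if __name__ == "__main__":
    sft = ["What is hypertension?", "What are the symptoms of flu?"]
    ppo = ["what is  hypertension?", "How is diabetes treated?", "How is diabetes treated?"]
    print(disjoint_ppo_prompts(sft, ppo))
```

Exact-match filtering after normalization is the simplest form of deduplication; near-duplicate detection (e.g. embedding similarity) would catch paraphrases but is beyond this sketch.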