During PPO training, the following warning appears: `UserWarning: KL divergence is starting to become negative: -233.50 - this might be a precursor for failed training. sometimes this happens because the generation kwargs are not correctly set. Please make sure that the generation kwargs are set correctly, or review your training hyperparameters.` How can this be resolved?
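For reference, the "generation kwargs" the warning mentions refer to the sampling settings passed to `ppo_trainer.generate`. Below is a minimal sketch of the settings used in TRL's own PPO examples; greedy decoding, beam search, or a forced `min_length` can produce tokens the policy assigns near-zero probability, which drives the estimated KL negative. The `gpt2` model name here is only a placeholder.

```python
from transformers import AutoTokenizer

# Placeholder model; use the same tokenizer as your policy model.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Sampling settings from TRL's PPO examples: pure sampling from the
# policy distribution, with no filtering or length forcing.
generation_kwargs = {
    "min_length": -1,   # do not force a minimum response length
    "top_k": 0.0,       # disable top-k filtering
    "top_p": 1.0,       # disable nucleus (top-p) filtering
    "do_sample": True,  # sample instead of greedy/beam decoding
    "pad_token_id": tokenizer.eos_token_id,
}

# Then, inside the training loop:
# response_tensors = ppo_trainer.generate(query_tensors, **generation_kwargs)
```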