hiyouga / LLaMA-Factory

Unify Efficient Fine-Tuning of 100+ LLMs
Apache License 2.0

During DPO training, the way the prompt and answer are concatenated prevents the `cutoff_length` hyperparameter from effectively truncating the data. #4617

Closed THZdyjy closed 2 days ago

THZdyjy commented 4 days ago

[screenshot] As shown in the screenshot above, when the source code concatenates the prompt with the rejected response, the prompt used is `chosen_prompt` rather than `rejected_prompt`. As a result, setting `cutoff_length=2048` does not effectively truncate the rejected data. After modifying the code as shown in the second screenshot, the data can be truncated correctly according to `cutoff_length`. [screenshot]
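A minimal sketch of the problem being described (the function and variable names below are illustrative, not the actual LLaMA-Factory source): if the truncation budget for the rejected pair is derived from `len(chosen_prompt_ids)`, a rejected sequence with a longer prompt can slip past `cutoff_length`. The fix is to measure each (prompt, response) pair against its own prompt:

```python
# Hypothetical sketch of per-pair truncation for DPO preference data.
# Assumption: each pair should fit within cutoff_len after concatenation.

def truncate_pair(prompt_ids, response_ids, cutoff_len):
    """Truncate one (prompt, response) token pair so their combined
    length does not exceed cutoff_len, splitting the budget
    proportionally between prompt and response."""
    total = len(prompt_ids) + len(response_ids)
    if total <= cutoff_len:
        return prompt_ids, response_ids
    # keep the prompt/response ratio while fitting the budget
    max_prompt = max(1, cutoff_len * len(prompt_ids) // total)
    max_response = cutoff_len - max_prompt
    return prompt_ids[:max_prompt], response_ids[:max_response]


def encode_preference_example(chosen_prompt_ids, chosen_ids,
                              rejected_prompt_ids, rejected_ids,
                              cutoff_len):
    # Fixed behavior: the rejected response is truncated against ITS OWN
    # prompt, instead of reusing len(chosen_prompt_ids) for both pairs.
    c_prompt, c_resp = truncate_pair(chosen_prompt_ids, chosen_ids, cutoff_len)
    r_prompt, r_resp = truncate_pair(rejected_prompt_ids, rejected_ids, cutoff_len)
    return c_prompt + c_resp, r_prompt + r_resp
```

With this shape, both concatenated sequences are guaranteed to fit within `cutoff_len`, regardless of which prompt is longer.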

niravlg commented 3 days ago

I believe this issue has been mentioned in https://github.com/hiyouga/LLaMA-Factory/issues/4402

As far as I understand, the suggested fix above changes the prompt used for the chosen and rejected responses in DPO, which would likely affect training and results. Instead, I believe the implementation should follow the DPO Trainer's implementation in https://github.com/huggingface/trl/blob/main/trl/trainer/dpo_trainer.py
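For reference, the general approach in TRL's `DPOTrainer` is to truncate the prompt and each response independently, keeping the end of an over-long prompt so the response still follows it, rather than deriving the rejected pair's truncation from the chosen prompt. The sketch below is illustrative only; the parameter names `max_prompt_length` and `max_length` mirror TRL's config, but this is not the library's actual code:

```python
# Hedged sketch of TRL-style independent truncation for a DPO pair.

def trl_style_truncate(prompt_ids, response_ids, max_prompt_length, max_length):
    """Truncate prompt and response independently: clip the prompt from
    the LEFT (keeping the tokens closest to the response), then clip the
    combined sequence from the right to fit max_length."""
    if len(prompt_ids) > max_prompt_length:
        prompt_ids = prompt_ids[-max_prompt_length:]
    if len(prompt_ids) + len(response_ids) > max_length:
        response_ids = response_ids[: max_length - len(prompt_ids)]
    return prompt_ids, response_ids
```

Because each of the chosen and rejected pairs goes through this independently, neither sequence's truncation depends on the other pair's prompt length.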

PS - The issue above was not written in my native language; I used ChatGPT to translate it into English. I apologize in advance for any confusion.

hiyouga commented 2 days ago

fixed