hiyouga / LLaMA-Factory

Unify Efficient Fine-Tuning of 100+ LLMs
Apache License 2.0

During DPO training, the way the prompt and answer are concatenated prevents the `cutoff_length` hyperparameter from effectively truncating the data. #4617

Closed THZdyjy closed 2 days ago

THZdyjy commented 4 days ago

[screenshot] As shown in the screenshot above, when the source code concatenates the prompt with the rejected response, the prompt used is `chosen_prompt` rather than `rejected_prompt`. As a result, setting `cutoff_length=2048` does not effectively truncate the rejected data. After modifying the code as shown in the second screenshot, the data can be truncated correctly according to `cutoff_length`. [screenshot]
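A minimal sketch of the problem being described (the function and variable names below are illustrative, not the actual LLaMA-Factory source): if the truncation budget for the rejected pair is derived from `len(chosen_prompt_ids)`, a rejected sequence with a longer prompt can slip past `cutoff_length`. The fix is to measure each (prompt, response) pair against its own prompt:

```python
# Hypothetical sketch of per-pair truncation for DPO preference data.
# Assumption: each pair should fit within cutoff_len after concatenation.

def truncate_pair(prompt_ids, response_ids, cutoff_len):
    """Truncate one (prompt, response) token pair so their combined
    length does not exceed cutoff_len, splitting the budget
    proportionally between prompt and response."""
    total = len(prompt_ids) + len(response_ids)
    if total <= cutoff_len:
        return prompt_ids, response_ids
    # keep the prompt/response ratio while fitting the budget
    max_prompt = max(1, cutoff_len * len(prompt_ids) // total)
    max_response = cutoff_len - max_prompt
    return prompt_ids[:max_prompt], response_ids[:max_response]


def encode_preference_example(chosen_prompt_ids, chosen_ids,
                              rejected_prompt_ids, rejected_ids,
                              cutoff_len):
    # Fixed behavior: the rejected response is truncated against ITS OWN
    # prompt, instead of reusing len(chosen_prompt_ids) for both pairs.
    c_prompt, c_resp = truncate_pair(chosen_prompt_ids, chosen_ids, cutoff_len)
    r_prompt, r_resp = truncate_pair(rejected_prompt_ids, rejected_ids, cutoff_len)
    return c_prompt + c_resp, r_prompt + r_resp
```

With this shape, both concatenated sequences are guaranteed to fit within `cutoff_len`, regardless of which prompt is longer.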

niravlg commented 3 days ago

I believe this issue has been mentioned in https://github.com/hiyouga/LLaMA-Factory/issues/4402

As far as I understand, the suggested fix above changes the prompt used for the chosen and rejected responses in DPO, which would likely affect training and results. Instead, I believe the implementation should follow the DPO Trainer's implementation in https://github.com/huggingface/trl/blob/main/trl/trainer/dpo_trainer.py
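For reference, the general approach in TRL's `DPOTrainer` is to truncate the prompt and each response independently, keeping the end of an over-long prompt so the response still follows it, rather than deriving the rejected pair's truncation from the chosen prompt. The sketch below is illustrative only; the parameter names `max_prompt_length` and `max_length` mirror TRL's config, but this is not the library's actual code:

```python
# Hedged sketch of TRL-style independent truncation for a DPO pair.

def trl_style_truncate(prompt_ids, response_ids, max_prompt_length, max_length):
    """Truncate prompt and response independently: clip the prompt from
    the LEFT (keeping the tokens closest to the response), then clip the
    combined sequence from the right to fit max_length."""
    if len(prompt_ids) > max_prompt_length:
        prompt_ids = prompt_ids[-max_prompt_length:]
    if len(prompt_ids) + len(response_ids) > max_length:
        response_ids = response_ids[: max_length - len(prompt_ids)]
    return prompt_ids, response_ids
```

Because each of the chosen and rejected pairs goes through this independently, neither sequence's truncation depends on the other pair's prompt length.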

PS - The issue above was not written in my native language; I used ChatGPT to translate it into English. I apologize in advance for any confusion.

hiyouga commented 2 days ago

fixed