Closed THZdyjy closed 2 days ago
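As shown in the screenshot above, when the source code concatenates the prompt with the rejected response, the prompt it uses is chosen_prompt rather than rejected_prompt. As a result, when cutoff_length=2048 is set, the rejected data is not truncated correctly.
After modifying the code as shown in the screenshot below, the data is truncated correctly according to cutoff_length.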
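To make the truncation problem concrete, here is a minimal sketch of one way to keep both the chosen and the rejected sequence within a cutoff length while sharing a single prompt. It is not LLaMA-Factory's or TRL's actual code: the function names `split_budget` and `truncate_pair`, the proportional budgeting rule, and the parameter name `cutoff_len` are all assumptions made for illustration, and the inputs are plain lists of token IDs.

```python
from typing import List, Tuple


def split_budget(prompt_len: int, response_len: int, cutoff_len: int) -> Tuple[int, int]:
    """Hypothetical helper: split cutoff_len between prompt and response.

    Assumes cutoff_len >= 2. If everything already fits, nothing is cut.
    Otherwise the budget is shared in proportion to the two lengths.
    """
    if prompt_len + response_len <= cutoff_len:
        return prompt_len, response_len
    max_response = max(1, int(cutoff_len * response_len / (prompt_len + response_len)))
    max_prompt = cutoff_len - max_response
    return min(prompt_len, max_prompt), min(response_len, max_response)


def truncate_pair(
    prompt_ids: List[int],
    chosen_ids: List[int],
    rejected_ids: List[int],
    cutoff_len: int,
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: truncate a preference pair to cutoff_len tokens each.

    The budget is computed from the longer of the two responses, so the
    rejected response is bounded even when it is much longer than the
    chosen one, and both sequences keep the same (possibly shortened) prompt.
    """
    longest = max(len(chosen_ids), len(rejected_ids))
    prompt_len, response_len = split_budget(len(prompt_ids), longest, cutoff_len)
    prompt = prompt_ids[:prompt_len]
    chosen_seq = prompt + chosen_ids[:response_len]
    rejected_seq = prompt + rejected_ids[:response_len]
    return chosen_seq, rejected_seq
```

Sizing the budget from the longer response is a design choice in this sketch: it guarantees that neither sequence exceeds `cutoff_len`, and it keeps the prompt identical for the chosen and rejected sides, at the cost of sometimes shortening the prompt more than the shorter response alone would require.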
I believe this issue has been mentioned in https://github.com/hiyouga/LLaMA-Factory/issues/4402
As far as I understand, the solution suggested above changes the prompt used for the chosen and rejected responses in DPO, which likely affects training and results. Instead, I believe the implementation should follow the DPO Trainer's implementation in https://github.com/huggingface/trl/blob/main/trl/trainer/dpo_trainer.py
PS: The issue above is not written in my native language. I used ChatGPT to translate it into English. I apologize in advance for any confusion.
fixed