Closed maksimstw closed 2 months ago
Using an NLL term to regularize DPO is becoming a common technique for mitigating DPO's overfitting problem. Is there any plan to introduce this loss to the repo? Thanks!
References: "Iterative Reasoning Preference Optimization", "The Llama 3 Herd of Models"
Use the `pref_ftx: 0.5` option: https://github.com/hiyouga/LLaMA-Factory/blob/b7ca6c8dc14f689d0df16684a6121cc0ec24f8ba/src/llamafactory/hparams/finetuning_args.py#L139-L141
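For clarity, here is a minimal sketch of what an NLL-regularized DPO loss looks like: the standard DPO term plus a weighted negative log-likelihood term on the chosen response. The function name and the use of sequence-level log-probabilities are illustrative assumptions; `pref_ftx` here mirrors the LLaMA-Factory hyperparameter linked above, but the repo's exact implementation may differ.

```python
import torch
import torch.nn.functional as F

def dpo_with_nll_loss(policy_chosen_logps, policy_rejected_logps,
                      ref_chosen_logps, ref_rejected_logps,
                      beta=0.1, pref_ftx=0.5):
    """DPO loss with an added NLL (SFT) regularizer on the chosen response.

    All *_logps are summed log-probabilities of full responses, shape (batch,).
    `pref_ftx` weights the NLL term, analogous to LLaMA-Factory's option;
    details of the actual repo implementation may differ (illustrative sketch).
    """
    # Standard DPO: -log sigmoid(beta * (policy margin - reference margin))
    logits = beta * ((policy_chosen_logps - policy_rejected_logps)
                     - (ref_chosen_logps - ref_rejected_logps))
    dpo_loss = -F.logsigmoid(logits)
    # NLL regularizer: negative log-likelihood of the chosen response
    nll_loss = -policy_chosen_logps
    return (dpo_loss + pref_ftx * nll_loss).mean()
```

With `pref_ftx=0` this reduces to plain DPO; larger values pull the policy toward the chosen responses, which is the overfitting mitigation described in the papers above.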