hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0

DPO + NLL Loss #5059

Closed · maksimstw closed this issue 2 months ago

maksimstw commented 2 months ago

Using an NLL term to regularize DPO is becoming a common technique to mitigate DPO's overfitting problem. Is there any plan to add this loss to the repo? Thanks! (The combined objective is sketched below.)

References:
- Iterative Reasoning Preference Optimization
- The Llama 3 Herd of Models
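For reference, the DPO+NLL objective used in Iterative Reasoning Preference Optimization is roughly of the following form (a sketch, not the paper's exact notation; α weights the NLL term, y_w and y_l are the chosen and rejected responses, and π_θ is the policy being trained):

$$
\mathcal{L}_{\text{DPO+NLL}} = \mathcal{L}_{\text{DPO}}(y_w, y_l \mid x) + \alpha\,\mathcal{L}_{\text{NLL}}(y_w \mid x),
\qquad
\mathcal{L}_{\text{NLL}}(y_w \mid x) = -\frac{\log \pi_\theta(y_w \mid x)}{|y_w|}
$$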

hiyouga commented 2 months ago

Use the `pref_ftx: 0.5` option: https://github.com/hiyouga/LLaMA-Factory/blob/b7ca6c8dc14f689d0df16684a6121cc0ec24f8ba/src/llamafactory/hparams/finetuning_args.py#L139-L141
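In other words, `pref_ftx` mixes a supervised (NLL) loss on the chosen responses into the DPO loss, which is the same kind of regularization the question asks about. Below is a minimal sketch of how such a mixing coefficient typically enters the loss; the function and argument names are illustrative, not LLaMA-Factory's actual trainer code.

```python
import torch
import torch.nn.functional as F

def dpo_with_sft_loss(policy_chosen_logps, policy_rejected_logps,
                      ref_chosen_logps, ref_rejected_logps,
                      chosen_logits, chosen_labels,
                      beta=0.1, pref_ftx=0.5):
    """Sketch: DPO loss plus an SFT (NLL) term on the chosen responses,
    weighted by pref_ftx. Assumes logits/labels are already aligned
    (shifted) and prompt tokens are masked with -100 in the labels."""
    # Standard DPO loss on sequence-level log-probabilities.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    dpo_loss = -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

    # Token-level NLL (cross-entropy) on the chosen responses only.
    nll_loss = F.cross_entropy(
        chosen_logits.view(-1, chosen_logits.size(-1)),
        chosen_labels.view(-1),
        ignore_index=-100,
    )

    # pref_ftx controls how strongly the SFT term regularizes DPO.
    return dpo_loss + pref_ftx * nll_loss
```

With `pref_ftx: 0.5` in the training config, the SFT term is added to the preference loss with a weight of 0.5; setting it to 0 recovers plain DPO.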