Closed maksimstw closed 2 months ago
Using an NLL term to regularize DPO is becoming a common technique for mitigating DPO's overfitting problem. Is there any plan to introduce this loss to the repo? Thanks!
References: "Iterative Reasoning Preference Optimization", "The Llama 3 Herd of Models"
Use the `pref_ftx: 0.5` option: https://github.com/hiyouga/LLaMA-Factory/blob/b7ca6c8dc14f689d0df16684a6121cc0ec24f8ba/src/llamafactory/hparams/finetuning_args.py#L139-L141
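For clarity, here is a minimal sketch of what an NLL-regularized DPO loss looks like: the standard DPO term plus a weighted negative log-likelihood term on the chosen response. The function name and the use of sequence-level log-probabilities are illustrative assumptions; `pref_ftx` here mirrors the LLaMA-Factory hyperparameter linked above, but the repo's exact implementation may differ.

```python
import torch
import torch.nn.functional as F

def dpo_with_nll_loss(policy_chosen_logps, policy_rejected_logps,
                      ref_chosen_logps, ref_rejected_logps,
                      beta=0.1, pref_ftx=0.5):
    """DPO loss with an added NLL (SFT) regularizer on the chosen response.

    All *_logps are summed log-probabilities of full responses, shape (batch,).
    `pref_ftx` weights the NLL term, analogous to LLaMA-Factory's option;
    details of the actual repo implementation may differ (illustrative sketch).
    """
    # Standard DPO: -log sigmoid(beta * (policy margin - reference margin))
    logits = beta * ((policy_chosen_logps - policy_rejected_logps)
                     - (ref_chosen_logps - ref_rejected_logps))
    dpo_loss = -F.logsigmoid(logits)
    # NLL regularizer: negative log-likelihood of the chosen response
    nll_loss = -policy_chosen_logps
    return (dpo_loss + pref_ftx * nll_loss).mean()
```

With `pref_ftx=0` this reduces to plain DPO; larger values pull the policy toward the chosen responses, which is the overfitting mitigation described in the papers above.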