Blog: Preference Tuning LLMs with Direct Preference Optimization Methods

konabuta / my-scratch-book

Blog: Preference Tuning LLMs with Direct Preference Optimization Methods #18

Open konabuta opened 4 months ago

konabuta commented 4 months ago

source: [Preference Tuning LLMs with Direct Preference Optimization Methods](https://huggingface.co/blog/pref-tuning)

konabuta commented 4 months ago

RLHF is the main approach to aligning LLMs, but several algorithms achieve alignment without reinforcement learning. This blog introduces the following algorithms:

Direct Preference Optimization (DPO)
Identity Preference Optimization (IPO)
Kahneman-Tversky Optimization (KTO)

konabuta commented 4 months ago

DPO

recasts the alignment objective as a simple loss function that is optimised directly on preference data, with no separate reward model or RL loop
dataset of preferences: $\{(x, y_w, y_l)\}$
- $x$: prompt
- $y_w$: preferred response
- $y_l$: dispreferred response
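
A minimal sketch of the DPO loss for a single triple $(x, y_w, y_l)$, to make the "simple loss function" concrete. The formula follows the DPO paper rather than anything spelled out in these notes. The function name, the argument names, the `beta` default and the example record (with `prompt` / `chosen` / `rejected` keys) are illustrative assumptions. Each input is assumed to be the summed log-probability of a full response under the policy being trained or under the frozen reference model.

```python
import math

# Hypothetical preference record: one prompt with a preferred and a dispreferred response.
example = {
    "prompt": "What is the capital of France?",   # x
    "chosen": "The capital of France is Paris.",  # y_w
    "rejected": "France is a country in Europe.", # y_l
}

def dpo_loss(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one (x, y_w, y_l) triple, given sequence log-probabilities."""
    # How much more the policy favours y_w over y_l than the reference model does.
    margin = (policy_logp_w - ref_logp_w) - (policy_logp_l - ref_logp_l)
    z = beta * margin
    # -log(sigmoid(z)) written in a numerically stable form.
    if z > 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))

# When the policy equals the reference, the margin is 0 and the loss is log(2).
print(dpo_loss(-1.0, -1.0, -1.0, -1.0))
# Raising the preferred response relative to the reference lowers the loss.
print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

The loss keeps decreasing as the margin grows, so nothing in it stops the policy from pushing the margin ever higher on the training pairs.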

Challenges in DPO

Robustness:
- DPO tends to overfit the preference dataset quickly
- Identity Preference Optimization (IPO) adds a regularisation term to the DPO loss
Dispensing with paired preference data altogether:
- Kahneman-Tversky Optimization (KTO) defines the loss function on individual examples labelled "good" or "bad"
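
A rough sketch of how the two variants differ from DPO in practice. The IPO loss is written as in the IPO paper: a squared error that pulls the log-ratio margin towards the target $1/(2\tau)$. The second function shows one plausible way to turn paired preference data into the unpaired "good" / "bad" examples that KTO consumes. The KTO loss itself is not reproduced here. All function names, record keys and the `tau` default are assumptions for illustration.

```python
def ipo_loss(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l, tau=0.1):
    """IPO loss for one (x, y_w, y_l) triple, given sequence log-probabilities."""
    # Same log-ratio margin that DPO uses.
    margin = (policy_logp_w - ref_logp_w) - (policy_logp_l - ref_logp_l)
    # Squared distance to the target 1 / (2 * tau): overshooting is penalised too.
    return (margin - 1.0 / (2.0 * tau)) ** 2

def to_unpaired(records):
    """Split paired preference records into individually labelled examples."""
    unpaired = []
    for r in records:
        # Preferred response becomes a "good" example.
        unpaired.append({"prompt": r["prompt"], "completion": r["chosen"], "label": True})
        # Dispreferred response becomes a "bad" example.
        unpaired.append({"prompt": r["prompt"], "completion": r["rejected"], "label": False})
    return unpaired

# The loss is zero when the margin hits the target (5.0 for tau = 0.1) ...
print(ipo_loss(0.0, -5.0, 0.0, 0.0))
# ... and grows again when the margin overshoots it.
print(ipo_loss(0.0, -10.0, 0.0, 0.0))

pairs = [{"prompt": "2 + 2 = ?", "chosen": "4", "rejected": "5"}]
print(to_unpaired(pairs))
```

Because the IPO loss has its minimum at a finite margin, the policy has no incentive to drift arbitrarily far from the reference model, which is the regularising effect noted above.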