konabuta / my-scratch-book

MIT License

Blog: Preference Tuning LLMs with Direct Preference Optimization Methods #18

Open konabuta opened 4 months ago

konabuta commented 4 months ago

source: [Preference Tuning LLMs with Direct Preference Optimization Methods](https://huggingface.co/blog/pref-tuning)

konabuta commented 4 months ago

Although RLHF is a major way to align LLMs, several algorithms align models without reinforcement learning techniques. This blog introduces the following algorithms:

konabuta commented 4 months ago

DPO
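
As a rough note to self, the DPO objective for a single preference pair can be sketched as below. This is a minimal sketch rather than the blog's or any library's implementation: the function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions made for illustration. Each input is assumed to be the summed log-probability of a full completion under either the policy being trained or the frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair: -log(sigmoid(margin)).

    All names and the beta default are illustrative assumptions.
    """
    # How much more (in log space) the policy likes each completion
    # than the reference model does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Scaled difference of the two log-ratios (the implicit reward margin).
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed stably.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: the margin is 0, so the loss is log(2).
print(dpo_loss(-1.0, -1.0, -1.0, -1.0))

# Policy favors the chosen completion more than the reference does: lower loss.
print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

The loss only needs log-probabilities from the policy and the reference model, so no separate reward model or RL loop appears in the computation.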

Challenge in DPO