Open konabuta opened 4 months ago
source: [Preference Tuning LLMs with Direct Preference Optimization Methods](https://huggingface.co/blog/pref-tuning)
Although RLHF is a major way to align LLMs, several algorithms align models without reinforcement learning. This blog introduces the following algorithms:
DPO
Challenge in DPO
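
For the DPO item above, a minimal sketch of the per-example loss may help make "no reinforcement learning" concrete. It assumes the commonly stated DPO objective, `-log sigmoid(beta * ((log pi(y_w|x) - log pi_ref(y_w|x)) - (log pi(y_l|x) - log pi_ref(y_l|x))))`, which these notes do not spell out. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are hypothetical choices for illustration, not taken from the blog or any library.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss from summed sequence log-probabilities.

    All names here are illustrative assumptions. Each *_logp argument is the
    log-probability of a full response (chosen or rejected) under either the
    policy being trained or the frozen reference model.
    """
    # How much more (in log space) the policy likes each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Scaled preference margin: positive when the policy favors the chosen response.
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed in a numerically stable way.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Policy identical to the reference: margin is 0, so the loss equals log(2).
    print(dpo_loss(-1.0, -1.0, -1.0, -1.0))
    # Policy raises the chosen response and lowers the rejected one: loss drops.
    print(dpo_loss(-0.5, -2.0, -1.0, -1.0))
```

The loss is an ordinary supervised objective over preference pairs: it needs only log-probabilities from the policy and a frozen reference model, with no sampled rollouts or separate reward model.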