🚀 Feature Request
Support DPO (Direct Preference Optimization) loss and data loader.
Motivation
Many recent open LLMs have achieved promising results using DPO instead of RL-style tuning such as PPO for alignment, and it appears to require fewer changes to llm-foundry than full RLHF. A sketch of the loss follows below.
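For context, the core DPO objective from Rafailov et al. (2023) is compact, which is part of why it needs fewer moving parts than PPO. A minimal PyTorch sketch (the function name, signature, and default beta here are illustrative, not a proposed llm-foundry API):

```python
import torch
import torch.nn.functional as F

def dpo_loss(
    policy_chosen_logps: torch.Tensor,    # log p_theta(y_chosen | x), shape (batch,)
    policy_rejected_logps: torch.Tensor,  # log p_theta(y_rejected | x), shape (batch,)
    ref_chosen_logps: torch.Tensor,       # log p_ref(y_chosen | x), shape (batch,)
    ref_rejected_logps: torch.Tensor,     # log p_ref(y_rejected | x), shape (batch,)
    beta: float = 0.1,                    # KL-penalty strength from the DPO paper
) -> torch.Tensor:
    # DPO loss: -log sigmoid(beta * (policy log-ratio - reference log-ratio)),
    # where each log-ratio is log p(y_chosen | x) - log p(y_rejected | x).
    policy_logratio = policy_chosen_logps - policy_rejected_logps
    ref_logratio = ref_chosen_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (policy_logratio - ref_logratio)).mean()
```

The data loader would only need to yield paired (chosen, rejected) completions per prompt; no reward model or rollout machinery is required, since the frozen reference model's log-probabilities can be computed in a single forward pass.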