huggingface / alignment-handbook

Robust recipes to align language models with human and AI preferences
https://huggingface.co/HuggingFaceH4
Apache License 2.0
4.2k stars 357 forks

Constitutional AI recipe #108

vwxyzjn closed 5 months ago

vwxyzjn commented 5 months ago

SFT repro: https://wandb.ai/costa-huang/huggingface/runs/4fj3uctu/overview?workspace=user-costa-huang (MT Bench: 6.288)

DPO repro: https://wandb.ai/costa-huang/huggingface/runs/lddwve1a?workspace=user-costa-huang (MT Bench: 7.084)

Regression check: https://wandb.ai/costa-huang/huggingface/reports/regression--Vmlldzo2Njk4NTA2 The MT Bench scores drop slightly, but the learning curves look very similar, so the gap is probably a small artifact rather than a real regression.
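
For anyone re-running this comparison, here is a minimal sketch of how the score check could be scripted. The helper names `score_delta` and `is_regression` and the `tolerance` parameter are hypothetical and are not part of the handbook. The reference scores are not listed in this thread, so they have to be supplied by the caller.

```python
def score_delta(before: float, after: float) -> float:
    """Return `after - before`, rounded to 3 decimals to match the reported precision."""
    return round(after - before, 3)


def is_regression(new_score: float, reference_score: float, tolerance: float) -> bool:
    """Return True if `new_score` is below `reference_score` by more than `tolerance`.

    `tolerance` is an assumed noise threshold; pick it from run-to-run variance.
    """
    return (reference_score - new_score) > tolerance


# Scores reported in this PR.
sft_score = 6.288
dpo_score = 7.084

# Gain from the DPO stage over the SFT checkpoint.
print(score_delta(sft_score, dpo_score))
```

A small drop that stays within the chosen tolerance, combined with matching learning curves, would be treated as noise rather than a regression.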

HuggingFaceDocBuilderDev commented 5 months ago

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.