vwxyzjn closed 5 months ago
SFT repro: https://wandb.ai/costa-huang/huggingface/runs/4fj3uctu/overview?workspace=user-costa-huang. MT Bench: 6.288
DPO repro: https://wandb.ai/costa-huang/huggingface/runs/lddwve1a?workspace=user-costa-huang MT Bench: 7.084
Regression checking: https://wandb.ai/costa-huang/huggingface/reports/regression--Vmlldzo2Njk4NTA2 There is a small drop in MT Bench scores, but the learning curves look very similar, so it is probably a minor artifact.