Differences between alpha and beta models

huggingface / alignment-handbook

Robust recipes to align language models with human and AI preferences

https://huggingface.co/HuggingFaceH4

Apache License 2.0

Differences between alpha and beta models #12

Status: Closed (closed by liutianlin0121 11 months ago)

liutianlin0121 commented 11 months ago

Hi!

I am wondering if there is any official documentation of the differences between zephyr-7b-alpha and zephyr-7b-beta.

According to this blog:

Zephyr beta trains for more DPO epochs (than Zephyr alpha) leading to better chat results!

According to this LinkedIn post:

Compared to Zephyr-Alpha, they filtered the data to get rid of issues related to incorrect casing and weird starts for some answers.

The above sources are from 3rd parties. I'm curious if there's an official reference for these differences. The Zephyr paper doesn't appear to cover them. Thanks a lot!

Tianlin

lewtun commented 11 months ago

Hi @liutianlin0121, thanks for your question! We're still working on the guides for the handbook, where we'll explain these distinctions in detail, but the tl;dr is that they differ in two main ways:

1. The SFT policy of zephyr-7b-alpha was trained on a larger, less filtered version of UltraChat (I think we can open source it soon). We later found that this subset had a lot of grammatical errors, which were fixed for zephyr-7b-beta.
2. zephyr-7b-alpha was trained for 1 DPO epoch, while zephyr-7b-beta was trained for 3.
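
For readers who want the two differences in a compact form, here is a rough Python sketch. It is only an illustration, not the handbook's actual configuration or data pipeline: `ZephyrRecipe`, `looks_badly_cased`, and `filter_dialogues` are hypothetical names, the dataset descriptions are paraphrased from the two points above, and the casing check is a toy heuristic assumed here to show what cleaning "incorrect casing" could look like.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZephyrRecipe:
    """Hypothetical summary of one training run (not an official config)."""
    name: str
    sft_data: str    # free-text description of the SFT dataset
    dpo_epochs: int  # number of DPO epochs, as stated in this thread


ALPHA = ZephyrRecipe("zephyr-7b-alpha", "larger, less filtered UltraChat", 1)
BETA = ZephyrRecipe("zephyr-7b-beta", "UltraChat with grammatical errors fixed", 3)


def looks_badly_cased(response: str) -> bool:
    """Toy heuristic (an assumption): flag empty responses and responses
    whose first alphabetic character is lowercase."""
    text = response.strip()
    if not text:
        return True
    first_alpha = next((c for c in text if c.isalpha()), None)
    if first_alpha is None:
        return False  # no letters at all, so there is no casing to judge
    return first_alpha.islower()


def filter_dialogues(dialogues):
    """Keep only dialogues whose assistant turns all pass the toy check.

    Each dialogue is assumed to be a list of {"role", "content"} dicts.
    """
    return [
        dialogue
        for dialogue in dialogues
        if not any(
            looks_badly_cased(turn["content"])
            for turn in dialogue
            if turn["role"] == "assistant"
        )
    ]
```

Applied to a list of dialogues, `filter_dialogues` drops any dialogue in which an assistant turn starts with a lowercase letter; user turns are not checked. The real cleaning may have repaired text rather than dropped it, so treat this purely as a sketch of the idea.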

Hope that helps!

liutianlin0121 commented 11 months ago

I see! Thanks so much for your help!