huggingface / alignment-handbook

Robust recipes to align language models with human and AI preferences
https://huggingface.co/HuggingFaceH4
Apache License 2.0

Differences between alpha and beta models #12

Closed liutianlin0121 closed 11 months ago

liutianlin0121 commented 11 months ago

Hi!

I am wondering whether there is any official documentation of the differences between zephyr-7b-alpha and zephyr-7b-beta.

According to this blog:

Zephyr beta trains for more DPO epochs (than Zephyr alpha) leading to better chat results!

According to this LinkedIn post:

Compared to Zephyr-Alpha, they filtered the data to get rid of issues related to incorrect casing and weird starts for some answers.

The sources above are from third parties. I'm curious whether there's an official reference for these differences. The Zephyr paper doesn't appear to cover them. Thanks a lot!

Tianlin

lewtun commented 11 months ago

Hi @liutianlin0121, thanks for your question! We're still working on the guides for the handbook, where we'll explain these distinctions in detail, but the tl;dr is that they differ in two main ways:

Hope that helps!

liutianlin0121 commented 11 months ago

I see! Thanks so much for your help!