huggingface / alignment-handbook

Robust recipes to align language models with human and AI preferences
https://huggingface.co/HuggingFaceH4
Apache License 2.0
4.45k stars · 385 forks

SFT checkpoint of zephyr-7b #5

Closed · liutianlin0121 closed 10 months ago

liutianlin0121 commented 10 months ago

Hi!

Thanks for releasing the Zephyr-7b-alpha model! I am wondering if it is possible to access the SFT checkpoint of Zephyr-7b (the model prior to DPO training).

As I understand it, the SFT training of Zephyr-7b was based on this script. This is very useful! Since the SFT training script is open source, could you also release an SFT model checkpoint? Having access to the SFT checkpoint would help us investigate the impact of DPO without requiring significant GPU resources.

Thanks for your consideration! Tianlin

lewtun commented 10 months ago

Hi Tianlin! Yes we plan to release the SFT checkpoint quite soon and I'll report back here when it's available 🤗

liutianlin0121 commented 10 months ago
Awesome, thanks! 🤗

lewtun commented 10 months ago

Hi @liutianlin0121, we've now released the two SFT checkpoints behind zephyr-7b-alpha and zephyr-7b-beta. You can find them in this collection under mistral-7b-sft-{alpha,beta}: https://huggingface.co/collections/HuggingFaceH4/zephyr-7b-6538c6d6d5ddd1cbb1744a66
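
For anyone landing here who wants to contrast the SFT and DPO models, a minimal sketch along these lines should work. It assumes the checkpoints load like any other causal LM on the Hub and that the tokenizer ships a chat template; the model IDs come from the collection above, and the prompt is just a placeholder.

```python
# Minimal sketch: load the SFT checkpoint and generate a reply, so its output
# can be compared side by side with the DPO-trained model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceH4/mistral-7b-sft-beta"  # swap in HuggingFaceH4/zephyr-7b-beta for the DPO model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Placeholder prompt; formatted with the tokenizer's chat template.
messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0, input_ids.shape[-1]:], skip_special_tokens=True))
```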

liutianlin0121 commented 10 months ago

Many thanks!! 🤗