opening-up-chatgpt / opening-up-chatgpt.github.io

Tracking instruction-tuned LLM openness. Paper: Liesenfeld, Andreas, Alianda Lopez, and Mark Dingemanse. 2023. “Opening up ChatGPT: Tracking Openness, Transparency, and Accountability in Instruction-Tuned Text Generators.” In Proceedings of the 5th International Conference on Conversational User Interfaces. doi:10.1145/3571884.3604316.
https://opening-up-chatgpt.github.io/
Apache License 2.0

Add Stable Beluga 2 #67

Closed: mdingemanse closed this issue 11 months ago

mdingemanse commented 11 months ago

"Meet Stable Beluga 1 and Stable Beluga 2, Our Large and Mighty Instruction Fine-Tuned Language Models" https://stability.ai/blog/stable-beluga-large-instruction-fine-tuned-models

The training for the Stable Beluga models was directly inspired by the methodology pioneered by Microsoft in its paper: “Orca: Progressive Learning from Complex Explanation Traces of GPT-4.” While our data generation process is similar, we differ in our data sources.
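For readers unfamiliar with the Orca-style setup referenced in the quoted paragraph above, here is a minimal sketch of what such an explanation-trace data generation pipeline might look like. Everything in it is illustrative: `query_teacher_model()`, the system prompts, and the output fields are hypothetical placeholders, not Stability AI's actual code or prompts.

```python
# Minimal illustrative sketch of Orca-style synthetic data generation.
# NOTE: query_teacher_model() is a hypothetical placeholder, not a real API;
# the system prompts and record fields are assumptions for illustration only.

import json

# Orca-style "progressive learning" uses system prompts of increasing complexity
# to elicit step-by-step explanation traces from a strong teacher model.
SYSTEM_PROMPTS = [
    "You are a helpful assistant.",
    "You are a helpful assistant. Think step by step.",
    "You are a helpful assistant. Explain and justify every step of your reasoning.",
]

def query_teacher_model(system_prompt: str, instruction: str) -> str:
    """Hypothetical stand-in for a call to a GPT-4-class teacher model."""
    raise NotImplementedError("Replace with a real model API call.")

def generate_explanation_traces(instructions, out_path="synthetic_data.jsonl"):
    """Pair each source instruction with one teacher response per system prompt."""
    with open(out_path, "w", encoding="utf-8") as f:
        for instruction in instructions:
            for system_prompt in SYSTEM_PROMPTS:
                response = query_teacher_model(system_prompt, instruction)
                record = {
                    "system": system_prompt,
                    "instruction": instruction,
                    "response": response,
                }
                f.write(json.dumps(record, ensure_ascii=False) + "\n")
```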

Our variant of the dataset, containing 600,000 data points (roughly 10% of the dataset size the original Orca paper used), was created synthetically using high-quality instructions from the following datasets created by Enrico Shippole: