huggingface / smollm

Everything about the SmolLM & SmolLM2 family of models
https://huggingface.co/HuggingFaceTB
Apache License 2.0

Add `magpie-ultra-v1.0` distilabel pipeline #4

Closed gabrielmbmb closed 6 days ago

gabrielmbmb commented 6 days ago

Description

This PR adds the code for the distilabel pipeline that was used to generate the `magpie-ultra-v1.0` dataset.