The evaluation of input_quality

magpie-align / magpie

Official repository for "Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing". Your efficient and high-quality synthetic data generation pipeline!

MIT License

418 stars, 43 forks

Hi, thanks for your valuable suggestions! You are right that if humans write the instructions, we should not filter out unclear user instructions. However, when generating the synthetic dataset at high temperatures, we found that the model would occasionally output content that makes no sense (e.g., a message consisting of multiple languages and/or symbols we cannot understand). Therefore, we apply the quality filter.

Empirically, we also found that applying a quality filter can improve the model's performance. We have also provided the raw data here, so feel free to design your own filter!
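
For anyone who wants to try their own filter on the raw data, here is a minimal sketch of one possible approach. It assumes each record is a dict with an `input_quality` label, and that the labels follow the ordering in `QUALITY_ORDER`. The label set, the `instruction` field, and the function name `filter_by_input_quality` are all illustrative assumptions, not taken from the repository's actual pipeline, so check them against the data you download.

```python
# Assumed label ordering, worst to best. This is hypothetical: verify it
# against the labels that actually appear in the raw data.
QUALITY_ORDER = ["very poor", "poor", "average", "good", "excellent"]


def filter_by_input_quality(records, min_quality="average", field="input_quality"):
    """Keep records whose quality label is at least `min_quality`."""
    threshold = QUALITY_ORDER.index(min_quality)
    kept = []
    for record in records:
        label = record.get(field)
        # Drop records with a missing or unrecognized label instead of guessing.
        if label in QUALITY_ORDER and QUALITY_ORDER.index(label) >= threshold:
            kept.append(record)
    return kept


# Toy records standing in for the raw data (made up for illustration).
raw = [
    {"instruction": "Explain how binary search works.", "input_quality": "excellent"},
    {"instruction": "asdf ?? %% zzz", "input_quality": "very poor"},
    {"instruction": "Write a haiku about autumn.", "input_quality": "good"},
    {"instruction": "help", "input_quality": "poor"},
]

filtered = filter_by_input_quality(raw, min_quality="good")
for record in filtered:
    print(record["instruction"])
```

Lowering `min_quality` keeps more of the unclear instructions, which may be closer to what human-written data looks like. Raising it removes more of the nonsensical high-temperature outputs.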


The evaluation of input_quality #11