togethercomputer / RedPajama-Data

The RedPajama-Data repository contains code for preparing large datasets for training large language models.
Apache License 2.0

New Features #76

Open (zhangce opened 8 months ago)

zhangce commented 8 months ago

Thanks visarga for the suggestion:

https://news.ycombinator.com/item?id=38077521