togethercomputer / RedPajama-Data

The RedPajama-Data repository contains code for preparing large datasets for training large language models.
Apache License 2.0
4.53k stars 346 forks

Any forecast for the release of v2 of the dataset? #49

Closed vince62s closed 1 month ago

vince62s commented 1 year ago

Thanks.