togethercomputer / RedPajama-Data

The RedPajama-Data repository contains code for preparing large datasets for training large language models.
Apache License 2.0
4.43k stars 335 forks

Q: Why does RedPajama exist? What problem are you solving? #69

Open brando90 opened 10 months ago

brando90 commented 10 months ago

https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T/discussions/25

brando90 commented 10 months ago

https://discord.com/channels/1082503318624022589/1097534874719625236/1143265561753686176