Open xzuyn opened 1 year ago
Cerebras Blog Post
Hugging Face Dataset
It's RedPajama, but deduplicated down to 627B tokens using MinHashLSH. It may be worth using instead, as it should be more compute-efficient.
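For anyone unfamiliar with MinHashLSH, here is a rough, toy-sized sketch of the idea in plain Python. It is not the pipeline Cerebras used. The function names, the word 3-gram shingling, the 64 hash functions, and the 16 bands are all assumptions picked for illustration; a real run over a corpus this size would use a dedicated library and tuned parameters.

```python
import hashlib
from collections import defaultdict


def shingles(text, n=3):
    """Split text into a set of lowercase word n-grams (hypothetical choice: n=3)."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(max(1, len(words) - n + 1))}


def minhash_signature(items, num_perm=64):
    """Build a MinHash signature: for each salted hash function, keep the minimum hash."""
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")  # a different salt acts as a different hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(item.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "little",
            )
            for item in items
        ))
    return signature


def near_duplicate_pairs(docs, num_perm=64, bands=16):
    """Return candidate duplicate pairs: documents sharing at least one LSH band."""
    rows = num_perm // bands
    buckets = defaultdict(list)
    for doc_id, text in docs.items():
        sig = minhash_signature(shingles(text), num_perm)
        for b in range(bands):
            # Documents whose signatures agree on every row of a band share a bucket.
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].append(doc_id)
    pairs = set()
    for ids in buckets.values():
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                pairs.add(tuple(sorted((ids[i], ids[j]))))
    return pairs


docs = {
    "a": "The quick brown fox jumps over the lazy dog near the river bank",
    "b": "the quick  brown fox jumps over the lazy dog near the river bank",
    "c": "Completely unrelated text about training large language models efficiently",
}
pairs = near_duplicate_pairs(docs)
```

Here `a` and `b` differ only in case and spacing, so they produce the same shingle set and end up as a candidate pair, while `c` shares no shingles with either. Dropping one document from each candidate pair (usually after checking the actual Jaccard similarity) is what shrinks the dataset.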