The RedPajama-Data repository contains code for preparing large datasets for training large language models.
Where can I get the file "domain_to_category_id.json"? #87
suolyer closed 7 months ago
Hello, where can I get the file "domain_to_category_id.json"? It is referenced here:
https://github.com/togethercomputer/RedPajama-Data/blob/26c5417e2fc3391ff2e8b19ffcf5521e9ca8def8/app/src/core/quality_signals/utils/content.py#L8