Closed dorost1234 closed 3 years ago
Name: ccnet
Description: Common Crawl
Paper: https://arxiv.org/abs/1911.00359
Data: https://github.com/facebookresearch/cc_net
Motivation: this is one of the most comprehensive clean monolingual datasets across a variety of languages, which makes it quite important for cross-lingual research.
Instructions to add a new dataset can be found here.
thanks
Closing, since I think this is cc100 with just the name changed. Thanks.