natasha / corus

Links to Russian corpora + Python functions for loading and parsing
MIT License
281 stars 20 forks

CC News #24

Open kuk opened 4 years ago

kuk commented 4 years ago

https://commoncrawl.org/2016/10/news-dataset-available/

Roughly 3-4% of the news pages are in the .ru domain.
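
A minimal sketch of how the .ru share could be checked, assuming the page URLs are already available as plain strings. The helper names `is_ru_domain` and `ru_share` and the sample URLs are hypothetical, used only for illustration; they are not part of corus or of the CC News tooling.

```python
from urllib.parse import urlsplit


def is_ru_domain(url):
    """Return True if the URL's host is in the .ru top-level domain."""
    # hostname is None for URLs without a scheme/netloc; treat those as non-matching
    host = urlsplit(url).hostname or ''
    # strip a trailing dot so fully qualified names such as 'example.ru.' also match
    return host.rstrip('.').endswith('.ru')


def ru_share(urls):
    """Fraction of URLs whose host is in the .ru domain (0.0 for empty input)."""
    urls = list(urls)
    if not urls:
        return 0.0
    return sum(is_ru_domain(url) for url in urls) / len(urls)


# Illustrative URLs only, not taken from the dataset
sample = [
    'https://example.ru/news/1',
    'https://example.com/ru/news/2',
    'http://news.example.ru/3',
    'https://example.org/4',
]
print(ru_share(sample))
```

The check looks at the host only, so a path segment such as `/ru/` on a non-.ru site is not counted.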