vdobrovolskii / rucoco


NewsRu corpus #13

Closed akutuzov closed 2 years ago

akutuzov commented 2 years ago

Hi! You sampled texts for annotation from NewsRu.com, so I assume you crawled the full website and have access to its texts in some convenient format.

It would be really great if you published this "NewsRu corpus" in full, with as much metadata as possible. As far as I understand, there are no legal obstacles to that: all NewsRu texts are available under the Creative Commons Attribution 4.0 International license.

Thanks in advance!

vdobrovolskii commented 2 years ago

Hi!

Thanks for your interest. Here you can find the text data that we used while building the corpus. Each file contains the text of a news article (UTF-8, LF line endings) plus a hyperlink to the source.

Best regards, Vladimir

akutuzov commented 2 years ago

Great, thanks a lot!