thammegowda / mtdata

A tool that locates, downloads, and extracts machine translation corpora
https://pypi.org/project/mtdata/
Apache License 2.0
147 stars, 22 forks

JW300 taken down from OPUS #77

Open: kpu opened 3 years ago

kpu commented 3 years ago

It's something related to copyright; they are trying to get proper permission.

thammegowda commented 3 years ago

It has been such a valuable resource! https://jw.org has 1000 languages, and https://glosbe.com has 6000 languages (most of them are dictionaries, but there are 2B+ sentence pairs). If only these two would allow us to crawl their site text for NLP research and apps...

patelrajnath commented 2 years ago

Do we have any idea if it will be back online anytime soon?

Hammyhamm89 commented 1 year ago

Sorry for replying to this extremely old thread, but I just heard of this dataset and I'm sad it's down. I wish there was a backup somewhere.