thammegowda / mtdata

A tool that locates, downloads, and extracts machine translation corpora
https://pypi.org/project/mtdata/
Apache License 2.0

Add parallel bible corpus #80

Open thammegowda opened 2 years ago

thammegowda commented 2 years ago

The corpus first appeared in https://aclanthology.org/L14-1215/, which references the link http://paralleltext.info/data/, but that link is no longer available.

However, the recent paper https://arxiv.org/pdf/2109.05772.pdf mentions using it. In the authors' own words:

The Parallel Bible Corpus (PBC) by Mayer and Cysouw (2014) is a multi-parallel corpus spanning 1259 languages and up to 30k verses per translation.

TODO: find a download link and include it in our index

thammegowda commented 2 years ago

I learned via my connections in the Bible translation community that Bible translations are copyrighted. So we are probably not going to find Bible translations for low-resource languages at the moment. Tagging this as invalid for now, until someone makes Bible translations open.