thammegowda / mtdata

A tool that locates, downloads, and extracts machine translation corpora
https://pypi.org/project/mtdata/
Apache License 2.0
147 stars, 22 forks

Update Tatoeba corpus #153

Open jeanm opened 1 year ago


It looks like a new version of the Tatoeba corpus (v2023-04-12) was released on OPUS a while back. It contains about 1M new sentence fragments: https://opus.nlpl.eu/Tatoeba-v2023-04-12.php