thammegowda / mtdata

A tool that locates, downloads, and extracts machine translation corpora
https://pypi.org/project/mtdata/
Apache License 2.0
147 stars, 22 forks

Add MaCoCu corpora #128

Open ZJaume opened 2 years ago

ZJaume commented 2 years ago

List of available pairs:

English-Spanish and English-Dutch are ParaCrawl 9 data enriched with DSI (domain) information, so there is no need to add them. More languages will come next year (Albanian, Serbian, Montenegrin, and Bosnian).

EDIT: I forgot the link: macocu.eu