thammegowda/mtdata
A tool that locates, downloads, and extracts machine translation corpora
https://pypi.org/project/mtdata/
Apache License 2.0
147 stars, 22 forks
[WIP] v0.3.5 #110
Closed by thammegowda 2 years ago
thammegowda commented 2 years ago
Update OPUS index. Use OPUS API to download all datasets
A lot of new datasets added. Closes #109
Fix: JESC dataset language IDs were wrong
New datasets:
paracrawlv3 and TED datasets for jpn-eng
Option to set MTDATA_RECIPES dir (default is $PWD). All files matching the glob ${MTDATA_RECIPES}/mtdata.recipes*.yml are loaded
WMT22 datasets and recipes added
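For illustration, the MTDATA_RECIPES lookup described above could be resolved roughly as follows. This is a minimal sketch and not mtdata's actual implementation: the function name `find_recipe_files` and its `env` parameter are hypothetical, and only the environment variable name, the $PWD default, and the glob pattern come from the description.

```python
import os
from pathlib import Path


def find_recipe_files(env=None):
    """Return files matching ${MTDATA_RECIPES}/mtdata.recipes*.yml, sorted by path.

    Hypothetical helper for illustration; mtdata's own loader may differ.
    """
    env = os.environ if env is None else env
    # Fall back to the current working directory when MTDATA_RECIPES is unset or empty
    recipes_dir = Path(env.get("MTDATA_RECIPES") or Path.cwd())
    # Path.glob yields nothing if the directory has no matching files
    return sorted(recipes_dir.glob("mtdata.recipes*.yml"))


if __name__ == "__main__":
    for path in find_recipe_files():
        print(path)
```

With this sketch, a directory containing both mtdata.recipes.yml and mtdata.recipes.wmt22.yml would yield both files, while unrelated .yml files in the same directory are ignored.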