thammegowda/mtdata
A tool that locates, downloads, and extracts machine translation corpora
https://pypi.org/project/mtdata/
Apache License 2.0
147 stars, 22 forks
v0.3.1 #79
Closed: thammegowda closed this 3 years ago
thammegowda commented 3 years ago
Add support for recipes; added the list-recipe and get-recipe subcommands
Add support for viewing dataset stats: words, chars, segs
Fix URL for UN dev and test sets (the source was updated, so we updated too)
Multilingual experiment support; the ISO 639-3 code `mul` implies multilingual, e.g. mul-eng or eng-mul
`--dev` accepts multiple datasets and merges them (useful for multilingual experiments)
Tar files are extracted before reading (performance improvement). Closes #78
setup.py: version and description are read via regex
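For readers unfamiliar with the setup.py item, the following is a minimal sketch of reading package metadata with a regex instead of importing the package. The attribute names `__version__` and `__description__`, the helper `read_metadata`, and the sample text are assumptions for illustration; the actual pattern used in this PR may differ.

```python
import re


def read_metadata(init_text: str) -> dict:
    """Extract version and description assignments from module source text.

    Hypothetical helper: assumes the module defines `__version__` and
    `__description__` as plain quoted string literals, one per line.
    """
    pattern = r"""^__(version|description)__\s*=\s*['"]([^'"]+)['"]"""
    # findall returns (name, value) tuples, which dict() turns into a mapping
    return dict(re.findall(pattern, init_text, flags=re.MULTILINE))


# Sample module text (illustrative, not copied from the repository)
sample = '''
__version__ = '0.3.1'
__description__ = "A tool that locates, downloads, and extracts machine translation corpora"
'''

meta = read_metadata(sample)
print(meta["version"])
```

Parsing the text this way lets setup.py obtain the version without importing the package, which would otherwise require its dependencies to be installed at build time.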
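As a rough illustration of the dataset stats mentioned above (words, chars, segs), here is a small sketch. It assumes that a segment is one line, words are whitespace-separated tokens, and chars excludes the trailing newline; the function name `corpus_stats` is hypothetical, and mtdata's own definitions may differ.

```python
from typing import Dict, Iterable


def corpus_stats(segments: Iterable[str]) -> Dict[str, int]:
    """Count segments, whitespace-delimited words, and characters.

    Hypothetical helper, not part of mtdata's API.
    """
    stats = {"segs": 0, "words": 0, "chars": 0}
    for seg in segments:
        seg = seg.rstrip("\n")  # do not count the line terminator
        stats["segs"] += 1
        stats["words"] += len(seg.split())
        stats["chars"] += len(seg)
    return stats


stats = corpus_stats(["Hello world\n", "Guten Tag\n"])
print(stats)  # {'segs': 2, 'words': 4, 'chars': 20}
```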