Added Flores200 dev and devtest sets (#145). Thanks @ZenBel!
Added support for mtdata echo <ID>
Dataset entries now store only BibTeX keys, not the full citation text.
The index cache is now created as a JSON Lines file (WIP towards dataset statistics).
Simplified index loading
Simplified compression format handlers. Added support for opening .bz2 files without creating temp files.
All resources have moved to the mtdata/resource directory, and any new files added to that directory are automatically included in the Python package (this prevents future issues like #137).
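Reading a .bz2 file without a temp file amounts to decompressing it as a stream while iterating over lines. The sketch below shows that general technique with Python's standard bz2 module; it is an illustration, not mtdata's actual handler code, and read_lines is a hypothetical helper name.

```python
import bz2


def read_lines(path):
    """Yield text lines from a plain or .bz2 file (hypothetical helper).

    bz2.open in text mode decompresses incrementally as lines are read,
    so no decompressed copy of the file is ever written to disk.
    """
    opener = bz2.open if path.endswith(".bz2") else open
    with opener(path, "rt", encoding="utf-8") as f:
        for line in f:
            yield line.rstrip("\n")
```

Because read_lines is a generator, memory use stays bounded by the current line rather than the size of the corpus.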
New and exciting features:
Support for adding new datasets at runtime (mtdata*.py files in the run directory). Note: you have to reindex by running mtdata -ri list
Monolingual datasets support in progress (currently testing)
Dataset IDs are now Group-name-version-lang1-lang2 for bitext and Group-name-version-lang for monolingual datasets.
mtdata list is updated: use mtdata list -l eng-deu for bitext and mtdata list -l eng for monolingual datasets.
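To make the new ID scheme concrete, here is a minimal sketch that splits an ID into its fields and checks it against a language filter in the spirit of mtdata list -l. It assumes no field contains a hyphen (so splitting on "-" is unambiguous) and that bitext matching ignores language order; both are assumptions, and parse_id, matches, and the example IDs are hypothetical, not mtdata's own API or data.

```python
from typing import NamedTuple, Tuple


class DatasetId(NamedTuple):
    group: str
    name: str
    version: str
    langs: Tuple[str, ...]  # two codes for bitext, one for monolingual


def parse_id(did: str) -> DatasetId:
    # Assumption: no field contains "-", so the hyphen is a safe separator.
    parts = did.split("-")
    if len(parts) == 5:  # Group-name-version-lang1-lang2 (bitext)
        group, name, version, l1, l2 = parts
        return DatasetId(group, name, version, (l1, l2))
    if len(parts) == 4:  # Group-name-version-lang (monolingual)
        group, name, version, lang = parts
        return DatasetId(group, name, version, (lang,))
    raise ValueError(f"unexpected dataset ID format: {did!r}")


def matches(did: str, lang_filter: str) -> bool:
    """'eng-deu' selects bitext, 'eng' selects monolingual (order-insensitive)."""
    wanted = lang_filter.split("-")
    langs = parse_id(did).langs
    return len(langs) == len(wanted) and set(langs) == set(wanted)


# Usage with hypothetical IDs:
# matches("Group-corpus_x-1-eng-deu", "eng-deu")  -> True
# matches("Group-corpus_x-1-eng-deu", "eng")      -> False
# matches("Group-corpus_y-2-eng", "eng")          -> True
```

The language count alone distinguishes bitext from monolingual entries, which is why a single -l option can serve both kinds of dataset.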
Discussions are welcome!
What other feature/addition would you like to see in the next version?