hplt-project / OpusCleaner

OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models.
https://pypi.org/project/opuscleaner/
CLI to deal with data categories #84

Open jelmervdl opened 1 year ago

jelmervdl commented 1 year ago
graemenail commented 1 year ago

So we don't have to do this:

python -c 'import json; ds=json.load(open("../../bitextor-mt-models/hin-eng/data/raw/categories.json"))["mapping"]["clean"]; print(" ".join(ds))'

jelmervdl commented 1 year ago

jq --raw-output '.mapping.clean[]' < categories.json
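
For illustration, the kind of helper this issue asks for could look roughly like the sketch below. It is not OpusCleaner's actual CLI: the names `list_datasets` and `main` and the two positional arguments are hypothetical. The only thing taken from the thread is that categories.json has a top-level "mapping" object whose keys (such as "clean") hold lists of dataset names.

```python
import argparse
import json


def list_datasets(path, category):
    """Return the dataset names listed under `category` in a categories.json file.

    Assumes the layout implied by the one-liners above:
    {"mapping": {"<category>": ["<dataset>", ...], ...}}
    """
    with open(path, encoding="utf-8") as fh:
        mapping = json.load(fh)["mapping"]
    if category not in mapping:
        # Name the categories that do exist so a typo is easy to spot.
        raise KeyError(f"unknown category {category!r}; available: {sorted(mapping)}")
    return list(mapping[category])


def main(argv=None):
    # Hypothetical command-line wrapper; argument names are illustrative only.
    parser = argparse.ArgumentParser(
        description="Print the datasets assigned to a category, space-separated."
    )
    parser.add_argument("categories_file", help="path to categories.json")
    parser.add_argument("category", help="category name, e.g. clean")
    args = parser.parse_args(argv)
    print(" ".join(list_datasets(args.categories_file, args.category)))
    return 0
```

Calling `main(["categories.json", "clean"])` prints the same space-separated list as the python one-liner above; to use it from a shell, `main` would still need to be hooked up as a script entry point.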