hplt-project / OpusCleaner

OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models.
https://pypi.org/project/opuscleaner/

Store download URLs in OpusCleaner configuration files #76

Open jelmervdl opened 1 year ago

Imagine: you copy the cleaning configuration files into your directory, or pull them from your Git repository, run a single command, and that downloads the data to the machine.

… that would make repeatable pipelines a little easier, right?

The download code is already in download.py; it just isn't accessible from the CLI.
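
To make the idea concrete, here is a rough sketch of what such a command could do. It does not use the real download.py code or the actual configuration schema: the `*.filters.json` file pattern, the `download_urls` key, and both function names are assumptions made up for illustration.

```python
import json
from pathlib import Path
from urllib.request import urlretrieve


def collect_downloads(config_dir):
    """Return (url, destination) pairs for files listed in configs but missing on disk."""
    config_dir = Path(config_dir)
    pending = []
    # Assumption: configuration files are JSON and match *.filters.json.
    for config_path in sorted(config_dir.glob("*.filters.json")):
        with config_path.open(encoding="utf-8") as fh:
            config = json.load(fh)
        # Hypothetical key: maps a local filename to the URL it came from.
        for filename, url in config.get("download_urls", {}).items():
            destination = config_dir / filename
            if not destination.exists():
                pending.append((url, destination))
    return pending


def download_all(config_dir, fetch=urlretrieve):
    """Fetch every missing file; `fetch(url, path)` can be swapped out for testing."""
    fetched = []
    for url, destination in collect_downloads(config_dir):
        fetch(url, str(destination))
        fetched.append(destination)
    return fetched
```

Skipping files that already exist means the command could be re-run safely after copying in new configuration files, which is probably what you would want for a repeatable pipeline.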