tldr-pages / tldr-translation-pairs-gen

Generates a structured dataset in various formats derived from tldr-pages.
https://opus.nlpl.eu/tldr-pages/corpus/version/tldr-pages
MIT License

feat: allow multiple export formats in a single process #48

Open SethFalco opened 5 months ago

SethFalco commented 5 months ago

The --format argument currently accepts only one format at a time. If we want to produce multiple datasets, the same files have to be processed again for every format, which is redundant.

To reduce wasted computation and energy, we should support either multiple formats or an "all" option, so that the files are processed once and then exported in every format specified.
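
As a rough illustration of the idea, here is a minimal Python sketch assuming an argparse-based CLI. The format names (json, csv), the EXPORTERS registry, and the helper functions are hypothetical placeholders, not the project's actual code. It accepts one or more values for --format, expands "all" to every registered format, and serializes a single in-memory result once per requested format.

```python
import argparse
import csv
import io
import json


def to_json(pairs):
    """Serialize translation pairs as a JSON array of [source, target]."""
    return json.dumps(pairs, ensure_ascii=False)


def to_csv(pairs):
    """Serialize translation pairs as CSV, one pair per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(pairs)
    return buffer.getvalue()


# Hypothetical registry: the format names are placeholders for illustration.
EXPORTERS = {"json": to_json, "csv": to_csv}


def parse_formats(argv):
    """Return the list of requested formats, expanding 'all'."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--format",
        nargs="+",  # accept one or more formats in a single invocation
        choices=[*EXPORTERS, "all"],
        default=["json"],
    )
    args = parser.parse_args(argv)
    if "all" in args.format:
        return list(EXPORTERS)
    # Drop duplicates while keeping the order given on the command line.
    return list(dict.fromkeys(args.format))


def export_all(pairs, formats):
    """Serialize already-processed pairs once per requested format."""
    return {name: EXPORTERS[name](pairs) for name in formats}


if __name__ == "__main__":
    # Stand-in for the single processing pass over the source files.
    pairs = [("list files", "lister les fichiers")]
    for name, payload in export_all(pairs, parse_formats(None)).items():
        print(f"--- {name} ---")
        print(payload)
```

With this shape, the expensive step (reading and pairing the pages) runs once, and each extra format only adds a serialization pass over the in-memory result.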