tldr-pages / tldr-translation-pairs-gen

Generates a structured dataset in various formats derived from tldr-pages.
https://opus.nlpl.eu/tldr-pages/corpus/version/tldr-pages
MIT License

feat: add tmx support and change output #9

Closed SethFalco closed 1 year ago

SethFalco commented 1 year ago

This makes significant changes to the arguments and output of the project.

Adds TMX Support

There are a few benefits to this.

File per Language Combo

Previously, I only output a single file that contained the English-to-Xyz pairs for every language. This has a few issues.

Related to the change above, the --output argument no longer takes a file. Instead, it takes a directory, which it then populates with all generated files. This means users no longer have control over output file names, but it also means we no longer have to guess file formats, which is nice imo.
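To illustrate the directory-based output, here is a minimal sketch of how one file per language combo could be planned. The function names (`outputFileName`, `plannedOutputs`) and the `<source>-<target>.<format>` naming scheme are assumptions made up for this example; they are not taken from the project's code.

```typescript
// Hypothetical naming scheme: one file per source/target language pair.
// The real project may name its files differently.
function outputFileName(source: string, target: string, format: string): string {
  return `${source}-${target}.${format}`;
}

// Returns the paths that would be written inside the output directory.
// The source language itself is skipped, since "en to en" is not a pair.
function plannedOutputs(
  outputDir: string,
  source: string,
  targets: string[],
  format: string,
): string[] {
  // Drop trailing slashes so "out" and "out/" produce the same paths.
  const dir = outputDir.replace(/\/+$/, "");
  return targets
    .filter((target) => target !== source)
    .map((target) => `${dir}/${outputFileName(source, target, format)}`);
}

// Example: English paired with French and Brazilian Portuguese, as TMX.
console.log(plannedOutputs("out/", "en", ["en", "fr", "pt_BR"], "tmx"));
```

Since the file extension follows from the requested format, nothing has to be inferred from a user-supplied file name.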

Change XML Library

Before, we used fast-xml-parser, but I don't see a clear way to set attributes, namespaces, etc. with it. I've opted to switch to xmlbuilder2, which is very well documented and makes building the TMX file straightforward.