This makes significant changes to the arguments and output of the project.
Adds TMX Support
There are a few benefits to this.
This format is designed for programmatically handling translation alignment.
It's one of the formats explicitly mentioned on OPUS.
File per Language Combo
Before, I'd only output a single file that contained the English to Xyz alignments for every language. This has a few issues.
Having a single file means it's not possible to download data for only specific languages. As our dataset grows, the single file might get excessively bloated.
We need more than just English to French, for example. We also want to output all the implicit alignments: if there's an English to French and an English to German alignment, we can derive French to German, and this should be in the exported dataset.
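As a rough illustration of deriving the implicit alignments, here is a minimal sketch. The `Entry` and `Pair` shapes, the `deriveAlignments` name, and the `en-fr` style keys are hypothetical and not taken from the project. It also assumes one group per unordered language pair, with English listed first and the remaining languages sorted by code.

```typescript
// Hypothetical input shape: one English segment plus its translations,
// keyed by language code.
type Entry = { en: string; translations: Record<string, string> };
type Pair = { source: string; target: string };

// Groups aligned segments by language pair, e.g. "en-fr" or "de-fr".
// English is treated as just another language, so every combination
// of languages present in an entry is emitted.
function deriveAlignments(entries: Entry[]): Map<string, Pair[]> {
  const byPair = new Map<string, Pair[]>();
  for (const entry of entries) {
    // Sort the non-English languages so pair keys are stable across entries.
    const others = Object.entries(entry.translations).sort(([a], [b]) =>
      a < b ? -1 : a > b ? 1 : 0
    );
    const all: [string, string][] = [["en", entry.en], ...others];
    all.forEach(([sourceLang, source], i) => {
      // Pair each language with every language that comes after it.
      for (const [targetLang, target] of all.slice(i + 1)) {
        const key = `${sourceLang}-${targetLang}`;
        const pairs = byPair.get(key) ?? [];
        pairs.push({ source, target });
        byPair.set(key, pairs);
      }
    });
  }
  return byPair;
}
```

With an entry that has `fr` and `de` translations, this yields the groups `en-de`, `en-fr`, and the derived `de-fr`.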
Related to the change above, the --output argument no longer takes a file. Instead, it takes a directory, which it then populates with all generated files. This means users no longer have control over output file names, but it also means we no longer have to guess file formats, which is nice imo.
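To make the directory behaviour concrete, here is a small sketch of how file names could be planned from the --output directory. The `planOutputFiles` name and the `<source>-<target>.tmx` naming scheme are assumptions for illustration, not necessarily what the project does.

```typescript
// Hypothetical naming scheme: <outputDir>/<source>-<target>.tmx
// `pairKeys` are language-pair identifiers such as "en-fr" (assumed format).
function planOutputFiles(outputDir: string, pairKeys: string[]): string[] {
  // Drop trailing slashes so "out" and "out/" give the same paths.
  const dir = outputDir.replace(/\/+$/, "");
  return pairKeys.map((key) => `${dir}/${key}.tmx`);
}
```

For example, `planOutputFiles("out/", ["en-fr", "de-fr"])` returns `["out/en-fr.tmx", "out/de-fr.tmx"]`. Since the file names are derived from the language pair, the extension is always known and nothing has to be inferred from a user-supplied name.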
Change XML Library
Before, we used fast-xml-parser, but I don't see a clear way to set attributes, namespaces, etc. with it. I've opted to switch to xmlbuilder2, which is very well documented and makes it much clearer how the TMX file is built.