Incorrectly encoding text when backFormat is text

When performing backtranslation with file2brl, if the configuration has backFormat set to text and the backtranslated text contains Unicode characters outside the ASCII range, those characters are incorrectly encoded.

For example, using en-ueb-g2.ctb as the translation table, try backtranslating a word containing an apostrophe (e.g. I'M, CAN'T). The apostrophe is output as the byte 0x19.

With backFormat set to html instead, the same apostrophe is backtranslated to the Unicode character U+2019 (RIGHT SINGLE QUOTATION MARK). Since 0x19 is the low byte of 0x2019, I suspect file2brl simply discards the high byte of each Unicode character when backFormat is set to text.
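The suspected mechanism can be illustrated with a minimal Python sketch. It does not reproduce file2brl's actual code: the function names `truncate_to_low_byte` and `encode_utf8` are hypothetical, and UTF-8 is only an assumption about what the text output should be. It shows how keeping only the low 8 bits of U+2019 yields exactly the 0x19 byte observed above, whereas a real encoding preserves the character.

```python
# Hypothetical illustration only; this is not file2brl's implementation.

def truncate_to_low_byte(text: str) -> bytes:
    """Suspected behaviour: keep only the low 8 bits of each code point."""
    return bytes(ord(ch) & 0xFF for ch in text)


def encode_utf8(text: str) -> bytes:
    """Expected behaviour, assuming UTF-8 output is wanted."""
    return text.encode("utf-8")


# Illustrative back-translated word containing U+2019 (RIGHT SINGLE QUOTATION MARK)
word = "CAN\u2019T"

print(truncate_to_low_byte(word))  # b'CAN\x19T'
print(encode_utf8(word))           # b'CAN\xe2\x80\x99T'
```

ASCII characters survive the truncation unchanged, which is why the problem only shows up for characters such as U+2019 that lie outside the ASCII range.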

liblouis / liblouisutdml #68