vslavik / poedit

Translations editor for Mac, Windows and Unix
https://poedit.net
MIT License

poedit generates invalid unexpected characters #866

Open CyrilleB79 opened 1 month ago

CyrilleB79 commented 1 month ago

I am trying to translate locale\en to locale\fr in an XLIFF file. This causes Poedit to generate a file that Crowdin does not accept.

Here is the file: changes_NOK.zip. See line 5777: the "\f" is interpreted as a form feed character.

For reference, the file where I have left the translation to locale\en is here: changes_OK.zip

CyrilleB79 commented 1 month ago

Cc @michaelDCurran, responsible for NVDA's translation framework upgrade where this issue has been noticed.

michaelDCurran commented 1 month ago

It looks like the input field in Poedit accepts \ as an escape character and generates control codes for sequences such as \f, \t, etc. For instance, typing \t results in a physical tab character in the saved PO or XLIFF file. Typing \\ (a double backslash) results in the desired behaviour of producing one backslash. So in your example, in Poedit, you'll have to type locale\\fr.

I also made a contrived test case where the source string was c:\folder and tested how Poedit copies the source as the initial translation with Control+B. In this case, it resulted in c:\\folder being set in the input field, which is what I would expect based on the above.

So in short, the expectation is that translators need to escape any literal backslash characters in that input field. Though as that is a richEdit50 control, I wonder whether it might be better for translators if tab / newline / formfeed were just handled visually, which would remove the need for backslash escapes entirely.
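The escaping behaviour described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Poedit's actual code: the function names `unescape` and `escape` and the exact set of recognised sequences in `ESCAPES` are assumptions made for the example.

```python
# Illustrative sketch only; NOT Poedit's implementation.
# The set of recognised escape sequences below is an assumption.
ESCAPES = {"t": "\t", "n": "\n", "f": "\f", "r": "\r", "\\": "\\"}


def unescape(typed: str) -> str:
    """Convert text as typed in the translation field into the saved text."""
    out = []
    i = 0
    while i < len(typed):
        ch = typed[i]
        # A backslash followed by a known letter becomes one control character.
        if ch == "\\" and i + 1 < len(typed) and typed[i + 1] in ESCAPES:
            out.append(ESCAPES[typed[i + 1]])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def escape(raw: str) -> str:
    """Inverse of unescape(): what to show in the field for literal text."""
    reverse = {value: "\\" + key for key, value in ESCAPES.items()}
    return "".join(reverse.get(ch, ch) for ch in raw)


# Typing locale\fr yields a form feed followed by "r".
print(repr(unescape(r"locale\fr")))
# Typing locale\\fr yields the intended single backslash.
print(repr(unescape(r"locale\\fr")))
# Copying the source c:\folder should place c:\\folder in the field.
print(escape(r"c:\folder"))
```

Under these assumptions, `escape` corresponds to what happens when the source is copied into the translation field, and `unescape` to what happens on save, so a literal backslash survives the round trip only if it is doubled.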

vslavik commented 1 month ago

As @michaelDCurran says, this behaves as intended. Special characters and placeholders are visually indicated in the translation field; in your example, \f was highlighted as a single unit, implying its special meaning.

The actual bug is that \ is not escaped in the source text field.

I do wonder whether the highlighting is conveyed by screen readers and, if not (as I suspect), how to indicate placeholders in an a11y context, though...

> Though as that is a richEdit50 control, I wonder whether it might be better for translators if tab / newline / formfeed were just handled visually,

That would be ideal, yes: highlight newlines and strange whitespace with typographic marks that are not a verbatim part of the edited text. In addition to reducing confusion, it would greatly simplify the code.

Last time I checked this, I knew how to do this on macOS, but not on Windows. It sounds like you know this is doable; could you give me some pointers on how?
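To make the "handled visually" idea concrete, here is a minimal Python sketch of a display-only mapping. It does not address how to do this in a Windows rich edit control; it only shows the principle that the marks exist in the rendered view while the stored text keeps the real characters. The name `render_for_display` and the symbols in `MARKS` are hypothetical choices for the example.

```python
# Hypothetical display-only marks; the symbols chosen are an assumption.
MARKS = {
    "\t": "\u2192",    # rightwards arrow for a tab
    "\n": "\u00b6\n",  # pilcrow shown before the actual line break
    "\f": "\u240c",    # "symbol for form feed"
}


def render_for_display(text: str) -> str:
    """Return a view of the text with whitespace marked; the input is untouched."""
    return "".join(MARKS.get(ch, ch) for ch in text)


stored = "locale\\fr\tdone\n"
print(render_for_display(stored))
# A literal backslash needs no escaping here, because nothing is interpreted.
print(render_for_display("c:\\folder"))
```

With this approach the translator types and reads literal backslashes directly, and only the rendering layer needs to know about the marks.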