hhu-adam / lean-i18n

i18n library for Lean.
https://reservoir.lean-lang.org/@hhu-adam/i18n
Apache License 2.0
6 stars 0 forks source link

use and documentation of variable syntax #8

Open TentativeConvert opened 2 weeks ago

TentativeConvert commented 2 weeks ago

It seems that entries in {…} in a lean file get parsed as variables. For example, the line

Robo: Hier bedeutet {A} : Prop wieder, dass {A} irgendeine Aussage ist.

in Robo/Logos/L04

becomes

msgid "Robo: Hier bedeutet «{A}» : Prop wieder, dass «{A}» irgendeine Aussage ist.\n"

in the .pot file, and in Poedit, the {A}s in this line (and any copy of {A} in the translation) are highlighted in a different colour.

Questions:

  1. Should the special characters « and » be included in the translation?
  2. Wouldn't we rather want the whole sequence A : Prop to be treated as a variable, to avoid translations messing up the lean syntax? (DeepL turned the sequence into "{A}" prop. There are many more convoluted examples where a human translator could easily introduce a mistake by accident, or overlook a mistake introduced by a machine pre-translation.) How could this be achieved?
joneugster commented 2 weeks ago
  1. The french quotes in the translation are a manual hack by me (basically because there needs to be some sort of escaping, and I chose this back then because it is unique), they need to appear in the translared string as they are in the untranslated one.

Possibly one could revisit his and choose a better escaping.

  1. I don't think that's reasonable, as the main purpose of {h} is to substitute the fvarid h with the name the user chose instead. Any other syntax, such as "{1 + 1}" only works by chance and not by design. I think this change would complicate the name-swapping logic without too much gain
TentativeConvert commented 2 weeks ago

I cannot quite follow. If «{…}» is supposed to be an escape sequence for {…}, then it does not seem to work, because when Poedit reads «{…}», it still treats {…} in a special way. In the current setup,

        (1)     Please try `exact {h}`.                  in the game
becomes (2)     Please try `exact «{h}»`.                in the pot-file
becomes (3)     Probier einmal `genau "{h}"`.            in the po-file, via machine translation,
becomes (4)     Probier einmal `genau "{h}"`.            in the translated game.

I’m asking whether we could have a setup where

        (1)     Please try `exact {h}`.                  in the game
becomes (2)     Please try {exact {h}}.                  in the pot-file
becomes (3)     Probier einmal {exact {h}}.              in the po-file, via machine translation,
becomes (4)     Probier einmal `exact {h}`.              in the translated game.

Possibly the inner {…} in (2) would indeed need to be escaped. The outer {…} should ensure that the machine translation does not touch anything in between.

I’m assuming here that we use `…` only to mark up lean code in the game files. I hope we're not also using `…` to mark up natural language in some places? (I guess though there might be completely unrelated use cases of lean-i18n where translation of `…` strings is desired.)

joneugster commented 2 weeks ago

step 3 to 4 of the second example might be problematic as it might not be directly invertable. And is it not the case that `...` alread, marks any non-translated bits of code? why would it need to be {...} instead? Is there a feature in POedit which marks sub-text that should be left alone? If so, could you please link that spec if you've already found it? Im not that fmailiar with POedit.

Indeed, since the french quotes get translated incorrectly, I'll search where they were used as a hack and see what else could be done👍

TentativeConvert commented 2 weeks ago

Sorry, should have done my homework first. Indeed, I cannot find any syntax for marking sub-strings as untranslatable in pot-files. The documentation of gettext suggests that any non-translatable sub-strings should be replaced by format strings, the exact syntax of which will depend on the programming language. Then the programming language used needs to be included as a flag in the pot-file.

Similarly, DeepL does not appear to accept any such syntax. This example from the official documentation again suggests using specific format strings as a work-around. I thought Probier einmal <code>exact {h}</code>! could work, but it’s equally unreliable as Probier einmal `exact {h}`! or Probier einmal {exact {h}}!.

TentativeConvert commented 2 weeks ago

The problem is that this really makes a semi-automated approach to translation (pre-translate with a machine and then correct manually for style) quite painful, because one has to carefully go over every lean snippet for messed up characters.