hedyorg / hedy

Hedy is a gradual programming language to teach children programming. Gradual languages use different language levels, where each level adds new concepts and syntactic complexity. At the end of the Hedy level sequence, kids master a subset of syntactically valid Python.
https://www.hedy.org
European Union Public License 1.2

Support for ICU MessageFormat interpolation #3658

Open KovacsGG opened 1 year ago

KovacsGG commented 1 year ago

This change would enable more powerful features for translations.

Here is an illustration:

You used an {echo} before an {ask}, or an {echo} without an {ask}. Place an {ask} before the {echo}.

might turn into

You used an echo before an question, or an echo without an question. Place an question before the echo.

because (in this example) users can switch between echo<->echo and ask<->question keyword sets.

This change would allow translators to use a string like this as a translation:

You used {echo, select, echo {an} other {a}} {echo} before {ask, select, ask {an} other {a}} {ask}, or {echo, select, echo {an} other {a}} {echo} without {ask, select, ask {an} other {a}} {ask}. Place {ask, select, ask {an} other {a}} {ask} before the {echo}.

With arguments {ask: "ask", echo: "echo"} this would be interpolated in the same, correct way, but with {ask: "question", echo: "echo"} it would become

You used an echo before a question, or an echo without a question. Place a question before the echo.

This is an English example, which makes it a bit nonsensical, but especially in languages with grammatical gender, this issue is easier to run into and harder to circumvent by careful phrasing.
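To make the interpolation above concrete, here is a minimal sketch of a formatter that understands only plain `{name}` placeholders and the `select` form. It is not the implementation proposed in this issue and not a real ICU library: `find_close`, `parse_options`, `format_message`, and `PATTERN` are hypothetical names chosen for illustration, and ICU features such as apostrophe quoting, `plural`, and number formatting are deliberately left out.

```python
def find_close(s, start):
    """Return the index of the '}' that matches the '{' at s[start]."""
    depth = 0
    for i in range(start, len(s)):
        if s[i] == "{":
            depth += 1
        elif s[i] == "}":
            depth -= 1
            if depth == 0:
                return i
    raise ValueError("unbalanced braces")


def parse_options(s):
    """Parse 'key {text} key {text} ...' into a dict mapping key -> text."""
    options = {}
    i = 0
    while i < len(s):
        if s[i].isspace():
            i += 1
            continue
        open_idx = s.index("{", i)
        close_idx = find_close(s, open_idx)
        options[s[i:open_idx].strip()] = s[open_idx + 1:close_idx]
        i = close_idx + 1
    return options


def format_message(pattern, args):
    """Interpolate {name} and {name, select, ...} (hypothetical subset of ICU)."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern[i] != "{":
            out.append(pattern[i])
            i += 1
            continue
        end = find_close(pattern, i)
        # Split into at most: argument name, format type, option list.
        parts = pattern[i + 1:end].split(",", 2)
        name = parts[0].strip()
        if len(parts) == 1:
            out.append(str(args[name]))
        elif len(parts) == 3 and parts[1].strip() == "select":
            options = parse_options(parts[2])
            # Fall back to the mandatory 'other' branch when no key matches.
            chosen = options.get(str(args[name]), options["other"])
            out.append(format_message(chosen, args))
        else:
            raise ValueError("unsupported format: " + pattern[i:end + 1])
        i = end + 1
    return "".join(out)


PATTERN = (
    "You used {echo, select, echo {an} other {a}} {echo} before "
    "{ask, select, ask {an} other {a}} {ask}, or "
    "{echo, select, echo {an} other {a}} {echo} without "
    "{ask, select, ask {an} other {a}} {ask}. "
    "Place {ask, select, ask {an} other {a}} {ask} before the {echo}."
)

print(format_message(PATTERN, {"ask": "ask", "echo": "echo"}))
print(format_message(PATTERN, {"ask": "question", "echo": "echo"}))
```

The first call keeps "an ask" and the second produces "a question", because the article is chosen from the value of the argument rather than hard-coded in the string. Since `format_message` recurses into the chosen branch, the same sketch also handles a `select` nested inside another `select`.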

I'll discuss my implementation under a PR.

Mark-Giesen commented 1 year ago

I'd like to discuss this a little further. A lot of translation issues are handled by Weblate for us. The English YAML is usually created by us, the others by translating in Weblate. Things like plurals are handled there, see for instance this link: https://docs.weblate.org/en/latest/user/translating.html#plurals. We're not using that yet, but we could. The example you give here could indeed happen, because we just swap the keywords at runtime, and users can even choose whether to swap or not. I don't think we can ask our translators (who are not programmers) to enter the right phrases like: "{echo, select, echo {an} other {a}}" for their language. There might be a way to automate this by combining our code and the strength of Weblate (all plural rules are already implemented there), but I'm not sure about this yet.

KovacsGG commented 1 year ago

I don't think we can ask our translators (who are not programmers) to enter the right phrases like: "{echo, select, echo {an} other {a}}" for their language.

I agree, it's a bit of a mouthful. There are only a few guides online aimed at translators (for example https://simpleen.io/blog/icu-message-format-guide), so one would have to dig for them, too. There is some support for the format in Weblate though: https://docs.weblate.org/en/latest/user/checks.html#icu-messageformat. There was going to be a live preview/editor feature as well, but it seems like work on that has gone cold. :(

There is also a recommendation that these complex messages should be the outermost layers of a message, containing whole sentences. So it could be:

{echo, select,
  echo {{ask, select,
    ask {You used an echo before an ask, or an echo without an ask. Place an ask before the echo.}
    other {You used an echo before a question, or an echo without a question. Place a question before the echo.}
  }}
  other {{ask, select,
    ask {You used an echo before an ask, or an echo without an ask. Place an ask before the echo.}
    other {You used an echo before a question, or an echo without a question. Place a question before the echo.}
  }}
}

This way, translators don't have to know how to move parts of the message in and out of the arguments.

But keep in mind that this is an extension of the current syntax, so aside from special cases, most translators could just ignore it. This clashes with the above recommendation, because this approach would leave the source strings alone, so only some languages would have to deal with this syntax. The select format solves issues not solved by plurals.

While doing some reading for this post, I've also seen someone say that the ICU plural format doesn't support all languages, because you can't set pluralization rules, but looking at this (https://unicode-org.github.io/cldr-staging/charts/37/supplemental/language_plural_rules.html#rules) I have a suspicion that it just knows. (The API asks for a context locale.) This would be important to check, because while the ICU syntax is compatible with the current syntax, it's not compatible with the Weblate pluralization you linked. I think it's very likely a case of one or the other.

An ICU pluralization looks like:

{members, plural,
  one {There is one member available.}
  other {There are # members available.}
}

for English.

For Arabic, it'd be something like:

{members, plural,
  zero {There are # members available.}
  one {There is one member available.}
  two {There are # members available.}
  few {There are # members available.}
  many {There are # members available.}
  other {There are # members available.}
}

Again, plurals solve a different problem than select.
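As a rough illustration of how a locale can "just know" which branch to use, here is a sketch of plural category selection for whole numbers. The two rule functions are hand-written from the CLDR plural rules for English and Arabic as I understand them (Arabic also has a `many` category for numbers ending in 11 to 99); a real ICU or CLDR-backed library would supply these per locale. All names (`plural_category_en`, `plural_category_ar`, `format_plural`, `EN_MEMBERS`) are hypothetical, and the messages are given as already-parsed dicts rather than ICU strings.

```python
def plural_category_en(n):
    """English, non-negative integers only: 1 is 'one', everything else 'other'."""
    return "one" if n == 1 else "other"


def plural_category_ar(n):
    """Arabic, non-negative integers only, following the CLDR categories."""
    if n == 0:
        return "zero"
    if n == 1:
        return "one"
    if n == 2:
        return "two"
    if 3 <= n % 100 <= 10:
        return "few"
    if 11 <= n % 100 <= 99:
        return "many"
    return "other"


def format_plural(options, n, category_for):
    """Pick the branch for n, falling back to 'other', and substitute '#'."""
    text = options.get(category_for(n), options["other"])
    return text.replace("#", str(n))


# The English message from above, as a category -> text mapping.
EN_MEMBERS = {
    "one": "There is one member available.",
    "other": "There are # members available.",
}

print(format_plural(EN_MEMBERS, 1, plural_category_en))
print(format_plural(EN_MEMBERS, 5, plural_category_en))
print([plural_category_ar(n) for n in (0, 1, 2, 3, 11, 100)])
```

The point of the sketch is that the translator only writes the branches; which branch applies to a given number comes from the locale's rules, not from the message itself.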