unicode-org / message-format-wg

Developing a standard for localizable message strings

[FEEDBACK] Implementing `u:dir` is a little bit tricky #918

Open eemeli opened 2 weeks ago

eemeli commented 2 weeks ago

This issue isn't actionable on the spec; I just wanted to highlight this matter for other implementers. Hence, ping @catamorphism, @mihnita, @lucacasonato, @janishorsts.

In short, we have two directionalities that may be set implicitly or explicitly: that of the message as a whole, and that of each placeholder.

These have an effect on our handling of bidirectional text, which includes this requirement:

> Implementations MUST provide the Default Bidi Strategy as one of the bidirectional isolation strategies.

When you get around to implementing this, there are a few aspects that took me a while to puzzle through. I hope that pointing them out saves you some time:

  1. Unless you're implementing bidi strategies more complicated than the default one and the 'none' strategy added to the test suite in #917, the message directionality matters in only one place: if it's LTR and a placeholder also has LTR directionality, then the placeholder doesn't need to be isolated.

  2. While the `u:locale` override definitely matters for the formatted value of a placeholder, `u:dir` doesn't. It only overrides the directionality of the placeholder in the context of the message.

  3. We have not clearly defined how the directionality of placeholders with standard annotations like `:number` or `:datetime` is determined. I opted to detect it from the locale, unless overridden by `u:dir`.

  4. Your function context and resolved value will need directionality indicators with four possible values: LTR, RTL, auto, and undefined (because an expression with `u:dir=auto` can be in a variable declaration that's then used as an operand). For formatted parts (if you implement them), auto and undefined can collapse into a single value.

  5. With the default bidi strategy, a message like `Hello {world}` gets formatted in English as `'Hello \u2068world\u2069'` (i.e. with FSI/PDI isolates), unless you introspect the value and discover that it's definitely LTR. You may be tempted therefore to do that introspection. Before you do, consider whether that makes any sense: Aren't you just front-loading work that would get done by the rendering engine in any case? And isn't your rendering engine rather likely to be well optimized for exactly that task? Don't be like me: I wrote a hacky version of UAX#9 P2 in JS (which doesn't support `\p{Bidi_Class=...}` in regexps) before I realised that I didn't need one, and threw it away.
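To make points 1, 4 and 5 concrete, here is a minimal sketch of the isolation step as described above. The `Dir` and `Part` types and the `formatWithDefaultBidi` name are hypothetical, made up for illustration rather than taken from any implementation. The sketch is simplified: the spec's full algorithm may apply further conditions (for instance around an explicitly set `u:dir`), so treat it as a reading aid rather than a reference.

```typescript
// Hypothetical types for illustration only.
// `undefined` means "no directionality known", as in point 4.
type Dir = "ltr" | "rtl" | "auto" | undefined;

interface TextPart { type: "text"; value: string }
interface ValuePart { type: "value"; value: string; dir: Dir }
type Part = TextPart | ValuePart;

const LRI = "\u2066"; // LEFT-TO-RIGHT ISOLATE
const RLI = "\u2067"; // RIGHT-TO-LEFT ISOLATE
const FSI = "\u2068"; // FIRST STRONG ISOLATE
const PDI = "\u2069"; // POP DIRECTIONAL ISOLATE

// Simplified sketch: literal text is copied as-is, and each formatted
// placeholder is wrapped in isolates chosen by its directionality.
function formatWithDefaultBidi(parts: Part[], msgDir: Dir): string {
  let out = "";
  for (const part of parts) {
    if (part.type === "text") {
      out += part.value;
    } else if (part.dir === "ltr") {
      // The only place where the message directionality matters (point 1).
      out += msgDir === "ltr" ? part.value : LRI + part.value + PDI;
    } else if (part.dir === "rtl") {
      out += RLI + part.value + PDI;
    } else {
      // "auto" and undefined both fall back to FSI; no introspection (point 5).
      out += FSI + part.value + PDI;
    }
  }
  return out;
}

// `Hello {world}` in an LTR message, with a value of unknown directionality.
const hello: Part[] = [
  { type: "text", value: "Hello " },
  { type: "value", value: "world", dir: undefined },
];
const formattedHello = formatWithDefaultBidi(hello, "ltr"); // "Hello \u2068world\u2069"
```

Note that the function never looks inside `part.value`: deciding what FSI resolves to is left to the rendering engine.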

aphillips commented 2 weeks ago

> I opted to detect it from the locale, unless overridden by u:dir.

Note that this might have bearing later on messages such as Hello {world}, in which the message locale (or the override in a message like Hello {world u:locale=ar}!!) might reasonably result in a direction other than auto/undefined and thus produce different wrapping behavior.
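As a rough illustration of the "detect it from the locale, unless overridden" approach and of why a `u:locale` override can change the wrapping, consider the sketch below. The `placeholderDir` helper, its option names, and the deliberately tiny `RTL_LANGUAGES` set are all assumptions made for this example; a real implementation would consult proper locale data.

```typescript
type ExplicitDir = "ltr" | "rtl" | "auto";

// Assumption: a tiny, incomplete list of RTL language subtags, for illustration only.
const RTL_LANGUAGES = new Set(["ar", "fa", "he", "ur"]);

function localeDir(locale: string): "ltr" | "rtl" {
  // Look only at the primary language subtag, e.g. "ar" in "ar-EG".
  const language = locale.split("-")[0].toLowerCase();
  return RTL_LANGUAGES.has(language) ? "rtl" : "ltr";
}

// Hypothetical resolution order: an explicit u:dir wins; otherwise the
// direction comes from the u:locale override, or else the message locale.
function placeholderDir(
  msgLocale: string,
  opts: { uLocale?: string; uDir?: ExplicitDir }
): ExplicitDir {
  if (opts.uDir !== undefined) return opts.uDir;
  return localeDir(opts.uLocale ?? msgLocale);
}

placeholderDir("en", {});                // "ltr": no isolation needed in an LTR message
placeholderDir("en", { uLocale: "ar" }); // "rtl": the placeholder gets RLI/PDI
placeholderDir("ar", { uDir: "auto" });  // "auto": the placeholder gets FSI/PDI
```

Under these assumptions, the same placeholder is left bare, wrapped in RLI/PDI, or wrapped in FSI/PDI depending only on the locale and the override, never on the formatted text.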

aphillips commented 2 weeks ago

> You may be tempted therefore to do that introspection. Before you do, consider whether that makes any sense: Aren't you just front-loading work that would get done by the rendering engine in any case?

+1 to this. Moreover: if you're doing the introspection, you might as well just do auto, because your results won't be meaningfully different (except for bugs in your own code).

It's really hard to determine that something is "definitely RTL". Consider this example:

السعر {⁦$price :currency symbol=iso⁩} + {⁦$shipping :currency symbol=iso⁩} الشحن

The expressions both evaluate as LTR when formatted (e.g. 1,234.56 USD and 13.45 USD) but they are RTL in an RTL locale. We want them RLI/PDI isolated so that the currency symbol renders in the trailing position:

السعر ⁧1,234.56 USD⁩ + ⁧13.45 USD⁩ الشحن

... which I accomplish here using RLI/PDI.
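The mismatch can be sketched in a few lines. The `crudeFirstStrong` helper below is a hypothetical and deliberately crude stand-in for content introspection: it is not UAX#9 P2, and its character ranges are approximations chosen for this example. It reports the formatted price as LTR, while the Arabic locale calls for RLI/PDI.

```typescript
const RLI = "\u2067"; // RIGHT-TO-LEFT ISOLATE
const PDI = "\u2069"; // POP DIRECTIONAL ISOLATE

// Crude approximation: group 1 is a basic Latin letter, group 2 is a code
// point in some Hebrew/Arabic blocks. A real first-strong check needs Bidi_Class data.
const STRONG = /([A-Za-z])|([\u0590-\u08FF\uFB1D-\uFDFF\uFE70-\uFEFF])/;

function crudeFirstStrong(text: string): "ltr" | "rtl" | undefined {
  const match = STRONG.exec(text);
  if (!match) return undefined;
  return match[1] !== undefined ? "ltr" : "rtl";
}

const price = "1,234.56 USD";
const shipping = "13.45 USD";

// Introspection sees the "U" of "USD" first and concludes LTR...
const introspected = crudeFirstStrong(price); // "ltr"

// ...but in an RTL locale the values should be isolated as RTL.
const isolateRtl = (s: string): string => RLI + s + PDI;
const message = `السعر ${isolateRtl(price)} + ${isolateRtl(shipping)} الشحن`;
```

The direction used for wrapping here comes from the locale, not from the characters of the formatted value.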