unicode-org / message-format-wg

Developing a standard for localizable message strings

[FEEDBACK] bidi clarifications #958

Open macchiati opened 1 week ago

macchiati commented 1 week ago

We had a discussion around the implications of bidi isolation in formatting.md. I'm capturing some items here for post-46.1.

  1. General issue. While it is clear that the Default Bidi Strategy (DBS) is the correct strategy in the general case, the algorithm seems to preclude various optimizations. The spec should make it clear that an implementation is conformant to the DBS if the bidi ordering it produces for any formatted pattern is identical to the bidi ordering produced for that pattern by the DBS.
  2. It should also make clear that an implementation may generate equivalent results for other environments. E.g., when generating HTML, it could produce bidi markup instead of the Unicode bidi control characters. This might be covered by #formatting, but it is best to have it specifically called out.
  3. Details
    1. In #handling-bidirectional-text we have "Let msgdir be the directionality of the whole message," but it is not defined.
    2. In #formatting-context we have "Information on the base directionality of the message and its text tokens. This will be used by strategies for bidirectional isolation, and can be used to set the base direction of the message upon display." That sounds like it is supposed to define msgdir, or at least the connection between the two needs to be made clear.
    3. "Let fmt be the formatted string representation of the resolved value of exp." should be "Let fmt be the formatted string representation of exp."
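
For readers following along, the isolation steps under discussion can be sketched roughly as below. This is a simplified Python illustration, not the normative algorithm: the function name `format_with_isolation`, the representation of parts as plain strings and `(fmt, dir)` tuples, and the omission of markup handling are all assumptions made for the example. The authoritative steps are the ones in formatting.md.

```python
# Unicode bidi isolate controls.
LRI, RLI, FSI, PDI = "\u2066", "\u2067", "\u2068", "\u2069"

def format_with_isolation(parts, msgdir="unknown"):
    """Join pattern parts, isolating each placeholder by its directionality.

    parts:  list of str (literal text) or (fmt, dir) tuples (placeholders),
            where dir is "LTR", "RTL" or "unknown". Hypothetical shape.
    msgdir: directionality of the whole message, same three values.
    """
    out = []
    for part in parts:
        if isinstance(part, str):
            out.append(part)  # literal text is appended unchanged
            continue
        fmt, direction = part
        if direction == "LTR":
            # An LTR placeholder in an LTR message needs no isolation.
            out.append(fmt if msgdir == "LTR" else LRI + fmt + PDI)
        elif direction == "RTL":
            out.append(RLI + fmt + PDI)
        else:
            # Unknown directionality: let the first strong character decide.
            out.append(FSI + fmt + PDI)
    return "".join(out)
```

In this sketch, a placeholder of unknown directionality is always wrapped in FSI/PDI, even when its content happens to be LTR text in an LTR message. That is the kind of case where a "lighter" output could yield the same bidi ordering.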

A related issue: #formatting should make it clear that callers of implementations cannot rely on the literal text in a pattern being preserved in the formatted pattern. That is, an implementation could change the literal text, such as improving the result of {{You have an {$item} in your basket.}} based on the value of $item, e.g. "You have an apple in your basket." vs. "You have a pear in your basket."

eemeli commented 6 days ago
  1. It's quite intentional that the default bidi strategy does not allow for optimizations, as it's meant to produce the same output in different implementations. This enables e.g. rehydration to work well by having server and client code produce the exact same output. We do allow for other strategies or variants to be provided, which may perform any such optimizations: https://github.com/unicode-org/message-format-wg/blob/849db9c30ef8f7ffe68a62e297738193eda6bd48/spec/formatting.md?plain=1#L924-L927

  2. Note that we're not defining any explicit HTML or other non-string formatting output in the spec. We got somewhat close to defining a formatted-parts output, but ultimately decided not to define it here (it is defined in the JS spec, though). Therefore, to enable properly isolated HTML to be produced from MF2, we have at least this: https://github.com/unicode-org/message-format-wg/blob/849db9c30ef8f7ffe68a62e297738193eda6bd48/spec/formatting.md?plain=1#L894-L896

  3. Agreed, some of these references should be clarified a bit.

Re: changing literal text, that sounds like something that ought to be done as post-processing to the MF2 output. After this was discussed earlier, I ended up implementing a PoC hackyFixArticles function in the JS messageformat test suite that applies this correction, to show how it could be done with formatted parts.
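
To illustrate the post-processing idea, here is a minimal sketch of an article fix applied to a list of formatted parts. It is not the hackyFixArticles code from the JS test suite: the function name `fix_articles`, the part shape (`{"type": ..., "value": ...}`), and the first-letter vowel heuristic are all assumptions for the example, and it assumes the part values carry no bidi isolation characters.

```python
def fix_articles(parts):
    """Return a copy of parts with a trailing " a "/" an " in a text part
    adjusted to match the first letter of the following non-text part.

    parts: list of dicts like {"type": "text" | "string", "value": str}
           (a hypothetical formatted-parts shape).
    """
    fixed = []
    for i, part in enumerate(parts):
        value = part["value"]
        nxt = parts[i + 1] if i + 1 < len(parts) else None
        if part["type"] == "text" and nxt and nxt["type"] != "text" and nxt["value"]:
            for article in (" an ", " a "):
                if value.endswith(article):
                    # Naive English heuristic: vowel letter means "an".
                    vowel = nxt["value"][0].lower() in "aeiou"
                    value = value[: -len(article)] + (" an " if vowel else " a ")
                    break
        fixed.append({**part, "value": value})
    return fixed

def join_parts(parts):
    """Concatenate part values into the final string."""
    return "".join(p["value"] for p in parts)
```

With the pattern from the example above, passing the parts for `$item = pear` through `fix_articles` and then `join_parts` turns "You have an pear in your basket." into "You have a pear in your basket.", while the parts for `$item = apple` are left unchanged.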

macchiati commented 5 days ago

> It's quite intentional that the default bidi strategy does not allow for optimizations, as it's meant to produce the same output in different implementations.

But there is no guarantee that two different implementations will produce the same result for almost any placeholder with a function, so bidi "compatibility" would not at all guarantee "the same output in different implementations". What it does do is force implementations to have an option to produce the 'heavy' version of bidi control insertion, even if what most clients will want is the 'light' version (which produces the same results).

As for the HTML and literal text, the main point is that an implementation's MF2 APIs should be able to have options for those. So we need to make sure that the spec doesn't exclude that.

eemeli commented 5 days ago

We're not looking to guarantee the same results, but to enable them. If there's a way for a user to get the exact same bidi isolation with two different implementations and at least one of them allows for its function handlers to be user-customizable, it becomes possible to have the same function handler behaviour in both implementations, and for the outputs to match.

Also, we do include this directive: https://github.com/unicode-org/message-format-wg/blob/ec9089dd85cd5d8dc1f68807ab3551383bd74965/spec/formatting.md?plain=1#L828-L829

Following that, it should not matter if the output includes more isolation than strictly necessary.

> As for the HTML and literal text, the main point is that an implementation's MF2 APIs should be able to have options for those. So we need to make sure that the spec doesn't exclude that.

The spec does not exclude those possibilities. We explicitly call out potential support for not only HTML syntax, but also DOM fragments, and we do not establish any upper bound for what the formatted output might look like or what transforms could be applied to it.
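
As a rough illustration of what an HTML-oriented output could look like, the sketch below isolates placeholders with `<bdi>` elements and a `dir` attribute instead of Unicode control characters. Nothing here is defined by the spec: the function name `format_to_html` and the `(fmt, dir)` tuple shape for placeholders are assumptions made for the example.

```python
from html import escape

def format_to_html(parts):
    """Render pattern parts as an HTML string, isolating placeholders with <bdi>.

    parts: list of str (literal text) or (fmt, dir) tuples (placeholders),
           where dir is "LTR", "RTL" or "unknown". Hypothetical shape.
    """
    out = []
    for part in parts:
        if isinstance(part, str):
            out.append(escape(part))
        else:
            fmt, direction = part
            # Unknown directionality maps to dir="auto", which lets the
            # browser pick the direction from the content.
            dir_attr = {"LTR": "ltr", "RTL": "rtl"}.get(direction, "auto")
            out.append(f'<bdi dir="{dir_attr}">{escape(fmt)}</bdi>')
    return "".join(out)
```

A real implementation would more likely build this from formatted parts or DOM nodes than from string concatenation, but the isolation idea is the same.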