Open macchiati opened 1 week ago
It's quite intentional that the default bidi strategy does not allow for optimizations, as it's meant to produce the same output in different implementations. This enables e.g. rehydration to work well by having server and client code produce the exact same output. We do allow for other strategies or variants to be provided, which may perform any such optimizations: https://github.com/unicode-org/message-format-wg/blob/849db9c30ef8f7ffe68a62e297738193eda6bd48/spec/formatting.md?plain=1#L924-L927
Note that we're not defining any explicit HTML or other non-string formatting output in the spec. We got somewhat close to defining a formatted-parts output, but ultimately decided not to define it here (it is defined in the JS spec, though). Therefore, to enable properly isolated HTML to be produced from MF2, we have at least this: https://github.com/unicode-org/message-format-wg/blob/849db9c30ef8f7ffe68a62e297738193eda6bd48/spec/formatting.md?plain=1#L894-L896
Agreed, some of these references should be clarified a bit.
Re: changing literal text, that sounds like something that ought to be done as post-processing of the MF2 output. After this was discussed earlier, I ended up implementing a PoC `hackyFixArticles` function in the JS messageformat test suite that applies this correction, to show how it could be done with formatted parts.
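For readers who want a concrete picture of that kind of post-processing, here is a minimal sketch. It is not the actual `hackyFixArticles` code; the `Part` shape and the names `fixArticles` and `partsToString` are made up for illustration, and the vowel check is a naive spelling heuristic (it gets words like "hour" wrong).

```typescript
// Hypothetical formatted-part shape; real implementations define their own.
type Part =
  | { type: "text"; value: string }
  | { type: "placeholder"; value: string };

// Rewrite a trailing "a " / "an " in a text part based on the formatted
// value of the placeholder that immediately follows it.
function fixArticles(parts: Part[]): Part[] {
  return parts.map((part, i) => {
    const next: Part | undefined = parts[i + 1];
    if (part.type !== "text" || next === undefined || next.type !== "placeholder") {
      return part;
    }
    // Naive heuristic: "an" before a vowel letter, "a" otherwise.
    const article = /^[aeiou]/i.test(next.value) ? "an" : "a";
    // Only a lowercase article directly before the placeholder is touched.
    return { ...part, value: part.value.replace(/\ban? $/, `${article} `) };
  });
}

// Concatenate the parts into the final string.
function partsToString(parts: Part[]): string {
  return parts.map((p) => p.value).join("");
}

// Stand-in for the formatted parts of
// {{You have an {$item} in your basket.}}
function basket(item: string): Part[] {
  return [
    { type: "text", value: "You have an " },
    { type: "placeholder", value: item },
    { type: "text", value: " in your basket." },
  ];
}

console.log(partsToString(fixArticles(basket("apple"))));
console.log(partsToString(fixArticles(basket("pear"))));
```

The point of working on parts rather than on the final string is that the boundary between literal text and placeholder value is still known, so the fix never has to guess where the inserted value starts.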
> It's quite intentional that the default bidi strategy does not allow for optimizations, as it's meant to produce the same output in different implementations.
But there is no guarantee that two different implementations will produce the same result for almost any placeholder with a function, so bidi "compatibility" does not at all guarantee "the same output in different implementations". What it does do is force implementations to have an option to produce the 'heavy' version of bidi control insertion, even if what most clients will want is the 'light' version (which produces the same results).
As for the HTML and literal text, the main point is that an implementation's MF2 APIs should be able to have options for those. So we need to make sure that the spec doesn't exclude that.
We're not looking to guarantee the same results, but to enable them. If there's a way for a user to get the exact same bidi isolation with two different implementations and at least one of them allows for its function handlers to be user-customizable, it becomes possible to have the same function handler behaviour in both implementations, and for the outputs to match.
Also, we do include this directive: https://github.com/unicode-org/message-format-wg/blob/ec9089dd85cd5d8dc1f68807ab3551383bd74965/spec/formatting.md?plain=1#L828-L829
Following that, it should not matter if the output includes more isolation than strictly necessary.
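To make the trade-off concrete, here is a rough sketch contrasting an "always isolate" strategy with a "light" one. Neither function is the spec's algorithm: `alwaysIsolate`, `lightIsolate`, `stripIsolates`, the `PatternPart` shape, and the RTL detection regex are all assumptions made for illustration.

```typescript
const FSI = "\u2068"; // FIRST STRONG ISOLATE
const PDI = "\u2069"; // POP DIRECTIONAL ISOLATE

// Hypothetical part shape, for illustration only.
type PatternPart =
  | { type: "text"; value: string }
  | { type: "placeholder"; value: string };

// "Heavy" variant: every placeholder is wrapped, with no inspection of its
// content. Any implementation doing this yields the same control characters.
function alwaysIsolate(parts: PatternPart[]): string {
  return parts
    .map((p) => (p.type === "text" ? p.value : FSI + p.value + PDI))
    .join("");
}

// "Light" variant: wrap only when the value seems to contain RTL characters.
// The range below is a rough heuristic, so two implementations with
// different heuristics could disagree on where the controls go.
function lightIsolate(parts: PatternPart[]): string {
  const looksRtl = (s: string): boolean => /[\u0590-\u08FF]/.test(s);
  return parts
    .map((p) =>
      p.type === "text" || !looksRtl(p.value) ? p.value : FSI + p.value + PDI
    )
    .join("");
}

// Remove isolate controls (LRI, RLI, FSI, PDI) so that the visible text of
// two outputs can be compared.
function stripIsolates(s: string): string {
  return s.replace(/[\u2066-\u2069]/g, "");
}

const parts: PatternPart[] = [
  { type: "text", value: "Hello " },
  { type: "placeholder", value: "World" },
  { type: "text", value: "!" },
];

const heavy = alwaysIsolate(parts);
const light = lightIsolate(parts);
console.log(heavy === light); // false: the strings differ code point by code point
console.log(stripIsolates(heavy) === stripIsolates(light)); // true: same visible text
```

A consumer that compares strings exactly (as a rehydration check might) sees the two outputs as different, while one that only cares about the visible text does not.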
> As for the HTML and literal text, the main point is that an implementation's MF2 APIs should be able to have options for those. So we need to make sure that the spec doesn't exclude that.
The spec does not exclude those possibilities. We explicitly call out potential support for not only HTML syntax, but also DOM fragments, and we do not establish any upper bound for what the formatted output might look like or what transforms could be applied to it.
We had a discussion around the implications of bidi isolation in formatting.md. I'm capturing some items here for post-46.1.
A related issue: #formatting should make it clear that callers of implementations cannot rely on the literal text in a pattern being preserved in the formatted pattern. That is, an implementation could change the literal text, such as improving the result of `{{You have an {$item} in your basket.}}` based on the value of `$item`, e.g. "You have an apple in your basket." vs. "You have a pear in your basket."