unicode-org / message-format-wg

Developing a standard for localizable message strings

[FEEDBACK] Isolating quoted patterns on the outside adds a lookahead to the syntax #787

Open eemeli opened 1 month ago

eemeli commented 1 month ago

An observation from implementing bidi isolation as proposed in #781, but which also applies to the currently proposed design for bidi usability:

Isolating quoted patterns on the outside adds LRI, RLI & FSI to the set of characters (currently { and .) that could start a quoted message with no declarations, as in \u2066{{hello}}\u2069.

This doesn't make the syntax ambiguous as the {{ isn't valid in a simple-message, but it does add a lookahead of one token to the parser.

The same lookahead is also required in variant, to determine whether a \u2066 starts a quoted key or a quoted-pattern.

The simplest change to avoid this lookahead would probably be to place the open-isolate and close-isolate between the braces, as in {\u2066{hello}\u2069}. In this position, it would also match what's proposed for expression and markup.
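To make the lookahead concrete, here is a minimal Python sketch. It is not taken from any MF2 implementation: the function names are hypothetical, messages with declarations (starting with .) are ignored, and it assumes that isolates are allowed directly inside the opening brace of an expression, as the bidi proposal suggests.

```python
# LRI, RLI, FSI: the characters that may open an isolate.
ISOLATES = {"\u2066", "\u2067", "\u2068"}


def is_quoted_outside(src: str) -> bool:
    # Isolate outside the braces, as in \u2066{{hello}}\u2069.
    # A leading isolate may be ordinary text of a simple message or an
    # ignorable syntax character; only the token after it tells us which.
    i = 1 if src[:1] in ISOLATES else 0
    return src.startswith("{{", i)


def is_quoted_inside(src: str) -> bool:
    # Isolate between the braces, as in {\u2066{hello}\u2069}.
    # The first character already rules out plain text; an isolate after
    # it is ignorable whether an expression or a quoted pattern follows.
    if not src.startswith("{"):
        return False
    i = 2 if src[1:2] in ISOLATES else 1
    return src.startswith("{", i)
```

With the outside form, "\u2066{{hello}}\u2069" and "\u2066{$x} hello" share a first character, and its role (syntax or text) is only known after peeking at what follows. With the inside form, the isolate is never message text, so no decision has to be deferred.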

aphillips commented 1 month ago

Putting the isolate between the pattern quotes would mean that there are two sequences for opening/closing. And it is harder for tools to insert (or remove) the isolates. It's a cognitive burden on everyone, although admittedly it's clever.

Note that the isolates (unless inside of a literal) are ignorable and can be stripped from the message.

eemeli commented 1 month ago

> Putting the isolate between the pattern quotes would mean that there are two sequences for opening/closing.

This is also the case with isolates outside the quotes. The current proposal has:

I'm suggesting that we instead use

> And it is harder for tools to insert (or remove) the isolates.

Both solutions are just as easy or hard to deal with. As MF2 may include e.g. |{{}}| as a valid quoted literal, a proper MF2 parser is required to apply any such changes.
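A small Python sketch of why plain text substitution is not enough. The message below is illustrative only, written against the draft syntax as it stood around this discussion, and naive_isolate is a hypothetical helper, not part of any tool.

```python
LRI, PDI = "\u2066", "\u2069"

# Illustrative message: the first variant key is the quoted literal |{{}}|.
msg = ".match {$x :string}\n|{{}}| {{matched}}\n* {{other}}"


def naive_isolate(src: str) -> str:
    # Wraps every "{{" ... "}}" pair textually, with no knowledge of literals.
    return src.replace("{{", LRI + "{{").replace("}}", "}}" + PDI)


out = naive_isolate(msg)

# The message has two quoted patterns, but three isolate pairs were added:
# the extra pair landed inside the literal key and changed its value.
inserted = out.count(LRI)
key_intact = "|{{}}|" in out
```

Here inserted is 3 and key_intact is False. The same corruption would occur with a textual rewrite of "{{" to "{\u2066{", so the placement of the isolates does not change the need for a real parser.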

aphillips commented 1 month ago

I think the difference is (especially if we make the pairing optional!) that the open and close isolates can just be ignored in the current design. With optional pairing, we can push the isolate characters back into the s production. Anyway, let's discuss.