unicode-org / message-format-wg

Developing a standard for localizable message strings

Should we really be using `{{pattern}}` and `|literal|` delimiters? #602

Open eemeli opened 8 months ago

eemeli commented 8 months ago

The syntax of MessageFormat 2 is the result of a long chain of discussions, arguments, compromises, and the balancing of multiple different stakeholders and concerns. While it is quite capable of fulfilling the demands put upon it, it is literally a design by committee.

While I strongly support our work and our results, I remain concerned about the design decisions we've made specifically regarding our {{pattern}} and |literal| delimiters, and about how weird they are. We have, quite explicitly, ended up choosing string delimiters that are not commonly used as string delimiters, so that embedding MF2 strings within programming languages or JSON does not require internal escapes, and to reduce the frequency of message contents needing to include escapes.

To rationalise our decisions, we have multiple overlapping design documents tracing our path to where we are now; documents that we've argued about and sometimes voted on to unblock our progress. As far as I know, we do not have a single succinct document explaining why these delimiters are the way they are.

As we are now approaching a complete definition of the language and publishing it as a tech preview, I think the delimiters are a specific concern that we ought to be ready to accept some criticism about, and to potentially reconsider for our final release. The base assumptions that I believe we may have mis-estimated include:

Finally, to illustrate what this is all about, consider this MF2 message, using our current syntax:

.input {$count :number}
.local $kind = {|"Granny Smith"|}
.match {$count}
0 {{no {$kind} apples}}
one {{{$count} {$kind} apple}}
* {{{$count} {$kind} apples}}

If we were to allow for more normal pattern and literal delimiters, this same message could read as:

.input {$count :number}
.local $kind = {'"Granny Smith"'}
.match {$count}
0 "no {$kind} apples"
one "{$count} {$kind} apple"
* "{$count} {$kind} apples"

While I appreciate that the alternative syntax would carry some costs, I believe that its benefits in readability and lack of weirdness outweigh the negatives. Therefore, I ask that we be open to discussing these choices further during the tech preview phase.

aphillips commented 8 months ago

(chair hat on)

@eemeli was asked to file this issue following discussion in the 2024-01-15 teleconference. In that call, we explicitly discussed that this is out-of-scope for LDML45. The MFWG will not consider any further normative preferential changes to the ABNF or syntax in this release. Only editorial ("cleanup") or technical errors ("bugs") within the current design will be considered in this release.

This comment is strictly to document that fact. It is neither an endorsement nor a rejection of this issue.

mihnita commented 8 months ago

My take on this


How common it is to include MF2 messages in a programming language or other context where specific delimiters are required of strings, and alternatives are not available

Very often. And it is not only about programming languages, but also file formats.

There are many formats that delimit their own messages with ", or require " to be escaped.

So 4 of the most common file formats explicitly designed for localization use ", with no alternative.


How difficult it is to manually escape string delimiter characters

Let's take this:

.match {$button :string}
subscribe {{Click "Subscribe" to stop receiving emails}}
unsubscribe {{Click "Subscribe" to ...}}

If we replace {{...}} with quotes in our syntax, this now becomes:

.match {$button :string}
subscribe "Click \"Subscribe\" to stop receiving emails"
unsubscribe "Click \"Subscribe\" to ..."

And next we store the message in code, JSON, etc.:

{
"msg": ".match {$button :string} subscribe \"Click \\\"Subscribe\\\" to stop receiving emails\" unsubscribe \"Click \\\"Subscribe\\\" to ...\""
}
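
The layering of escapes can be reproduced mechanically. Below is a minimal Python sketch that wraps the same pattern text in each delimiter style and then serializes it with the standard `json` module. The helper names (`with_brace_delimiters`, `with_quote_delimiters`, `backslashes_in_json`) are hypothetical, and the backslash-escaping rule for the quote-delimited variant is an assumption based on the example above, not something taken from the spec.

```python
import json

# Pattern text as a translator would write it; note the ASCII double quotes.
PATTERN = 'Click "Subscribe" to stop receiving emails'


def with_brace_delimiters(pattern: str) -> str:
    # Current syntax: double quotes inside {{...}} need no escaping.
    return "{{" + pattern + "}}"


def with_quote_delimiters(pattern: str) -> str:
    # Hypothetical syntax: assume backslash and double quote must both
    # be backslash-escaped inside a quote-delimited pattern.
    escaped = pattern.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def backslashes_in_json(message: str) -> int:
    # Count the backslashes present once the message is a JSON string.
    return json.dumps(message).count("\\")


braces = "subscribe " + with_brace_delimiters(PATTERN)
quotes = "subscribe " + with_quote_delimiters(PATTERN)

print(json.dumps(braces))
print(json.dumps(quotes))
print(backslashes_in_json(braces), backslashes_in_json(quotes))
```

For this one variant, the brace-delimited form needs 2 backslashes in JSON and the quote-delimited form needs 8, which matches the `\"` and `\\\"` sequences in the hand-written JSON above.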

What is the appropriate lesson to take from ICU MessageFormat's choices to use ' as an escape character

That it is a bad idea to require escaping for characters commonly used in the body of localized messages, and that WYSIWYG is best.