Open eemeli opened 8 months ago
(chair hat on)
@eemeli was asked to file this issue following discussion in the 2024-01-15 teleconference. In that call, we explicitly discussed that this is out-of-scope for LDML45. The MFWG will not consider any further normative preferential changes to the ABNF or syntax in this release. Only editorial ("cleanup") or technical errors ("bugs") within the current design will be considered in this release.
This comment is strictly to document that fact. It is neither an endorsement nor a rejection of this issue.
My take on this
How common it is to include MF2 messages in a programming language or other context where specific delimiters are required of strings, and alternatives are not available
Very often. And it is not only about programming languages, but also file formats.
There are many formats that delimit their own messages with ", or require " to be escaped.
So 4 of the most common file formats explicitly designed for localization use ", with no alternative.
How difficult it is to manually escape string delimiter characters
Let's take this:
.match ($button :string)
subscribe {{Click "Subscribe" to stop receiving emails}}
unsubscribe {{Click "Subscribe" to ...}}
If we replace {{...}} with quotes in our syntax now, this becomes
.match ($button :string)
subscribe "Click \"Subscribe\" to stop receiving emails"
unsubscribe "Click \"Subscribe\" to ..."
And next we store the message in code / json / etc:
{
"msg": ".match ($button :string) subscribe \"Click \\\"Subscribe\\\" to stop receiving emails\" unsubscribe \"Click \\\"Subscribe\\\" to ...\""
}
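The compounding of escapes in the example above can be checked mechanically. The following is a minimal sketch, assuming the message is stored on a single line as in the JSON snippet; the constant and helper names (`BRACE_MESSAGE`, `QUOTE_MESSAGE`, `embed_in_json`, `count_backslashes`) are hypothetical and exist only for this illustration. It serializes both variants of the message with Python's standard `json` module and counts the backslashes that end up in the stored form.

```python
import json

# Both messages are copied from the comment above, joined onto one line.
# Raw strings keep the backslashes of the quote-delimited variant literal.
BRACE_MESSAGE = (
    r'.match ($button :string) '
    r'subscribe {{Click "Subscribe" to stop receiving emails}} '
    r'unsubscribe {{Click "Subscribe" to ...}}'
)
QUOTE_MESSAGE = (
    r'.match ($button :string) '
    r'subscribe "Click \"Subscribe\" to stop receiving emails" '
    r'unsubscribe "Click \"Subscribe\" to ..."'
)


def embed_in_json(message: str) -> str:
    """Serialize the message as the value of a hypothetical "msg" key."""
    return json.dumps({"msg": message})


def count_backslashes(text: str) -> int:
    """Count the backslash characters in the serialized text."""
    return text.count("\\")


if __name__ == "__main__":
    for label, message in (("braces", BRACE_MESSAGE), ("quotes", QUOTE_MESSAGE)):
        stored = embed_in_json(message)
        print(label, count_backslashes(stored))
        print(stored)
```

With the brace-delimited pattern, only the four quotation marks in the message text need a JSON escape. With quote-delimited patterns, every delimiter and every already-escaped inner quote is escaped again, which is where the `\\\"` runs in the JSON snippet come from.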
What is the appropriate lesson to take from ICU MessageFormat's choices to use ' as an escape character
That it is a bad idea to require escaping for characters commonly used in the body of localized messages, and that WYSIWYG is best.
The syntax of MessageFormat 2 is the result of a long chain of discussions, arguments, compromises, and the balancing of multiple different stakeholders and concerns. While it is quite capable of fulfilling the demands put upon it, it is literally a design by committee.
While I strongly support our work and our results, I remain concerned about the design decisions we've made specifically about our {{pattern}} and |literal| delimiters, and how weird they are. We have, quite explicitly, ended up choosing string delimiters that are not commonly used as string delimiters, so that embedding MF2 strings within programming languages or JSON does not require internal escapes, and to reduce the frequency of message contents needing to include escapes.
To rationalise our decisions, we have multiple overlapping design documents tracing our path to where we are now; documents that we've argued about and sometimes voted on to unblock our progress. As far as I know, we do not have a single succinct document explaining why these delimiters are the way they are.
As we are now approaching a complete definition of the language and publishing it as a tech preview, I think the delimiters are a specific concern that we ought to be ready to accept some criticism about, and to potentially reconsider for our final release. The base assumptions that I believe we may have mis-estimated include:
How common it is to include MF2 messages in a programming language or other context where specific delimiters are required of strings, and alternatives are not available. [...] " would arise. Many programming languages only support multi-line strings with delimiters like ` and """ that we could specifically avoid.
How difficult it is to manually escape string delimiter characters [...] {{braces}} and |bars|?
What is the appropriate lesson to take from ICU MessageFormat's choices to use ' as an escape character, and to support multiple different "apostrophe modes"? Is it that the needs for escaping should be minimised, or that the rules and practices of escaping should be regularised? With MF2 we've clearly aimed for the former (e.g. limiting which characters may be \ escaped in pattern text vs. literal text), but is that really the only lesson to take here? Could we also consider choosing surprising syntax to be a source of potential errors that we ought to avoid?
Finally, to illustrate what this is all about, consider this MF2 message, using our current syntax:
If we were to allow for more normal pattern and literal delimiters, this same message could read as:
While I appreciate that the alternative syntax would carry some costs, I believe that its benefits in readability and lack of weirdness outweigh the negatives. Therefore, I ask that we be open to discussing these choices further during the tech review phase.