Closed bhaible closed 1 month ago
You're right, although note that the trailing expression does not use any whitespace it captures. Presumably the reserved body should capture all trailing whitespace, in case it wants it for something.
The production for `reserved-body` is meant to allow an arbitrary blob of tokens, with minimal constraint on the structure, to appear before at least one expression. The optional `s` production at the start allows spaces to be "stirred in".
In general our syntax treats whitespace as exterior to the meaningful portions. Required whitespace exists to keep tokens apart (for example, between keys in a variant). Optional whitespace can be removed, except, probably, in `reserved-body`, where it might be meaningful (or necessary at the start, in some cases).
`reserved-body` should be set up not to have meaningful trailing whitespace.
It's tempting to say that a reserved keyword must be followed by space, but statements like `.keyword(body){expression}` or `.keyword{expression}` should be possible. At the same time, we only have to look at `.local` to see a required space.
> In general our syntax treats whitespace as exterior to the meaningful portions. Required whitespace exists to keep tokens apart ...
OK, so if I understand it correctly, the ambiguity resolution, in the example above, would be:
' ' parsed as s
'/foo/' parsed as reserved-body
'\u3000\u3000' parsed as [s]
Did I understand you correctly?
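To make the proposed resolution concrete, here is a minimal Python sketch of it: all leading whitespace goes to `s`, all trailing whitespace goes to the optional `[s]` before the expression, and only what lies in between becomes the `reserved-body`. The function name `split_reserved_tail` is hypothetical, and the set of whitespace characters in `WS` is an assumption about what the `s` production matches; check `message.abnf` for the authoritative list.

```python
from typing import Tuple

# Characters assumed to match the `s` production (an assumption, not taken
# from this thread): space, tab, CR, LF, and U+3000.
WS = " \t\r\n\u3000"


def split_reserved_tail(text: str) -> Tuple[str, str, str]:
    """Split the text between a reserved keyword and the first expression.

    Returns (leading_s, reserved_body, trailing_s), giving every leading and
    trailing whitespace character to the surrounding `s` productions.
    """
    without_leading = text.lstrip(WS)
    leading = text[: len(text) - len(without_leading)]
    body = without_leading.rstrip(WS)
    trailing = without_leading[len(body):]
    return leading, body, trailing


# The text that follows `.regex` and precedes `{xyz}` in the example.
print(split_reserved_tail(" /foo/\u3000\u3000"))
```

Under this reading the body never begins or ends with whitespace, so a parser has only one way to divide the input.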
> It's tempting to say that a reserved keyword must be followed by space
This would be hard to understand for users. The mental model users generally have is "spaces are needed to separate tokens which would otherwise combine to a single token".
> At the same time, we only have to look at .local to see a required space.
Yes, but that is because `.local` ends with an alphabetic character and the next token, a `variable`, starts with a `$` which we consider to act like an alphabetic character. Without the space, `.local$foo` would be confusing to many users.
We removed reserved statements, so closing as out-of-scope.
The rule for `reserved-statement` in https://github.com/unicode-org/message-format-wg/blob/main/spec/syntax.md and https://github.com/unicode-org/message-format-wg/blob/main/spec/message.abnf contains two ambiguities:

1) If there is more than one whitespace character after the `reserved-keyword`, it is ambiguous how many of these whitespace characters are part of the `s` rule, and how many of them are at the start of the `reserved-body`.
2) U+3000 characters before the `expression` can be parsed at the end of the `reserved-body` or as part of the `s` rule.

Example (using \u escapes for legibility): The input string contains a `reserved-statement` for `.regex /foo/\u3000\u3000{xyz}` and a `complex-body` for `{{hello}}`. Inside this `reserved-statement`, there are 3 * 3 = 9 possibilities.

It appears that the contents of the `reserved-body` are meant to appear as the `body` field of an `UnsupportedStatement` element in the data model (cf. https://github.com/unicode-org/message-format-wg/blob/main/spec/data-model/README.md ). Therefore it matters which of these 9 possibilities the parser chooses.

Please specify how these two ambiguities should be resolved.
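The counting argument can be sketched in Python by enumerating every way to divide the whitespace. This is only an illustration under stated assumptions: the helper `enumerate_parses` is hypothetical, `s` is assumed to require at least one character, and the run of three spaces after the keyword is an assumption chosen so that the count matches the 3 * 3 = 9 figure (the exact input string is not preserved in this thread).

```python
from itertools import product


def enumerate_parses(leading: str, core: str, trailing: str):
    """Yield (s, reserved_body, optional_s) for each division of whitespace.

    `leading` is the whitespace run after the keyword, `core` the
    non-whitespace middle, `trailing` the U+3000 run before the expression.
    """
    # Ambiguity 1: `s` takes at least one leading character (assumed);
    # any remaining leading whitespace opens the reserved-body.
    lead_splits = range(1, len(leading) + 1)
    # Ambiguity 2: the reserved-body may absorb zero or more of the
    # trailing U+3000 characters; the rest go to the optional `s`.
    trail_splits = range(0, len(trailing) + 1)
    for i, j in product(lead_splits, trail_splits):
        yield leading[:i], leading[i:] + core + trailing[:j], trailing[j:]


# Hypothetical input: three spaces, then /foo/, then two U+3000 characters.
parses = list(enumerate_parses("   ", "/foo/", "\u3000\u3000"))
print(len(parses))  # 3 leading choices * 3 trailing choices = 9
bodies = {body for _, body, _ in parses}
print(len(bodies))  # each division produces a different body string
```

Because every division produces a different body string, the value stored in the data model would depend on which parse an implementation happens to pick.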