Closed eemeli closed 6 months ago
Why do you assume that the .match has no declarational effect? The opposite is certainly true (declarations affect the selector). I see no reason why the shorthand shouldn't work. Heck, we show this working on the front door of our repo and multiple times in registry.md:
.match {$count :integer}
0 {{You have no notifications.}}
one {{You have {$count} notification.}}
* {{You have {$count} notifications.}}
However, that doesn't format the $count explicitly as a number; we just presume it does, because "count" sounds numeric.
No. We presume it does because the .match expression annotates $count and there are no other annotations to override that annotation.
The place that would be confusing would be a message like this:
.match {$count :integer} {$count :number minimumFractionDigits=1}
* * {{What does {$count} print?}}
The addition of numeric replacements at this point strikes me as superfluous and massively confusing (given MF1's use of positional arguments, about which much ink was spilled).
An alternative solution would be: just as the expression that appears as a selector in a .match has to be annotated, so does an expression used in a pattern.
If we changed that, then your second example:
.match {$count :number}
one {{You have {$count} apple}}
* {{You have {$count} apples}}
would be an error ("Missing Formatter Annotation" or something like that).
I realize this has probably been considered before. However, I think it's preferable to introducing positional arguments (even in a limited way).
Alternately, I can imagine this being addressed by a separate linter.
Why do you assume that the .match has no declarational effect?
Because we never say that it has any such effect. I could imagine such an approach providing a workable solution for the core problem, though.
According to our current language, selector and placeholder expressions do not modify their operand beyond themselves. If we want to make selectors work differently, we'll need to add spec language to that effect.
Heck, we show this working on the front door of our repo and multiple times in registry.md:
Actually, atm all those examples only look like they work. Which is a problem.
@eemeli noted:
According to our current language, selector and placeholder expressions do not modify their operand beyond themselves.
I can't find where we do that? The closest I can find is this statement in formatting.md:
In selectors, the resolved value of an expression is used for pattern selection.
It doesn't say that it isn't later used for formatting. Obviously, I've already written an example of how this could be confusing (multiple selectors on the same variable) up above. But I think this is an important shorthand feature to provide.
@catamorphism noted:
An alternative solution would be: just as the expression that appears as a selector in a .match has to be annotated, so does an expression used in a pattern.
We could do that, although we also permit non-annotation there and allow implementations to supply the annotation. I think that's an important feature. We need to think of message authors here, I think.
@eemeli noted:
According to our current language, selector and placeholder expressions do not modify their operand beyond themselves.
I can't find where we do that?
It's in Variable Resolution: https://github.com/unicode-org/message-format-wg/blob/7a10ee259af0d9fd416808e265f95874009b98b7/spec/formatting.md?plain=1#L187-L191
The .match isn't a declaration, and so nothing happening in its selector expressions affects the resolution of variables in placeholders.
The .match isn't a declaration, and so nothing happening in its selector expressions affects the resolution of variables in placeholders.
I don't buy that argument. The text you quote says how to determine what the operand's value is (it was either passed in or declared). The spec just does not say, one way or the other, whether the selector has any side effects on later placeholders. There are certainly implications (in both directions).
Ultimately, this speaks to #645 being something we need to work on.
Let's say you're right, and our current text does allow for a .match expression to modify its operand so that information is carried from selector expressions to placeholders. So with something like
.match {$x :number minimumFractionDigits=2}
* {{x is {$x}}}
you would have an expectation of a formatting call with { x: 42 } as input to output 'x is 42.00', yes?
In order for that to happen, the resolved value of the selector expression needs to be put somewhere so that resolving and formatting the placeholder picks up on it. Recalling the part that I quoted above, and again noting that .match is not a declaration, the only available place is the formatting context input mapping. We should also note here that the only thing that can put the value there is the :number function implementation, because the runtime behaviour is otherwise explicitly defined, and includes no such operation.
Further, considering that a message like
x is {$x :number minimumFractionDigits=2} or {$x}
really should format as 'x is 42.00 or 42' rather than 'x is 42.00 or 42.00', i.e. the annotation on one placeholder should not affect another. This means that we have a further requirement: the process that places the selector expression's resolved value in the formatting context input mapping must know that it's a selector expression, rather than a placeholder.
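The expectation that one placeholder's annotation does not leak into another can be sketched in JavaScript. This is only an illustration built on Intl.NumberFormat, not how any MessageFormat 2 implementation works. The helper name formatX is hypothetical, and the sketch assumes that the unannotated {$x} falls back to default number formatting.

```javascript
// Hypothetical helper modelling the message
//   x is {$x :number minimumFractionDigits=2} or {$x}
// with each placeholder resolved independently.
function formatX(x, locale = "en") {
  // First placeholder: annotated, so its options apply to this placeholder only.
  const annotated = new Intl.NumberFormat(locale, {
    minimumFractionDigits: 2,
  }).format(x);
  // Second placeholder: unannotated; assumed to use default number formatting.
  const plain = new Intl.NumberFormat(locale).format(x);
  return `x is ${annotated} or ${plain}`;
}

console.log(formatX(42)); // x is 42.00 or 42
```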
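So to make all that happen, custom function implementations need to be called with:
- write access to the formatting context input mapping,
- information about their expression's syntax position as a declaration, selector, or placeholder, and
- information about other selector expressions, in case they use the same operand.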
I don't think this is reasonable, or "simple and minimal" as we require in https://github.com/unicode-org/message-format-wg/blob/7a10ee259af0d9fd416808e265f95874009b98b7/spec/formatting.md?plain=1#L294-L296
Or is there some other mechanism that I'm missing that our spec enables for allowing this behaviour?
@eemeli asked:
you would have an expectation of a formatting call with { x: 42 } as input to output 'x is 42.00', yes?
Yes. Looking at that message, wouldn't that be your expectation? After all, we just did the selection on that format. We previously argued that we would require annotation in order for selection to happen (vs. allowing inference in placeholders).
Further, considering that a message like
I am not sure about your example here. The second $x is not annotated and we haven't really said what happens in that instance. FWIW, I agree with what you're saying: we don't want side-effects from the first placeholder on the second. I am not arguing that you're wrong. What I'm reacting to is that we don't have a clearly stated prohibition on selectors working like declarations (but we can easily write examples where making them work like that is a Problem).
I still maintain that it's easier on message writers (our largest audience) if there is a clearly defined behavior that allows them to write minimal messages in which the annotation on .match affects contained placeholders.
So to make all that happen, custom function implementations need to be called with:
- write access to the formatting context input mapping,
- information about their expression's syntax position as a declaration, selector, or placeholder, and
- information about other selector expressions, in case they use the same operand.
I think function implementations need to be able to annotate the formatting context (otherwise .input declarations don't do anything).
I am not sure the syntax position is necessary, although the data model does communicate this. The function should only need to know what its inputs (operands and options) are. It's the MF processor that knows about the function's position. Modifying your example:
.local $x = {1.23 :number maximumFractionDigits=2}
.match {$x :integer}
* {{x is {$x :number minimumFractionDigits=2} or {$x}}}
The .local is stored to the formatting context as the value of x and this includes the maximum fraction option. The selector might be stored there, overriding x, or it might not. If it were stored there, it would wipe out the .local annotation, which would seem to violate immutability. The placeholders are not stored to the formatting context. Since MF does the calling of the function, it decides what to do with the output. That allows functions to be ignorant about other expressions. The :number and unannotated placeholders would, though, have access to the formatting context annotated value of $x so that they might recover the maximumFractionDigits and :number (or :integer) annotation.
Anyway, I think, overall, you have identified a problem with the spec. We really need to work through #645.
I still maintain that it's easier on message writers (our largest audience) if there is a clearly defined behavior that allows them to write minimal messages in which the annotation on .match affects contained placeholders.
I agree, something like this ought to be possible. The core issue here is that this isn't supported by our spec atm. My first thought was to enable this via $0 etc, but we could maybe also make .match work as a declaration. It certainly works for simple cases, but there's a danger of it falling apart when multiple expressions need to work with the same operand.
For example, consider this message, which presupposes custom :gender and :name functions that work on a complex $person value:
.match {$person :gender}
male {{He ({$person :name}) said:}}
female {{She ({$person :name}) said:}}
* {{They ({$person :name}) said:}}
If the resolved value of the selector is assigned to $person, then the above might not work, and would instead require something like:
.local $name = {$person :name}
.match {$person :gender}
male {{He ({$name}) said:}}
female {{She ({$name}) said:}}
* {{They ({$name}) said:}}
but that's potentially breaking our current invariant of having only one meaning for $person, as it's used in the .local before being assigned a value in the .match. Would we require something like this instead?
.local $name = {$person :name}
.local $gender = {$person :gender}
.match {$gender}
male {{He ({$name}) said:}}
female {{She ({$name}) said:}}
* {{They ({$name}) said:}}
Or would we make references to $person in placeholders illegal if it's used in declarations and selectors?
My point here being, assigning declaratory powers to .match is tricky, especially if they appear to modify the operand.
I think function implementations need to be able to annotate the formatting context (otherwise .input declarations don't do anything). [...] The .local is stored to the formatting context as the value of x and this includes the maximum fraction option.
Not quite. Declarations are handled by the Variable Resolution bit that I quoted earlier: https://github.com/unicode-org/message-format-wg/blob/7a10ee259af0d9fd416808e265f95874009b98b7/spec/formatting.md?plain=1#L187-L191
Effectively, declarations together provide a value mapping that we check first, before looking in the formatting context's input mapping. This is also a key part of what allows for lazy evaluation, as the above quote is the only path through which we look at any of the declarations during Pattern Selection or Formatting.
If we do want .match to have an effect on placeholder variables, the declaration mapping is what we'll want to be changing.
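The lookup order described above (declarations first, then the input mapping, evaluated lazily) can be sketched roughly as follows. All names here (resolveVariable, declarations, input) are hypothetical and chosen for illustration; the spec does not prescribe a data structure. The point of the sketch is that a selector expression adds nothing to the declaration mapping, so a later {$x} still resolves to the raw input.

```javascript
// `declarations` maps a variable name to a thunk, which keeps evaluation lazy.
// `input` stands in for the formatting context's input mapping.
function resolveVariable(name, declarations, input) {
  if (declarations.has(name)) {
    return declarations.get(name)(); // evaluated only when actually referenced
  }
  if (Object.prototype.hasOwnProperty.call(input, name)) {
    return input[name];
  }
  throw new Error(`Unresolved variable: $${name}`);
}

const input = { x: 42 };
let evaluated = 0; // counts how often the declaration below is evaluated

const declarations = new Map([
  // Models: .local $y = {$x :number minimumFractionDigits=2}
  ["y", () => {
    evaluated += 1;
    const x = resolveVariable("x", declarations, input);
    return new Intl.NumberFormat("en", { minimumFractionDigits: 2 }).format(x);
  }],
]);

// A selector such as `.match {$x :number minimumFractionDigits=2}` is not a
// declaration, so it leaves `declarations` untouched.
console.log(resolveVariable("x", declarations, input)); // 42
console.log(evaluated); // 0, because $y has not been referenced yet
console.log(resolveVariable("y", declarations, input)); // 42.00
```

Under this model, making .match affect placeholders would mean letting the processor add an entry to the declaration mapping, not giving the function implementation write access to anything.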
Anyway, I think, overall, you have identified a problem with the spec. We really need to work through #645.
I'm pretty sure that this is a separate issue from what #645 is looking to resolve. In large part, that's about refactoring the definition of "resolved value"; this is about .match having side effects.
I agree that this is not directly related to #645.
Also consider:
.match {$count :plusOne}
one {{You have {$count} apple and one more}}
* {{You have {$count} apples and one more}}
where plusOne is a custom function that takes an integer n and returns n + 1. If the meaning of {$count} in the patterns is actually the meaning of {$count0 :plusOne} (introducing the name $count0 for the value bound to $count in the scope of the selector expressions), I think that's quite confusing.
This is perhaps a contrived example, but the point is that custom functions can return anything, and always treating {$x :f} like $x breaks intuition about how functions work, especially if that's context-dependent:
.input {$count}
.local $count1 = {$count :plusOne}
.local $count2 = {$count1 :plusOne}
.match {$count1}
one {{You have {$count} apple}}
* {{You have {$count} apples}}
If $count1 has the same value as $count in the patterns, but not in the right-hand side of the declaration of $count2, that seems quite surprising.
This shows how treating .match as if it introduces a new scope and is a third form of declaration breaks the substitution principle (that you can understand the meaning of a message by substituting the right-hand side of a binding for the left-hand side).
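Read with plain substitution, the declarations in the example above behave like ordinary bindings. The following JavaScript sketch makes that reading concrete; plusOne is the hypothetical custom function from the example, and English plural rules from Intl.PluralRules stand in for the selector.

```javascript
// Hypothetical custom function from the example above.
const plusOne = (n) => n + 1;

// Substitution reading: each name means exactly its right-hand side.
function resolve(count) {
  const count1 = plusOne(count);  // .local $count1 = {$count :plusOne}
  const count2 = plusOne(count1); // .local $count2 = {$count1 :plusOne}
  return { count, count1, count2 };
}

// Plural category used for the selector {$count1}.
const category = (n) => new Intl.PluralRules("en").select(n);

const { count, count1, count2 } = resolve(0);
console.log(count, count1, count2); // 0 1 2
console.log(category(count1)); // one
console.log(category(count)); // other
```

With an input of 0, the selector on $count1 picks the "one" variant while {$count} in the pattern still prints 0. That is consistent under substitution, and it is exactly what would stop being predictable if the selector silently rebound a variable inside the patterns.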
Another alternative to Eemeli's solution to this problem would be to change the syntax to introduce names along with the keys, where each variant binds a name to each of the selector expressions:
.match {$count :plusOne}
$one = one {{You have {$one} apple and one more}}
$many = * {{You have {$many} apples and one more}}
This is similar to how case expressions work in Haskell, and introduces new names explicitly. The difference between this and Eemeli's suggestion to add positional variables is superficial, but I think named variables are less confusing.
In general the idea of formatting a variable without a function specified was that the formatting function is determined by the type of the variable.
This is something that was strongly argued from very early on by Zibi and I think Stas, and also by ICU people. Because it is intuitive, and Fluent / ICU / other systems do this.
But this example is really confusing, and I don't think we should inherit the selector for formatting.
Because these are different things.
As I argued many times, the selection / formatting functions implement different interfaces.
Do something different, take different arguments, return different results.
And one of the main reasons I argued that :number is a bad name for selection.
The :number selector is not a formatter, so it can't be inherited.
It is just confusion caused by the fact that we use a common name.
Think about a list:
.match {$theList :isEmpty}
true {{You bought nothing}}
* {{You bought {$theList}!}}
We don't expect {$theList} in the * variant to result in some kind of boolean, we expect a list formatter.
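As a rough JavaScript analogy for the list example: the selector's result drives the branch, while the placeholder formats the original list. Both isEmpty and formatList are hypothetical stand-ins (for the :isEmpty function and for whatever default list formatter an implementation might supply); Intl.ListFormat is used only for illustration.

```javascript
// Hypothetical stand-in for the :isEmpty selector; returns a key-like string.
const isEmpty = (list) => String(list.length === 0);

// Hypothetical default formatter for a list value.
const formatList = (list, locale = "en") =>
  new Intl.ListFormat(locale, { style: "long", type: "conjunction" }).format(list);

function youBought(theList) {
  switch (isEmpty(theList)) {
    case "true":
      return "You bought nothing";
    default:
      // {$theList} here is still the list, not the boolean used for selection.
      return `You bought ${formatList(theList)}!`;
  }
}

console.log(youBought([])); // You bought nothing
console.log(youBought(["apples", "pears"])); // You bought apples and pears!
```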
TLDR: so no, I don't think it should be inherited.
As someone still catching up on the new syntax/spec, but working daily in MF1 with many devs and their misconceptions about it, I can see selectors mutating later placeholder usage as causing far more confusion than convenience. I think most efforts to make messages shorter are going to cause long-term confusion in usage. That is, of course, separate from reducing verbosity (e.g. dropping when).
$0 introduces ambiguity for translators (and, frankly, most devs).
Is it possible to give selectors some additional shorthand syntax for declarations?
.match {$theList :cost} as $listCost
.match {$theList :cost} > $listCost
.match {$theList :cost > $listCost}
That would give devs flexibility/convenience in the "setup" without introducing a sort of magical, and fragile, concept for patterns.
Otherwise, I'd be perfectly happy with the explicit version:
.local $listCost = {$theList :cost}
.match {$listCost}
0 {{You spent nothing}}
* {{You spent {$listCost}!}}
One way of expressing the root issue here is that currently, the simplest way of expressing a message does not match with what ought to be used for the right formatting. So could we change that simplest message expression to match the right results, either by changing how we process a .match selector, or by introducing some new syntax?
The explicit assignment within .match suggested by @bearfriend above could work, but that got me thinking that we could achieve the same result by simplifying the syntax instead:
.input {$count :number}
.match $count
one {{You have {$count} apple}}
* {{You have {$count} apples}}
In other words, if .match only allowed variables and not expressions, an .input or .local would be required for the selector, making it easy to reuse when appropriate, while preserving access to the original value:
.local $empty = {$theList :isEmpty}
.match $empty
true {{You bought nothing}}
* {{You bought {$theList}!}}
In other words, if .match only allowed variables and not expressions, an .input or .local would be required for the selector
I think I do prefer that. It's not as short, but again, I don't think that's the most important factor. It's very clear, and reducing multiple paths to the same result wherever reasonable and avoiding "magical" conveniences will have a big impact on getting valid messages in and keeping them valid through the whole process.
Don't hesitate to shut me down if this has already been hashed out, but I'm also now wondering if the root of the confusion stems from expressions being sometimes mutating (inputs) and other times not (patterns). Does it make sense for inputs to either look more like they're just coercive redeclarations?
.input $count = {$count :number}
or perhaps avoid expressions altogether and have a sort of basic casting syntax:
.input $count:number
Where anything more complex would require a new .local.
.input $count:number
.local $negative = {$count :isNegative canOwe=$canOwe}
.match $negative $count
true one {{You owe {$count} apple}}
true * {{You owe {$count} apples}}
false one {{You have {$count} apple}}
false * {{You have {$count} apples}}
Don't hesitate to shut me down if this has already been hashed out, but I'm also now wondering if the root of the confusion stems from expressions being sometimes mutating (inputs) and other times not (patterns).
The current declaration syntax is indeed the result of an extended "hashing out", so it might be best not to reopen that unless there's some very clear reason (see this design doc for some of the history). While it's related to the current issue (much like the "resolved value" discussion of #645), I don't see it providing a resolution to what's happening with the .match selectors.
expressing a message does not match with what ought to be used for the right formatting. So could we change that simplest message expression to match the right results,
This only looks like "the right X" because we decided that :number is something to select on, and also format with.
That is something that is more of an exception than a rule, an accident.
And the reason I opposed it at the time: it makes people think that some accidental name coincidence has some implications.
We don't decide AND format on dates, times, list formats, gender, etc.
If we think of it this way:
.match {$count :plural}
one {{You have {$count :number} apple}}
* {{You have {$count :number} apples}}
the "dissonance" goes away.
Because :plural is the real decision.
If I ask someone "what is number(3)", without any context, what is the intuitive answer?
Vs "what is plural(3)", without any context.
The decision on a number is plural, or even or odd or prime or bigger than X. These are not formatting functions.
Note: this is a discussion we had and arguments I made before (that we are confusing concepts, and will confuse people). We voted, and we decided otherwise, as a group. I don't try to reopen the issue, or say "told you so". It is only intended as clarification for @bearfriend :-)
If we think of it this way:
.match {$count :plural}
one {{You have {$count :number} apple}}
* {{You have {$count :number} apples}}
the "dissonance" goes away.
Using a different function for selection does not change the way that leaving out the formatters looks like it's right:
.match {$count :plural}
one {{You have {$count} apple}}
* {{You have {$count} apples}}
Here, if $count is a numeric string like '1234' or '1.0', then the numerical category selection will work as expected, but the formatting of the selected pattern won't. This is worse than an unannotated placeholder in a simple pattern, because we can end up with an incorrect "You have 1.0 apple" result.
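The mismatch can be reproduced in a few lines of JavaScript. This is a sketch under two assumptions that are not taken from the spec: that the hypothetical :plural selector parses its operand as a number before picking a category, and that an unannotated {$count} prints a string operand exactly as given.

```javascript
function apples(count, locale = "en") {
  // Selection: the selector is assumed to parse the operand numerically.
  const category = new Intl.PluralRules(locale).select(Number(count));
  // Formatting: the unannotated placeholder is assumed to echo the string.
  return category === "one"
    ? `You have ${count} apple`
    : `You have ${count} apples`;
}

console.log(apples("1.0")); // You have 1.0 apple
console.log(apples("1234")); // You have 1234 apples
```

In the second call the category is right, but the number is printed without the locale's grouping, which a number formatter would have added.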
This is worse than an unannotated placeholder in a simple pattern, because we can end up with an incorrect "You have 1.0 apple" result.
What are you considering worse about this? Because some might have the impression that the selection annotation would do the work for them?
Edit: I understand why "1.0 apples" is bad, but it's the same result as the unannotated simple pattern, no?
(as chair)
Discussion of number selection using the same function as the formatter is a WG consensus and is documented in Selection on Numerical Values in exploration. Discussion of this is closed in the LDML45 timeframe.
We welcome feedback on users' lived experience with this as part of the tech preview.
(as contributor)
@eemeli's original issue is whether/how a .match selector affects formatting (the inverse of how a declaration can affect the selector). I think this is an interesting question. I thought I knew the answer to it and will be interested to test the various implementations, users' expectations, my own code, etc.
Let's compare this to programming languages:
We don't expect that the function in switch (number) affects the formatting done by number in the case:
switch ( number(foo, precision=2) ) {
case ... : ..... number(foo)...
}
We expect and it is natural that bar is the same thing in both selection and the case:
let bar = number(foo, precision=2)
switch ( {bar} ) {
case ... : ..... {bar}...
}
I didn't post this earlier but it seems perhaps more relevant now:
you would have an expectation of a formatting call with { x: 42 } as input to output 'x is 42.00', yes?
Yes. Looking at that message, wouldn't that be your expectation?
This would not be my expectation at all. Obviously, this is subjective but I have thus far come to see functions as simply casting in place, and the different contexts then have different output effects.
The way I see this:
.input {$myNum :number}
.local $myStr = {$myNum :string}
.match {$myNum :boolean}
* {{{$myNum :plusOne} != {$myStr :plusOne}}}
in javascript, is roughly:
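// Stand-in for the custom :plusOne function (an assumption; the message does not define it).
const plusOne = (x) => x + 1;

function format({ myNum: _myNum }) {
  const myNum = Number(_myNum ?? 0);
  const myStr = String(myNum);
  switch (Boolean(myNum)) {
    default:
      // format({ myNum: 15 }) -> "16 is not 151"
      return `${plusOne(myNum)} is not ${plusOne(myStr)}`;
  }
}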
I think function implementations need to be able to annotate the formatting context (otherwise .input declarations don't do anything).
I would like to push back hard on this... I consider it critical that function implementations not be able to modify anything in the context, and really to not have any observable consequences beyond their output. MessageFormat behavior should hew as closely as possible to https://www.rfc-editor.org/rfc/rfc9535.html#section-2.4 :
A function extension MUST be defined such that its evaluation is free of side effects, i.e., all possible orders of evaluation and choices of short-circuiting or full evaluation of an expression containing it MUST lead to the same result. (Note: Memoization or logging are not side effects in this sense as they are visible at the implementation level only -- they do not influence the result of the evaluation.)
I would like to push back hard on this... I consider it critical that function implementations not be able to modify anything in the context, and really to not have any observable consequences beyond their output.
I agree wholeheartedly with what you're saying. What I meant was different: while doing formatting, functions can access the contents of their copy of the formatting context. They should have no observable impact outside of the call to "format message". But it should be possible to write functions in messages that do useful stuff:
.input {$someNumber :add amount=2}
{{The output of {$someNumber} should be 2 more than the value that was input}}
The value of the variable someNumber passed into MessageFormat will in no way be affected by the message or the function. But the "resolved value" of $someNumber inside the message is annotated to :add 2, no?
If I use the above message with pseudocode like:
var foo = 7.2;
var args = Map.of('someNumber', foo);
console.log(MessageFormat.formatMessage(aboveMessageString, args));
The value of foo will still be 7.2 (and the value of someNumber in args.get('someNumber') will too), but the message should print out 9.2 for the value (assuming :add does what one expects).
I agree wholeheartedly with what you're saying. What I meant was different: while doing formatting, functions can access the contents of their copy of the formatting context. They should have no observable impact outside of the call to "format message". But it should be possible to write functions in messages that do useful stuff:
.input {$someNumber :add amount=2}
{{The output of {$someNumber} should be 2 more than the value that was input}}
The value of the variable someNumber passed into MessageFormat will in no way be affected by the message or the function. But the "resolved value" of $someNumber inside the message is annotated to :add 2, no?
Yes, the resolved value of $someNumber will be affected by :add, but as a result of behavior associated with .input rather than the function implementation. For example, :add must not be able to affect $x in this message:
.input {$x}
.input {$someNumber :add amount=2}
{{The value of {$x} matches what was provided, but {$someNumber} is function output.}}
If I use the above message with pseudocode like:
var foo = 7.2;
var args = Map.of('someNumber', foo);
console.log(MessageFormat.formatMessage(aboveMessageString, args));
The value of foo will still be 7.2 (and the value of someNumber in args.get('someNumber') will too), but the message should print out 9.2 for the value (assuming :add does what one expects).
I'm making a much stronger assertion that function evaluation must not directly modify the formatting context at all. Instead, it can only be the MessageFormat machinery itself that associates function output with variables.
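A minimal JavaScript sketch of that division of labour, assuming a hypothetical pure add function and a hypothetical formatMessage that plays the role of the MessageFormat machinery. Neither name is a real API. An integer input is used here to keep the arithmetic exact.

```javascript
// Hypothetical :add implementation: pure, it sees only its operand and options.
const add = (operand, options) => operand + options.amount;

// Hypothetical processor for:
//   .input {$x}
//   .input {$someNumber :add amount=2}
//   {{The value of {$x} matches what was provided, but {$someNumber} is function output.}}
function formatMessage(args) {
  // Message-local scope: a copy, so the caller's arguments are never written to.
  const scope = new Map(Object.entries(args));
  // The processor, not the function, binds the function output to the variable.
  scope.set("someNumber", add(scope.get("someNumber"), { amount: 2 }));
  return `The value of ${scope.get("x")} matches what was provided, but ${scope.get("someNumber")} is function output.`;
}

const args = Object.freeze({ x: 1, someNumber: 7 });
console.log(formatMessage(args));
// The value of 1 matches what was provided, but 9 is function output.
console.log(args.someNumber); // 7
```

Because add never receives the scope, it has no way to touch $x, which is the property being argued for.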
@gibson042 Ironically, I think you and I are in violent agreement. The problem here is that our mental model of what the "formatting context" is differs. I think it is the set of resolved values visible only to formatting functions/selectors within the context of a specific message. The calling context for MessageFormat is something else entirely. And I agree that functions do not have write access to the formatting context (.input, .local, and possibly .match do, but are part of MF).
The problem here is that our mental model of what the "formatting context" is differs. I think it is the set of resolved values visible only to formatting functions/selectors within the context of a specific message.
According to the LDML45 spec, the only such value revealed to function implementations is the current locale. In the JS Intl.MessageFormat proposal spec, the localeMatcher and the expression source are also included.
I think those are missing the current bidi setting, and it's conceivable for other constructor options to also get passed along, but are there use cases for any other values?
According to the LDML45 spec, the only such value revealed to function implementations is the current locale. In the JS Intl.MessageFormat proposal spec, the localeMatcher and the expression source are also included.
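You're overlooking the actual section on formatting context, which lists five things:
- locale
- base direction
- input mapping
- function registry/registries
- optional fallback string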
Note that the formatting context, when defined this way, doesn't really describe a data structure so much as it is suggestive of a set of APIs inside MF.
Also reader/listener's gender.
On Thu, Apr 25, 2024, 06:19 Addison Phillips @.***> wrote:
According to the LDML45 spec https://unicode.org/reports/tr35/tr35-messageFormat.html#function-resolution, the only such value revealed to function implementations is the current locale. In the JS Intl.MessageFormat proposal spec https://tc39.es/proposal-intl-messageformat/#sec-resolvefunction, the localeMatcher and the expression source are also included.
You're overlooking the actual section on formatting context https://www.unicode.org/reports/tr35/tr35-messageFormat.html#formatting-context, which lists five things:
- locale
- base direction
- input mapping
- function registry/registries
- optional fallback string
Note that the formatting context, when defined this way, doesn't really describe a data structure so much as it is suggestive of a set of APIs inside MF.
You're overlooking the actual section on formatting context, which lists five things:
I'm not, though. Note this important part of Function Resolution:
Function access to the formatting context MUST be minimal and read-only, and execution time SHOULD be limited.
My understanding of "MUST be minimal" is that every part that is made available must be explicitly rationalised. In particular, as values from the input mapping and the function registry are made available via variables and declarations, those should never be available internally to a function implementation.
Admittedly the spec language does currently leave the interpretation of "minimal" up to an implementation, so in theory anything's possible.
While preparing yet another presentation on MF2, I needed to write a simple example that got me thinking:
It's not necessarily obvious that the above is what we expect, when this looks like it'll work just as well:
However, that doesn't format the $count explicitly as a number; we just presume it does, because "count" sounds numeric.
So the thought I had here is that this is pretty much exactly why & how MF1 ended up with # being special in plural selectors, and that the solution we're providing is much less obvious and requires writing a whole new .input statement.
Could we consider making the .match expressions also act as implicit declarations, and make them usable in placeholders? The somewhat obvious way to address them is by index position:
Assigning values to $0, $1, ... would not conflict with any input values, as numbers are invalid name-start characters. That's by design so that we encourage at least some name for each variable; here that's effectively provided by the .match expressions.
I suspect that adding this shorthand would provide a more ergonomic solution for most .input use cases, and would enable the representation of many messages without any declarations, which currently would require one to avoid significant repetition.
The syntax change required by this would probably look something like this:
with accompanying spec language making numeric variables resolve to the .match selectors in placeholders, and a data model error otherwise.