unicode-org / message-format-wg

Developing a standard for localizable message strings

Selection method expectations #425

Closed eemeli closed 7 months ago

eemeli commented 1 year ago

In the following, the (a), (b), etc. may be used to refer to individual messages.

For formatting, I believe that it's relatively uncontroversial to expect that with an en-US message (a)

{Foo is {$foo}}

formatting it to a string with { foo: 'bar' } would result in Foo is bar while formatting it to a string with { foo: 1234 } would result in Foo is 1,234 rather than Foo is 1234.

In other words, we would expect a numerical value to be formatted according to the current locale, even if the type of the value is not made explicit in the message.
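A minimal TypeScript sketch of the formatting expectation above. It assumes an implementation that inspects the runtime type of an unannotated placeholder. `formatPlaceholder` and `formatMessageA` are hypothetical names; only `Intl.NumberFormat` is a real API.

```typescript
// Hypothetical default formatter for an unannotated placeholder such as {$foo}:
// numbers go through the locale's number format, anything else through String().
function formatPlaceholder(value: unknown, locale: string): string {
  if (typeof value === "number") {
    return new Intl.NumberFormat(locale).format(value);
  }
  return String(value);
}

// Message (a), {Foo is {$foo}}, hand-compiled into a function for this sketch.
function formatMessageA(args: { foo: unknown }, locale: string = "en-US"): string {
  return `Foo is ${formatPlaceholder(args.foo, locale)}`;
}

console.log(formatMessageA({ foo: "bar" })); // Foo is bar
console.log(formatMessageA({ foo: 1234 }));  // Foo is 1,234
```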

Separately, I also believe that it's almost as uncontroversial to expect that with an en-US message (b)

match {$foo}
when bar {Foo is bar}
when * {Foo is other}

formatting it to a string with { foo: 'bar' } would result in Foo is bar.

However, going further from this I believe we encounter less agreement. Specifically, with an en-US message (c)

match {$foo}
when one {{$foo} foo}
when * {{$foo} foos}

should formatting it to a string with { foo: 1 } result in 1 foo or 1 foos?

In fact, we may disagree on the results even with an en-US message (d)

let $bar = {$foo :number}
match {$bar}
when one {{$bar} foo}
when * {{$bar} foos}

when formatting it to a string with { foo: 1 }; should that result in 1 foo or 1 foos?
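The two candidate answers for message (c) come from two different selection rules, sketched here under hypothetical names (`selectAsString`, `selectAsPlural`). Message (d) raises the same question, since its answer depends on whether selection on a `:number` value uses the second rule.

```typescript
// "select" reading: a key matches only if it equals the value's string form.
function selectAsString(value: unknown, keys: string[]): string {
  const s = String(value);
  return keys.includes(s) ? s : "*";
}

// "plural" reading: numbers are first mapped to a CLDR plural category.
function selectAsPlural(value: unknown, keys: string[], locale: string): string {
  if (typeof value !== "number") return selectAsString(value, keys);
  const category = new Intl.PluralRules(locale).select(value);
  return keys.includes(category) ? category : "*";
}

// Variants of message (c): when one {{$foo} foo} / when * {{$foo} foos}
const variantsC: Record<string, (n: number) => string> = {
  one: (n) => `${n} foo`,
  "*": (n) => `${n} foos`,
};
const keysC = ["one"];

console.log(variantsC[selectAsString(1, keysC)](1));          // 1 foos
console.log(variantsC[selectAsPlural(1, keysC, "en-US")](1)); // 1 foo
```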

On a related note, we probably do want to define what happens when an en-US message (e)

match {$foo :number}
when |1,234| {Localized match}
when 1234 {Numeric match}
when * {No match}

is formatted with { foo: 1234 }, because an argument could be made there for any of the three variants.
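To make the ambiguity concrete, here is a sketch of message (e) evaluated under three candidate matching rules. The rule names and `matchE` are hypothetical; the point is only that each rule picks a different variant for the same input.

```typescript
// Message (e) formatted with { foo: 1234 } under three candidate matching rules.
type MatchRule = "localized" | "numeric" | "plural";

const variantsE: Record<string, string> = {
  "1,234": "Localized match",
  "1234": "Numeric match",
  "*": "No match",
};
const keysE = ["1,234", "1234"];

function matchE(value: number, rule: MatchRule): string {
  let chosen = "*";
  if (rule === "localized") {
    // Compare keys with the locale-formatted value, "1,234" in en-US.
    const formatted = new Intl.NumberFormat("en-US").format(value);
    if (keysE.includes(formatted)) chosen = formatted;
  } else if (rule === "numeric") {
    // Parse keys as numbers; Number("1,234") is NaN, so only "1234" can match.
    const hit = keysE.find((k) => Number(k) === value);
    if (hit !== undefined) chosen = hit;
  } else {
    // Plural rule: 1234 is "other" in en-US, which is not among the keys.
    const category = new Intl.PluralRules("en-US").select(value);
    if (keysE.includes(category)) chosen = category;
  }
  return variantsE[chosen];
}

console.log(matchE(1234, "localized")); // Localized match
console.log(matchE(1234, "numeric"));   // Numeric match
console.log(matchE(1234, "plural"));    // No match
```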

I think we should make explicit determinations

  1. whether to assign "plural" or "select" treatment to numbers, and
  2. whether to apply the same sort of built-in type casting during selection as formatting.

As I see it, this is in part a discussion about how we want to conceptually present our selection or matching expressions. In MF1, one of the keywords select, plural, selectordinal is always required. In MF2, we only use match. Therefore we've effectively left it open so far whether, with regard to selection, we want to use methods that imply that we are performing e.g. "plural selection" or "select selection" as in MF1, or if we're performing "selection on a number" or "selection on a string".

In #420 we are adding at least the number and datetime formatting functions to our base registry. I think at least one of the following additional functions is needed to deal with selection:

  1. plural
  2. select
  3. string

With just plural, we would in general default to "select selection" and require its explicit use for plural category consideration. The (e) message would still require some special consideration of numbers.

With just select, we would default to "plural selection" for numbers and "select selection" otherwise. Using select on a number would explicitly opt out of its plural category consideration, and would need to define what happens in (e).

With string, the behaviour would in practice be the same as with select, but the conceptual sense of what's happening with expressions used for matching is different: the input value is in a sense cast to the given type, which comes with given selection behaviour.

The question of whether to require some explicit function on each selector (i.e. the difference between messages (c) and (d)) may be considered separately. For example, one of plural/select or number/string could always be required.

eemeli commented 1 year ago

Some parts of this have been surfaced previously (e.g. https://github.com/unicode-org/message-format-wg/pull/368#issuecomment-1554762223, https://github.com/unicode-org/message-format-wg/pull/420#discussion_r1263480810), but I don't believe that we've established consensus on this topic. It also has not been necessary for us to address before now, as we've been thus far working without any prepopulated registry.

I think this topic is a blocker for including any matching function implementations in our base registry.

My own preference would be for the string approach, as that would be internally consistent with the way we're approaching formatting functions. It would also make it easier to consider later or custom extensions to e.g. the behaviour of datetime, which could allow selection on past, today, future, or other keywords.

I also believe that allowing the exact same expression to be used for both selection and formatting would make it likelier for developers to use a local variable for it, as in:

let $count = {$books :number maximumFractionDigits=0}
match {$count}
when one {You have {$count} book}
when * {You have {$count} books}
aphillips commented 1 year ago

These are good thoughts. I think they might need to be broken into separate concerns.

  1. What is the defaulting behavior of an unspecified selector? Of a formatter? (Should we even define this?)

Much of your commentary is about choosing the default behavior of an unspecified selector, e.g. match {$foo}. In ICU4J, this is done via reflection--which is reasonable in a strongly typed language like Java. In our syntax, since we don't have types, this becomes complicated. 2023-07-15 is a string and might be a date. 2023 is a string and might be a number and an integer and might be a year...

What you call string is what I call select.

We might say that non-default default selection is up to the implementation. That is, ICU4J's implementation might use reflection to decide that if (arg instanceof Date) (or Temporal or Calendar) then match {$arg} is using a temporal selector of some sort... while JS might say it's just a select. This might result in encouraging folks to write explicit selectors (match {$foo :plural}) even when such were not required?
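A sketch of what an implementation-defined default could look like in a JavaScript runtime, standing in for ICU4J-style reflection. `defaultSelectorKind` and the three kind names are hypothetical. Note that the string "2023-07-15" falls through to plain selection even though it "might be a date".

```typescript
// Hypothetical implementation-defined default for an unannotated `match {$arg}`:
// the selection strategy is picked from the argument's runtime type.
type SelectorKind = "plural" | "temporal" | "select";

function defaultSelectorKind(arg: unknown): SelectorKind {
  if (typeof arg === "number") return "plural";
  if (arg instanceof Date) return "temporal";
  return "select"; // plain exact-match selection on the string form
}

console.log(defaultSelectorKind(2023));                   // plural
console.log(defaultSelectorKind("2023-07-15"));           // select
console.log(defaultSelectorKind(new Date("2023-07-15"))); // temporal
```

The message text is identical in all three calls, so someone reading only the message cannot tell which behaviour will apply.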

  2. Are formatters and selectors distinct? If they are not distinct, when do we lump or split functionality? If they are distinct, can they share keywords (e.g. number-selector and number-formatter)? (We will need to deal with naming precedence rules in any event)

On the one hand, I think there is a certain elegance to what you suggest above. Maybe the veritable zoo of time formatters (regular, relative, duration, etc.) or number (regular, currency, measure, percent, ordinal, etc.) is a quirk we can avoid via the judicious use of options.

On the other hand, there are multiple different types of selection that can be applied to e.g. a number or e.g. a time value, just as there are multiple types of formatting that can be applied.

There is also the problem of identifying what selector or selector behavior is intended when the argument is not available. For example, in your example above, match {$count} is secretly a plural selector. It is opaque to translation tools whether when one was matching the literal |one| or was the plural keyword (and thus the tool needs to generate few and many slots from the source language in order to service the pl-PL locale, for example). The declaration helps, but

For these reasons, I tend to think that separate functions for different types of selection might win out against cramming everything into single keywords.

  3. What are the requirements and best practices for inclusion in the default registry?

We will describe the registry in the spec in functional terms. Individuals or platform implementations might then use that to define valid functions that we collectively might disapprove of. But our default registry should meet a high bar and should model the right behaviors. How do we decide? What model should we use? We need to document these, both to guide future registration requests, and to ensure that we understand it ourselves.

mihnita commented 12 months ago

I am sure this is something we covered before: https://github.com/unicode-org/message-format-wg/blob/main/meetings/2022/notes-2022-08-22.md

However, going further from this I believe we encounter less agreement. Specifically, with an en-US message (c) ... should formatting it to a string with { foo: 1 } result in 1 foo or 1 foos?

Absolute disagreement :-)

So:

match {$bar}
   when one {{$bar} foo}
   when * {{$bar} foos}

would return "1 foo" but:

match {$bar}
   when one {{$bar} foo}
   when pink {{$bar} foox}
   when * {{$bar} foos}

would return "bar foos"?

So now to figure out what will happen at runtime (or how to translate) one must know the type of the argument AND look at ALL the keys listed in the message?

This is one of the reasons I proposed that selectors ALWAYS have a function. Might be "indirect" (from a local variable). But it has to be there to avoid confusion.

This becomes even messier in languages like JavaScript or Python, where types can easily change "by mistake". You think something is a number, but somehow it became something else.

mihnita commented 12 months ago

Are formatters and selectors distinct? If they are not distinct, when do we lump or split functionality? If they are distinct, can they share keywords (e.g. number-selector and number-formatter)? (We will need to deal with naming precedence rules in any event)

I do believe (and argued before) that they are distinct.

Intl.PluralRules and Intl.DateTimeFormat are not the same kind of classes.

And a collator doing numeric-aware sorting ("3 files" sorting before "21 files") should understand a bit about parsing numbers (including non-ASCII digits), and maybe understand a bit about breaking at word limits, for example. That does not make collators and formatters / breakiterators the same thing.

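If we think in terms of interfaces, the signatures are different. They take different arguments, and do different things.

  • formatting takes the operand + options and returns a "format-to-parts" thing, or a string
  • selectors take the operand + options + existing matching keys and return which key is best

Looks pretty different.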

Same as in programming, a class might implement two interfaces, if it chooses to do so.

And can also register with the same name, or different (pseudocode, even if it looks like Java):

class NumberFormatAndPlural implements Selector, Formatter { ... }
registry.addSelector("plural", NumberFormatAndPlural.class);
registry.addFormatter("number", NumberFormatAndPlural.class);

Some might have a different class for PluralSelector implementing Selector, which might delegate some functionality to a formatter, or not. Or might have a small utility class used by both the plural selector and number formatter with the common code.

By making selectors and formatters separate, we allow implementations to do whatever they think works best for them internally, without forcing those decisions on others or exposing them.
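The interface split described in this comment can be sketched in TypeScript. `Formatter`, `Selector` and the two `Map` registries are hypothetical stand-ins for the pseudocode, not an existing API; one object implements both interfaces and is registered under a different name in each namespace.

```typescript
type Options = Record<string, unknown>;

interface Formatter {
  format(operand: unknown, options: Options): string;
}

interface Selector {
  // Returns the best matching key from `keys`, or "*" if none matches.
  select(operand: unknown, options: Options, keys: string[]): string;
}

class NumberFormatAndPlural implements Formatter, Selector {
  private readonly locale: string;

  constructor(locale: string) {
    this.locale = locale;
  }

  format(operand: unknown, _options: Options): string {
    return new Intl.NumberFormat(this.locale).format(Number(operand));
  }

  select(operand: unknown, _options: Options, keys: string[]): string {
    const category = new Intl.PluralRules(this.locale).select(Number(operand));
    return keys.includes(category) ? category : "*";
  }
}

// A toy registry with separate namespaces for selectors and formatters.
const selectorRegistry = new Map<string, Selector>();
const formatterRegistry = new Map<string, Formatter>();
const numberImpl = new NumberFormatAndPlural("en-US");
selectorRegistry.set("plural", numberImpl);
formatterRegistry.set("number", numberImpl);

console.log(formatterRegistry.get("number")?.format(1234, {}));      // 1,234
console.log(selectorRegistry.get("plural")?.select(1, {}, ["one"])); // one
```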

Note that in order to make the selection on 1 formatted as "1" ("1 dollar") vs "1.00" ("1.00 dollars") we don't actually need to format. It is enough to look at the options (min/max fraction digits, etc.). We don't need all the options formatting needs, we don't need to know what digits or thousand separators to use, etc., and we don't need to format at all.

It only needs to understand a small subset of what the formatter does. It does not format, and does not need a formatter to understand how to select.

So plural selection is not as tied to a number formatter as it might seem.

macchiati commented 12 months ago

The problem we are trying to solve is that if you format numbers (and other cases with numeric components, like amounts with units) with different options than you select, you will get the wrong answer.

If we had multivalue return types, then you could have it all be crystal clear:

let $countFormat, $countSelector = {$inputCount :number ... options}
match {$countSelector}
when ... {...{$countFormat}....}

or if you had them separated you could do

let $countFormat = {$inputCount :numberFormat ... options}
let $countSelector = {$inputCount :numberSelect ... options2} // and have likely failure if options2 ≠ options
match {$countSelector}
when ... {...{$countFormat}....}

What we're trying to do is hide that complexity under the covers, with:

let $count = {$inputCount :number ... options}
match {$count}
when ... {...{$count}....}

where logically speaking, $count internally has two values, a formatted number, and a selector appropriate for that formatted number.
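A minimal sketch of the "two values" idea, assuming a hypothetical `numberValue` helper built on `Intl.NumberFormat` and `Intl.PluralRules`, both of which accept the same digit options. The last lines reproduce the wrong answer described above when the formatter and the selector are given different options.

```typescript
interface DigitOptions {
  minimumFractionDigits?: number;
  maximumFractionDigits?: number;
}

// Hypothetical value produced by `{$inputCount :number ...options}`: both
// parts are computed from ONE options bag.
interface NumberValue {
  formatted: string; // used when $count is a placeholder
  category: string;  // used when $count is a selector
}

function numberValue(n: number, locale: string, options: DigitOptions): NumberValue {
  return {
    formatted: new Intl.NumberFormat(locale, options).format(n),
    category: new Intl.PluralRules(locale, options).select(n),
  };
}

const consistent = numberValue(1, "en-US", { minimumFractionDigits: 2 });
console.log(consistent.formatted, consistent.category); // 1.00 other

// The failure mode: the formatter and the selector see different options.
const shown = new Intl.NumberFormat("en-US", { minimumFractionDigits: 2 }).format(1);
const picked = new Intl.PluralRules("en-US").select(1);
console.log(shown, picked); // 1.00 one
```

With consistent options the message renders "1.00 dollars"; with the mismatch it renders "1.00 dollar".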

eemeli commented 12 months ago

@aphillips: These are good thoughts. I think they might need to be broken into separate concerns.

I think I mostly agree with your questions as presenting nicely orthogonal facets of this topic. Some specific comments and replies below.

What you call string is what I call select.

Yes. For selection, they work the same. The key difference is in how we're asking people to conceive of the function. Is it a "selection function" or is it a type cast/assertion?

On the other hand, there are multiple different types of selection that can be applied to e.g. a number or e.g. a time value, just as there are multiple types of formatting that can be applied.

We ought to also take into account their frequency. For example, in my experience it is exceedingly rare for a selector on a number not to be a plural category selector. I do think that we need to ensure that non-plural number selection is possible, but I do not believe that it's a very common thing.

If anyone has examples or data on this that they could share, that would be invaluable.

There is also the problem of identifying what selector or selector behavior is intended when the argument is not available. For example, in your example above, match {$count} is secretly a plural selector. It is opaque to translation tools whether when one was matching the literal |one| or was the plural keyword (and thus the tool needs to generate few and many slots from the source language in order to service the pl-PL locale, for example). The declaration helps, but

I agree that this is a potential issue for a message like (c) where the type of the selector is not at all defined in the message. In the $count example, however, we know that it's a :number. Therefore, if we make it explicit and public that selection on a number is handled as plural selection, then that isn't a secret, and tools may generate variants for a target locale as appropriate.

Or did you intend to address that in the last sentence? It cuts off a bit abruptly.

We will describe the registry in the spec in functional terms. Individuals or platform implementations might then use that to define valid functions that we collectively might disapprove of. But our default registry should meet a high bar and should model the right behaviors. How do we decide? What model should we use? We need to document these, both to guide future registration requests, and to ensure that we understand it ourselves.

Strong agreement here; it's why I raised this issue explicitly.

@mihnita: I am sure this is something we covered before: https://github.com/unicode-org/message-format-wg/blob/main/meetings/2022/notes-2022-08-22.md

Yes, we've discussed parts of this topic before, but have not concluded those discussions.

So:

match {$bar}
   when one {{$bar} foo}
   when * {{$bar} foos}

would return "1 foo" but:

match {$bar}
   when one {{$bar} foo}
   when pink {{$bar} foox}
   when * {{$bar} foos}

would return "bar foos"?

So now to figure out what will happen at runtime (or how to translate) one must know the type of the argument AND look at ALL the keys listed in the message?

That would require introspection of the variant keys to determine how selection happens. I am not proposing for that to happen, and I don't think we should make that possible.

This is one of the reasons I proposed that selectors ALWAYS have a function. Might be "indirect" (from a local variable). But it has to be there to avoid confusion.

If so, should we also always require a function for placeholders to avoid such confusion?

By making selectors and formatters separate, we allow implementations to do whatever they think works best for them internally, without forcing those decisions on others or exposing them.

I don't think I follow how using the same MF2 keyword imposes a restriction on implementations. Following your pseudocode example, why wouldn't this also work?

class SelectPlural implements Selector { ... }
class NumberFormat implements Formatter { ... }
registry.addSelector("number", SelectPlural.class);
registry.addFormatter("number", NumberFormat.class);

Note that in order to make the selection on 1 formatted as "1" ("1 dollar") vs "1.00" ("1.00 dollars") we don't actually need to format. It is enough to look at the options (min/max fraction digits, etc.). We don't need all the options formatting needs, we don't need to know what digits or thousand separators to use, etc., and we don't need to format at all.

I agree, the options bags are not exactly the same. Plural selection also depends on the type of selection we're doing (cardinal vs. ordinal) which a formatter won't need to care about. But they don't conflict either, and can be combined so that a formatter cares about some of them, while a selector cares about a different subset.
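A sketch of a single combined options bag, assuming a hypothetical `:number` whose formatter reads the digit and grouping options while its selector reads the digit options and `type` (cardinal vs. ordinal). The option names mirror the real `Intl.NumberFormat` / `Intl.PluralRules` options; the split itself is illustrative.

```typescript
interface NumberOptions {
  minimumFractionDigits?: number;
  maximumFractionDigits?: number;
  useGrouping?: boolean;         // formatting only
  type?: "cardinal" | "ordinal"; // selection only
}

function formatNumber(n: number, locale: string, o: NumberOptions): string {
  const { minimumFractionDigits, maximumFractionDigits, useGrouping } = o;
  return new Intl.NumberFormat(locale, { minimumFractionDigits, maximumFractionDigits, useGrouping }).format(n);
}

function selectNumber(n: number, locale: string, o: NumberOptions, keys: string[]): string {
  const { minimumFractionDigits, maximumFractionDigits, type } = o;
  const rules = new Intl.PluralRules(locale, { minimumFractionDigits, maximumFractionDigits, type });
  const category = rules.select(n);
  return keys.includes(category) ? category : "*";
}

const ordinal: NumberOptions = { type: "ordinal" };
console.log(formatNumber(2, "en-US", ordinal));                        // 2
console.log(selectNumber(2, "en-US", ordinal, ["one", "two", "few"])); // two
```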

@macchiati: The problem we are trying to solve is that if you format numbers (and other cases with numeric components, like amounts with units) with different options than you select, you will get the wrong answer.

Yes. And going further, to establish a good precedent and model for future/custom selectors and formatters to follow so that they don't end up making a similar mistake.

What we're trying to do is hide that complexity under the covers, with:

let $count = {$inputCount :number ... options}
match {$count}
when ... {...{$count}....}

where logically speaking, $count internally has two values, a formatted number, and a selector appropriate for that formatted number.

You know, going a slight step further, we could simplify the syntax a little and make this mistake even harder to make:

let $count = {$inputCount :number maximumFractionDigits=0}
match $count
when one {You have {$count} book}
when * {You have {$count} books}

The change here is that match selectors aren't expressions, but variable references, so as with the let LHS they wouldn't need braces anymore. This would make it harder (but not impossible) to use a different set of options for selection and formatting, and guide developers towards better practices. It would, however, add one more required line for the relatively rare messages that do not use their selection value in the message body.

eemeli commented 12 months ago

@mihnita: This is one of the reasons I proposed that selectors ALWAYS have a function. Might be "indirect" (from a local variable). But it has to be there to avoid confusion.

I think I'm starting to agree with this. Revisiting my example messages from above, I now think that here:

[...] with an en-US message (b)

match {$foo}
when bar {Foo is bar}
when * {Foo is other}

formatting it to a string with { foo: 'bar' } would result in Foo is bar.

the message should be considered to contain a data model error, and to format as {�}. If we don't do that, then we're going to end up with this being a valid message which looks like it's doing plural selection:

match {$count}
when 1 {one}
when * {other}

For any translation tooling to work with that, it either needs an annotation, or we really do need a semantic comments spec that it could rely on to define the value type.
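A sketch of the proposed rule over a deliberately simplified, hypothetical data model (`SelectorRef`, `MessageModel`, `annotatedLets`): a selector without a function, either inline or inherited from its `let` declaration, is treated as a data model error and the message falls back to {�}.

```typescript
interface SelectorRef {
  variable: string;
  functionName?: string;
}

interface MessageModel {
  annotatedLets: Map<string, string>; // variable -> function name of its `let`
  selectors: SelectorRef[];
}

// A selector is valid only if it has a function, directly or via its declaration.
function hasUnannotatedSelector(msg: MessageModel): boolean {
  return msg.selectors.some(
    (s) => s.functionName === undefined && !msg.annotatedLets.has(s.variable),
  );
}

// On a data model error, return the fallback instead of formatting.
function formatOrFallback(msg: MessageModel, format: () => string): string {
  return hasUnannotatedSelector(msg) ? "{\uFFFD}" : format();
}

// match {$count} with no declaration: treated as an error.
const bare: MessageModel = { annotatedLets: new Map(), selectors: [{ variable: "count" }] };
// let $count = {$books :number} followed by match {$count}: accepted.
const declared: MessageModel = {
  annotatedLets: new Map([["count", "number"]]),
  selectors: [{ variable: "count" }],
};
console.log(formatOrFallback(bare, () => "one"));     // {�}
console.log(formatOrFallback(declared, () => "one")); // one
```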


Regarding :select vs. :string, I have an additional observation to make, namely that this looks too valid:

match {$gender :select}
when male {He did a thing}
when female {She did a thing}
when * {They did a thing}

This is how gender selectors (either personal or grammatical) are currently done in MF1 and other formats that support such, and with the :select it looks fine. A developer writing this would not get a sense that they ought to write a custom function, or do anything more with this.

But it's not fine. We are providing no information here about what type of value we're selecting on, and whether there might be a locale dependency on the appropriate keys. Finnish, for instance, does not use either grammatical or personal gender.

If instead the message read as:

match {$gender :string}
when male {He did a thing}
when female {She did a thing}
when * {They did a thing}

then it's much more obvious how we're treating $gender, and that we're providing no information about it. It's much easier to see the explicit :string as a code smell that ought to be fixed.

aphillips commented 12 months ago

A few minor comments.

(emphasis added)

We ought to also take into account their frequency. For example, in my experience it is exceedingly rare for a selector on a number not to be a plural category selector. I do think that we need to ensure that non-plural number selection is possible, but I do not believe that it's a very common thing.

I think it is important to be careful of frequency arguments. Strings with replacements are "rare" compared to strings without. Strings with two selectors are rare compared to those with only one. Etc. I have plenty of examples of non-plural numeric selection that should be handled by the message formatters and not by writing spaghetti logic.

I should note that virtually all cases of inserting a number into a string should (also) involve a plural selector (for the grammatical matching).

The example I use for this is one you can observe in most driving direction applications:

match {$distanceRemaining :plural} {$distanceRemaining :lt value=10}
when =0  *     {You have arrived}
when one true  {You have {$distanceRemaining :number minFractionDigits=1} km to go.}
when one false {You have {$distanceRemaining :integer} km to go.}
when *   true  {You have {$distanceRemaining :number minFractionDigits=1} km to go.}
when *   false {You have {$distanceRemaining :integer} km to go.}
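A sketch of how the two selectors in this message could be evaluated. Assumptions, all hypothetical: `:plural` checks exact "=N" keys before CLDR categories, `:lt value=10` returns the keys "true"/"false", `:integer` is approximated by `maximumFractionDigits: 0`, and the first fully matching variant wins (a simplification of real variant ranking).

```typescript
function pluralKey(n: number, keys: string[], locale: string): string {
  if (keys.includes(`=${n}`)) return `=${n}`; // exact match first
  const category = new Intl.PluralRules(locale).select(n);
  return keys.includes(category) ? category : "*";
}

function ltKey(n: number, value: number): string {
  return n < value ? "true" : "false";
}

const oneDecimal = new Intl.NumberFormat("en-US", { minimumFractionDigits: 1 });
const integer = new Intl.NumberFormat("en-US", { maximumFractionDigits: 0 });

interface Variant {
  keys: [string, string];
  body: (n: number) => string;
}

const variants: Variant[] = [
  { keys: ["=0", "*"], body: () => "You have arrived" },
  { keys: ["one", "true"], body: (n) => `You have ${oneDecimal.format(n)} km to go.` },
  { keys: ["one", "false"], body: (n) => `You have ${integer.format(n)} km to go.` },
  { keys: ["*", "true"], body: (n) => `You have ${oneDecimal.format(n)} km to go.` },
  { keys: ["*", "false"], body: (n) => `You have ${integer.format(n)} km to go.` },
];

// Pick the first variant whose keys each equal the selected key or are "*".
function directions(distance: number): string {
  const selected = [
    pluralKey(distance, variants.map((v) => v.keys[0]), "en-US"),
    ltKey(distance, 10),
  ];
  const match = variants.find((v) => v.keys.every((k, i) => k === "*" || k === selected[i]));
  return match ? match.body(distance) : "";
}

console.log(directions(0));    // You have arrived
console.log(directions(1));    // You have 1.0 km to go.
console.log(directions(12.4)); // You have 12 km to go.
```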

I agree that this is a potential issue for a message like (c) where the type of the selector is not at all defined in the message. In the $count example, however, we know that it's a :number. Therefore, if we make it explicit and public that selection on a number is handled as plural selection, then that isn't a secret, and tools may generate variants for a target locale as appropriate.

How does a translation tool which only has access to the message resource know that $count is a number? Just because one of the keys is one? Our arguments are untyped! I would argue that the default selector is always select because that's the only inference that can reasonably be made.

If we don't do that, then we're going to end up with this being a valid message which looks like it's doing plural selection:

I see that you're trying to put up guardrails around common internationalization mistakes that developers make, such as "reinventing plurals" or (later in your examples) "inventing their own gender format". Those examples not only look valid; they are valid. It will always be possible to write non-internationalized messages using our formatters, either directly (by abusing tools such as select) or indirectly (by writing your own naive selectors).

I think we should have reasonable guardrails, but I disagree with your "too valid" assertion. The antidote to your examples is making the correct APIs obvious and available and then doing education. Everyone writes the following (generally as if/else code) when they don't know about plural. No one writes this once they know about plural:

match {$count}
when 1 {one}
when * {other}

FWIW, I do think that we might not call select by that name. Calling it :string might make sense. What about :equals?

aphillips commented 12 months ago

Related to #42?

mihnita commented 12 months ago

About:

let $countFormat = {$inputCount :numberFormat ... options}
let $countSelector = {$inputCount :numberSelect ... options2} // and have likely failure if options2 ≠ options
match {$countSelector}
when ... {...{$countFormat}....}

I agree it is problematic, but I don't think it can be prevented in syntax only. One can always do:

match {$people_count :plural}
when ... {...{$file_count}....}

I think this can only be detected at a higher level, by something that has a view of the whole message. And even then, there are use cases where that is not wrong:

1: "You are at the last credit!"
*: "You still have some credits left in your account (42, to be more precise)"

So we can't safely forbid it in syntax, or throw at runtime, because of false positives. All we can safely do is detect it in a lint.
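A sketch of such a lint rule over a hypothetical, simplified model of a message (`LintVariant`). It only warns, and only when a variant formats some other variable without formatting the selector. A variant with no placeholders at all, like the "1" variant of the credits example, is not flagged.

```typescript
interface LintVariant {
  key: string;
  placeholders: string[]; // variable names used in the variant body
}

function lintSelectorUsage(selectorVar: string, variants: LintVariant[]): string[] {
  const warnings: string[] = [];
  for (const v of variants) {
    const others = v.placeholders.filter((p) => p !== selectorVar);
    if (others.length > 0 && !v.placeholders.includes(selectorVar)) {
      warnings.push(`variant "${v.key}" selects on $${selectorVar} but formats $${others.join(", $")}`);
    }
  }
  return warnings;
}

// match {$people_count :plural} when ... {...{$file_count}....}
console.log(lintSelectorUsage("people_count", [{ key: "*", placeholders: ["file_count"] }]));

// The credits example: no warnings.
console.log(lintSelectorUsage("credits", [
  { key: "1", placeholders: [] },
  { key: "*", placeholders: ["credits"] },
]));
```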

mihnita commented 12 months ago

I do think that we need to ensure that non-plural number selection is possible, but I do not believe that it's a very common thing.

You owe us $42
We are all set
We owe you $42

To build something like that, we have to make a decision based on the currency amount (a number), but the selector is not a plural selector; it is a "decide based on the sign of the number" selection.

Maybe not common. But also maybe a chicken-and-egg thing. Might be more common if this would be possible. Or (because it is not possible in MF1 / Fluent / other) people do it in code, so this is not visible in the messages. They have 3 independent messages.
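A sketch of the sign-based example as one message instead of three. `selectSign` and its key names ("negative", "zero", "positive") are invented for illustration; the balance is assumed to be from the customer's side, so a negative balance means the customer owes money.

```typescript
function selectSign(amount: number, keys: string[]): string {
  const key = amount < 0 ? "negative" : amount === 0 ? "zero" : "positive";
  return keys.includes(key) ? key : "*";
}

const usd = new Intl.NumberFormat("en-US", {
  style: "currency",
  currency: "USD",
  minimumFractionDigits: 0,
  maximumFractionDigits: 0,
});

function balanceMessage(balance: number): string {
  switch (selectSign(balance, ["negative", "zero"])) {
    case "negative":
      return `You owe us ${usd.format(-balance)}`;
    case "zero":
      return "We are all set";
    default:
      return `We owe you ${usd.format(balance)}`;
  }
}

console.log(balanceMessage(-42)); // You owe us $42
console.log(balanceMessage(0));   // We are all set
console.log(balanceMessage(42));  // We owe you $42
```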

Another example: showing different things depending on the day of week

5 : closed
6 : closed
* : open

The day of week can be an enum, but a lot of APIs use an int for that. It can be "faked" with plural and exact matches (=5 =6 *), but that feels wrong. It is not a plural select.

Or we can force developers to convert that number to a string, meaning a buffer allocation and an int-to-string conversion, only to be able to do a textual select.

The most natural thing is to select on that number.
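A sketch of selecting directly on the number, using a hypothetical `selectExactNumber`. The runtime value stays a number; only the message's keys, which are text anyway, are parsed for comparison. Which weekday each integer denotes depends on the API supplying it.

```typescript
function selectExactNumber(operand: number, keys: string[]): string {
  const hit = keys.find((k) => k !== "*" && Number(k) === operand);
  return hit ?? "*";
}

// The day-of-week example: 5 -> closed, 6 -> closed, * -> open
const hoursVariants: Record<string, string> = { "5": "closed", "6": "closed", "*": "open" };

function storeStatus(day: number): string {
  return hoursVariants[selectExactNumber(day, Object.keys(hoursVariants))];
}

console.log(storeStatus(5)); // closed
console.log(storeStatus(2)); // open
```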

mihnita commented 12 months ago

That would require introspection of the variant keys to determine how selection happens. I am not proposing for that to happen, and I don't think we should make that possible.

Agree. But my comment was based on this :

match {$foo}
when one {{$foo} foo}
when * {{$foo} foos}

should formatting it to a string with { foo: 1 } result in 1 foo or 1 foos?

If I add a when pink {{$foo} foox} case, there is no way to detect that as an error in a lint, because I don't have access to the type of $foo.

Worse, I don't even know if I have to expand this for the plural cases, for instance in a CAT tool. It is a select on $foo, that is clear. But is the type of $foo numeric, so this is a plural, so I have to add few and many for Russian? Or is foo a string, or an enum, so the selector function is ... something else, I have no clue.

So I can't validate anything, in lint, or in CAT tools.
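For contrast, once a tool knows a selector is a plural selector, the expansion is mechanical. A sketch with a hypothetical `missingPluralKeys` helper, built on the real `Intl.PluralRules.prototype.resolvedOptions().pluralCategories`; it assumes "other" is covered by the catch-all `*` variant.

```typescript
function missingPluralKeys(sourceKeys: string[], targetLocale: string): string[] {
  const needed: string[] = new Intl.PluralRules(targetLocale).resolvedOptions().pluralCategories;
  return needed.filter((c) => c !== "other" && !sourceKeys.includes(c)).sort();
}

console.log(missingPluralKeys(["one"], "ru")); // [ 'few', 'many' ]
console.log(missingPluralKeys(["one"], "en")); // []
```

Without knowing the selector type, as in the unannotated match {$foo} above, no such computation is possible.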

mihnita commented 12 months ago

If so, should we also always require a function for placeholders to avoid such confusion?

I am not 100% against it; it would be cleaner and would allow more validation. But early on (many months ago) Mark said he would not like that, and I also agree with him :-) People are used to untyped placeholders. MF1 does that, the MessageFormat in the JDK does that, and a lot of programming languages do it (Python, Perl, PHP, JS, shell script): https://en.wikipedia.org/wiki/String_interpolation

It is pretty much everywhere.

I agree that this would make things inconsistent: selects (with forced function) and formatting placeholders (with a default function decided by type)

But a human can usually translate something with a placeholder of unknown type. Selectors are more problematic though.

If all I see is this:

match {$foo}
when * {{$foo} foos}

And I don't know the type of foo, I don't know (as a human translator, or as a CAT tool, or a lint tool) if I have to add gendered forms, or plural keywords, or grammatical cases, or something else (sorry, same argument as above, in a different form).

I don't think I follow how using the same MF2 keyword imposes a restriction on implementations. Following your pseudocode example, why wouldn't this also work?

class SelectPlural implements Selector { ... }
class NumberFormat implements Formatter { ... }
registry.addSelector("number", SelectPlural.class);
registry.addFormatter("number", NumberFormat.class);

Yes, that would also work. But my main problem is with the visible, public signatures. A selector in general is fundamentally a different concept than a formatter.

I can in theory do a plural select on non-numbers, for example on a list (and yes, probably the plural decision would be on the number of items in the list). Or on an int range, where the decision might be on the difference between end and start (yes, still a number, but otherwise I force the dev to calculate it and put it in some argument).

TLDR: I can do plural selections on non-numbers, and I can do non-plural selections on numbers.
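A sketch of a plural selector that accepts non-number operands, as described above. The `IntRange` type, `isIntRange` guard, and `selectPlural` are hypothetical. Arrays are selected on their length and ranges on `end - start`.

```typescript
interface IntRange {
  start: number;
  end: number;
}

function isIntRange(x: unknown): x is IntRange {
  return (
    typeof x === "object" && x !== null &&
    typeof (x as IntRange).start === "number" &&
    typeof (x as IntRange).end === "number"
  );
}

function selectPlural(operand: unknown, keys: string[], locale: string): string {
  let n: number;
  if (Array.isArray(operand)) n = operand.length;
  else if (isIntRange(operand)) n = operand.end - operand.start;
  else n = Number(operand);
  const category = new Intl.PluralRules(locale).select(n);
  return keys.includes(category) ? category : "*";
}

console.log(selectPlural(["report.pdf"], ["one"], "en-US"));       // one
console.log(selectPlural({ start: 3, end: 4 }, ["one"], "en-US")); // one
console.log(selectPlural(["a.txt", "b.txt"], ["one"], "en-US"));   // *
```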

You know, going a slight step further, we could simplify the syntax a little and make this mistake even harder to make:

let $count = {$inputCount :number maximumFractionDigits=0}
match $count
when one {You have {$count} book}
when * {You have {$count} books}

I am 150% on board with that, if we add a :plural on the match :-)

And it's not just that I agree now: I coded it that way a year ago: https://github.com/unicode-org/icu/blob/main/icu4j/main/tests/core/src/com/ibm/icu/dev/test/message2/MessageFormat2Test.java#L351

:-)

mihnita commented 12 months ago

I don't feel strongly about :select vs :string vs something else.

But I don't think that using :string (or something else) instead of :select would help with the gender.

I strongly believe that sooner or later we would need an explicit standard :gender selector function. That is why I tried to "sneak it in" the initial version of the registry :-), hoping it is not too controversial. (just kidding, not sneaking it, that is why I added a comment that CLDR was against it, for full disclosure).

A tool (and a human) will instantly know that this:

match $foo :gender
when * {Are you tired?} // which is not gendered in English

needs to add the gender forms required by their language.

With match $foo :string (or match $foo) I don't know.

We can also ask developers to add the gender variants in source, even if the source language (let's say English) does not need them. But then the Chinese translator will also wonder: what am I supposed to do with this gender stuff? This is similar to plural expansion, where we don't force devs to add all plural cases (keywords).

To summarize, I see gender similarly to booleans or enums in programming languages. Early on languages like C used integers for these concepts. But in the end they came around and realized that these are very useful, even if they look like syntactic sugar on top of int.

eemeli commented 12 months ago

@aphillips: I think it is important to be careful of frequency arguments. Strings with replacements are "rare" compared to strings without. Strings with two selectors are rare compared to those with only one. Etc. I have plenty of examples of non-plural numeric selection that should be handled by the message formatters and not by writing spaghetti logic.

My intent with the frequency argument is to follow a version of the "make the easy jobs easy, without making the hard jobs impossible" dictum with our language design.

The example I use for this is one you can observe in most driving direction applications:

match {$distanceRemaining :plural} {$distanceRemaining :lt value=10}
when =0  *     {You have arrived}
when one true  {You have {$distanceRemaining :number minFractionDigits=1} km to go.}
when one false {You have {$distanceRemaining :integer} km to go.}
when *   true  {You have {$distanceRemaining :number minFractionDigits=1} km to go.}
when *   false {You have {$distanceRemaining :integer} km to go.}

With a :number as proposed in #420 that directly supports plural selection, this could be expressed as:

let $distanceRemaining = {$distanceRemaining :number minimumSignificantDigits=2}
match {$distanceRemaining}
when 0 {You have arrived}
when * {You have {$distanceRemaining} km to go.}

But this is probably a sidetrack from the main thread here.

I agree that this is a potential issue for a message like (c) where the type of the selector is not at all defined in the message. In the $count example, however, we know that it's a :number. Therefore, if we make it explicit and public that selection on a number is handled as plural selection, then that isn't a secret, and tools may generate variants for a target locale as appropriate.

How does a translation tool which only has access to the message resource know that $count is a number? Just because one of the keys is one? Our arguments are untyped! I would argue that the default selector is always select because that's the only inference that can reasonably be made.

We might be referring to different messages here? Before your earlier comment, the only "$count example" is the one from https://github.com/unicode-org/message-format-wg/issues/425#issuecomment-1636715814, where it's presented as:

let $count = {$books :number maximumFractionDigits=0}
match {$count}
when one {You have {$count} book}
when * {You have {$count} books}

There, $count is explicitly declared by a :number expression and no inference is required.

FWIW, I do think that we might not call select by that name. Calling it :string might make sense. What about :equals?

I would prefer :string. With :equals, it's not immediately clear what sort of equality we're talking about.

@mihnita:

I do think that we need to ensure that non-plural number selection is possible, but I do not believe that it's a very common thing.

You owe us $42
We are all set
We owe you $42

I think that to build something like that we have to make a decision on the currency amount (a number), but the selector is not a plural; it is a "decide based on the sign of the number" selection.

To make that work, I think you'd need something like a :sign annotation. So it's certainly possible.
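A minimal TypeScript sketch of what such a hypothetical `:sign` selector might do for the "You owe us / We are all set / We owe you" message above. Nothing named `:sign` exists in the registry under discussion; the function names, the key names `negative`/`zero`/`positive`, and the assumption that a negative balance means the user owes money are all illustrative.

```typescript
// Hypothetical :sign selector; not part of any MF2 registry.
// Assumption: the operand is a balance where negative means the user owes money.
type SignKey = "negative" | "zero" | "positive";

function signKey(value: number): SignKey {
  if (Number.isNaN(value)) throw new RangeError("sign of NaN is undefined");
  if (value < 0) return "negative";
  if (value === 0) return "zero"; // also true for -0
  return "positive";
}

// Return the variant key equal to the computed sign key, or the
// catch-all "*" if the message does not list that key.
function selectBySign(value: number, keys: readonly string[]): string {
  const k = signKey(value);
  return keys.includes(k) ? k : "*";
}

// Variants of the balance message, keyed as a message author might list them.
const balanceVariants: Record<string, string> = {
  negative: "You owe us {$amount}",
  zero: "We are all set",
  "*": "We owe you {$amount}",
};

console.log(balanceVariants[selectBySign(-42, Object.keys(balanceVariants))]); // prints You owe us {$amount}
```

A positive balance has no `positive` key in this message, so it falls through to `*`, the same way the catch-all works in the other examples in this thread.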

Another example: showing different things depending on the day of week

5 : closed
6 : closed
* : open

Is this an appropriate use of match that we should be encouraging? It's certainly possible, but are you actually presenting this as an exemplary message that we should make sure is well presentable in MF2?

I don't think I follow how using the same MF2 keyword imposes a restriction on implementations. Following your pseudocode example, why wouldn't this also work?

class NumberFormatAndPlural implements Selector, Formatter { ... }
registry.addSelector("number", SelectPlural.class);
registry.addFormatter("number", NumberFormat.class);

Yes, that would also work. But my main problem is with the visible, public signatures. A selector in general is fundamentally a different concept than a formatter.

I think I'm still missing something here. What do you mean by "the visible, public signatures"?

You know, going a slight step further, we could simplify the syntax a little and make this mistake even harder to make:

let $count = {$inputCount :number maximumFractionDigits=0}
match $count
when one {You have {$count} book}
when * {You have {$count} books}

I am 150% on board with that, if we add a :plural on the match :-)

I'm pretty sure that the only way we could drop the braces from match statements would be if they can't have an annotation. Otherwise it's really hard to see what the selectors are, for example with match $one :two :three.

I don't feel strongly about :select vs :string vs something else.

But I don't think that using :string (or something else) instead of :select would help with the gender.

[...] A tool (and a human) will instantly know that this:

match $foo :gender
when * {Are you tired?} // which is not gendered in English

needs to add the gender forms required by their language.

With match $foo :string (or match $foo) I don't know.

That's exactly my point! We want to encourage the development of explicitly clear (grammatical or personal, not sure which) :gender selectors. Let's say we have a developer writing a message against an MF2 runtime which does not have a :gender, but they need to express it anyway. With :select, they end up with

match {$foo :select}
when * {Are you tired?}

But with :string, they end up with

match {$foo :string}
when * {Are you tired?}

In neither case would a translator or their tooling get any info from the above about what the possible keys might be, but with :string the developer would feel much less comfortable with the result. It works, but it's ugly, and they'll need to make sure to include a verbose translator comment about what to do here, or maybe they'll be inspired to write a proper :gender selector. With :select, they might well end up walking away with the presumption that "that's just how selection works in MF2".

aphillips commented 12 months ago

@mihnita

But I don't think that using :string (or something else) instead of :select would help with the gender.

Nothing helps with gender except a gender selector. Everything else will be people hacking something together based on their (probably insufficient) thinking about it. This is like the way people used ChoiceFormat before we had PluralFormat.

What select/string does provide is a generic selector that can match string values which are enumerated in the variants rather than in the registry. This is a powerful tool and it will be abused.

To summarize, I see gender similarly to booleans or enums in programming languages. Early on languages like C used integers for these concepts. But in the end they came around and realized that these are very useful, even if they look like syntactic sugar on top of int.

Full agreement, but the registry is fairly static. Rather than look at the bad examples, let's look at valid ones:

match {$productType}
when radio {We offer installation for your radio.}
when battery {We offer installation for your battery.}
when tire {We offer tire installation.} // can add more categories to the message without modifying code
when * {Installation is not available} 

Just like an enum, match/when allows the developer to enumerate values without having to install anything in the registry.

The other thing about select that "just works" is that one can select numeric, boolean, or maybe even date/time values either directly or via a perversion of toString()
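A rough TypeScript sketch of such a generic selector, whatever it ends up being called (`:select`, `:string`, `:equals`). It assumes, as a stand-in for the toString() approach just mentioned, that every operand is converted with `String(value)` and compared literally with each variant key; `selectByString` is a hypothetical name, not a registry function.

```typescript
// Sketch of a generic string-equality selector.
// Assumption: every operand is stringified with String(), so numbers,
// booleans and enum-like strings all compare as plain text.
function selectByString(value: unknown, keys: readonly string[]): string {
  const s = String(value);
  // The message author enumerates the keys; adding a category such as
  // "tire" needs no code or registry change.
  return keys.includes(s) ? s : "*";
}

const productKeys = ["radio", "battery", "tire", "*"];

console.log(selectByString("tire", productKeys)); // prints tire
console.log(selectByString("bicycle", productKeys)); // prints *
console.log(selectByString(true, ["true", "false", "*"])); // prints true
// The pitfall: a numeric 1 never matches a plural category name.
console.log(selectByString(1, ["one", "*"])); // prints *
```

The last line shows the fragility raised elsewhere in this thread: with plain string matching, a number is never treated as a plural operand.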

aphillips commented 12 months ago

@eemeli

With a :number as proposed in https://github.com/unicode-org/message-format-wg/pull/420 that directly supports plural selection, this could be expressed as:

let $distanceRemaining = {$distanceRemaining :number minimumSignificantDigits=2}
match {$distanceRemaining}
when 0 {You have arrived}
when * {You have {$distanceRemaining} km to go.}

No, actually, that's not the same thing. In my example, the message morphs from:

"You have 11 km to go" ... to... "You have 9.6 km to go" <- values under 10 show 10th of a unit

Either you need a "less than 10 message" or you need a "less than 10 selector". Make sense?


I proposed :equals as a selector because with strings it is about string equality, but it could be used with other types in implementations that grok types. So the equality is a bit fuzzy and it isn't "object equality" (pointer equality).

Another alternative that occurs to me from @mihnita is :enum, i.e. that the selector matches values enumerated in the variant keys?

I think it is important that there be a selector that developers can use to perform selection against data values without going to the registry.


To make that work, I think you'd need something like a :sign annotation. So it's certainly possible.

I think I'd like to propose the classical comparison operators for the default registry?

Selector  Description
:eq       Value equality
:lt       Less than
:gt       Greater than
:le       Less than or equal to
:ge       Greater than or equal to
:ne       Not equals
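To make the proposal concrete, here is a minimal TypeScript sketch of how these operators could behave as selectors. It assumes, following the `{$distanceRemaining :lt value=10}` example earlier in the thread, that each takes a numeric operand plus a `value` option and produces the key `true` or `false`; `compareSelector` and the option shape are hypothetical.

```typescript
// Hypothetical comparison selectors; names follow the proposed table.
type ComparisonOp = "eq" | "lt" | "gt" | "le" | "ge" | "ne";

const comparisons: Record<ComparisonOp, (a: number, b: number) => boolean> = {
  eq: (a, b) => a === b,
  lt: (a, b) => a < b,
  gt: (a, b) => a > b,
  le: (a, b) => a <= b,
  ge: (a, b) => a >= b,
  ne: (a, b) => a !== b,
};

// Each selector reduces the operand to a boolean, exposed as the
// variant keys "true" and "false".
function compareSelector(
  op: ComparisonOp,
  operand: number,
  options: { value: number },
): "true" | "false" {
  return comparisons[op](operand, options.value) ? "true" : "false";
}

console.log(compareSelector("lt", 9.6, { value: 10 })); // prints true
console.log(compareSelector("lt", 11, { value: 10 })); // prints false
```

For the driving-directions message, 9.6 with `:lt value=10` yields `true`, which picks the variants that show one fraction digit.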

In the PR you noted that requiring an annotation in the syntax (my "option b") would require a separate expression processor, which is true but could probably be mitigated. My other option was to have a defined default for the selector function when it isn't specified. The option you present is my option "b" but with the allowance that the selector can be declared in a declaration. That is, this message is not valid:

match {$foo}
when * {doesn't matter}

... but this one is:

let $foo = {$foo :selector}
match {$foo}
when * {doesn't matter}

That's a lot of state to be carrying for each declaration (in case it is used later in a match). I think it is better if it is a simple syntax error. Is it that onerous to require the annotation in the expression when it is in a match statement?

Anyway, I think we have three options:

(a) there is a default selector and it is :string (or whatever name)
(b) the annotation is required directly in the selector expression
(c) the annotation is required directly in the selector expression or indirectly via a declaration

Are there any others?

mihnita commented 12 months ago
5 : closed
6 : closed
* : open

Is this an appropriate use of match that we should be encouraging? It's certainly possible, but are you actually presenting this as an exemplary message that we should make sure is well presentable in MF2?

I think it should be easy to implement a custom function doing that. Which would be some kind of "numeric select". Think of the messages as "the store is open / closed"

To make that work, I think you'd need something like a :sign annotation. So it's certainly possible.

Yes, that would be a custom selector function. Which works on numbers, same as the :plural. Then it is confusing that :number is kind of sort of like a type, and a formatting function, and a selection function.

But the main point of the argument is: there are reasonable non-plural selector functions that work on numbers.

I think I'm still missing something here. What do you mean by "the visible, public signatures"?

The stuff that I see as a developer: not the implementation, but the message that I read or write. And when I see match {$count :number} and then You deleted {$count :number}... I scratch my head and I don't know what kind of a thing :number is. Almost like seeing:

switch (foo) {
    case 1: .... switch( bar );
    break;
}

The two switches are conceptually different. And we confuse people by calling them the same.

There, $count is explicitly declared by a :number expression and no inference is required.

Yes, but that only moves the requirement to always have function in selector (what I suggest) to always declare a local variable, and always have a function in that declaration. Which is harder (impossible?) to enforce in syntax.

What I mean is: it's easy to enforce in syntax that match {$foo :function} always has :function present. It is not easy to enforce in syntax that, if the message has a match {$foo}, it must also have a let $foo = {$bar :number} somewhere above. It also means we can't do plural decisions directly on arguments, only on local variables (taking us back to shadowing :-)

We want to encourage the development of explicitly clear (grammatical or personal, not sure which) :gender selectors

Then why not also encourage them to use :plural, instead of some guessing based on type :-)

but with :string the developer would feel much less comfortable with the result.

Probably not :-) My "developer persona" adds 1-3 messages per week. One in 10 messages (or even fewer) might have a plural or select or something more complex. So maybe once per month, or even less often, they will see the :select / :string. And the reaction will be "weird name, but whatever, it is what it is". Similar to "Ah, Rust does not have a switch, it's called match. Whatever..."

They don't spend time on it; they probably never read the spec. If they go to the spec, it means that we forced them to, by doing something unexpected.

let $count = {$inputCount :number maximumFractionDigits=0}
match $count
when one {You have {$count} book}
when * {You have {$count} books}

I am 150% on board with that, if we add a :plural on the match :-) ... I'm pretty sure that the only way we could drop the braces from match statements

That was not the point I was trying to make. I just missed the brackets; it should be match {$count}. (I don't want to go back and edit the comment, as that would be misleading for someone reading in the future. But consider that I did.)

The point I was trying to make is that I am on board with let + match (which you proposed a bit before), if we also make the selection function mandatory in match.

let $count = {$inputCount :number maximumFractionDigits=0}
match {$count :plural}
...

But I should not be FORCED to declare that local variable. Today in MF1 I can do this:

{$count, plural,
    one {You have # book}
  other {You have # books}
}

This is close enough, although a bit more verbose, it is still an almost 1:1 mapping:

match {$inputCount :plural}
  when one {You have {$inputCount} book}
  when * {You have {$inputCount} books}

This feels like too much "ceremony" to get the same result:

let $count = {$inputCount :number maximumFractionDigits=0}
match $count
  when one {You have {$count} book}
  when * {You have {$count} books}
mihnita commented 12 months ago

I proposed :equals as a selector because with strings it is about string equality, but it could be used with other types in implementations that grok types. So the equality is a bit fuzzy and it isn't "object equality" (pointer equality).

:equals sounds good to me. Many languages have the concept of "equal", or some kind of equality interface, because it is used to put things in sets or to use them as keys in a map: natively in Java (equals()), in C++ with the == operator, and std::cmp::Eq in Rust.

In some languages it would also be trivial to make it work for enums.

I think I'd like to propose the classical comparison operators for the default registry?

:-D I don't know. Might be a bit too early? Let's try to chew the part we already got :-)

I was thinking of (the even more controversial) :choice:

match {$foo :choice}
  when 0 {are no files}
  when 1 {is one file}
  when |<10| {a bunch of files}
  when * {countless files}

Not saying it is a good idea. But there might be some similar functionality that is good i18n, so it's good that it's possible to implement.
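Purely as an illustration of the idea above, here is a TypeScript sketch of how such a `:choice` selector might resolve those keys. The accepted key syntax (a plain number for an exact match, `<N` for "less than N", `*` as fallback), the first-match-wins ordering, and the name `selectByChoice` are all assumptions.

```typescript
// Hypothetical :choice selector, loosely modelled on ChoiceFormat.
// Assumed key syntax: "N" matches exactly N, "<N" matches values below N,
// "*" is the catch-all. Keys are tried in the order they are listed.
function selectByChoice(value: number, keys: readonly string[]): string {
  for (const key of keys) {
    if (key === "*") continue;
    if (key.startsWith("<")) {
      const limit = Number(key.slice(1));
      if (!Number.isNaN(limit) && value < limit) return key;
    } else if (key.trim() !== "" && Number(key) === value) {
      return key;
    }
  }
  return "*";
}

// Keys from the example, with the literal |<10| written as "<10".
const fileKeys = ["0", "1", "<10", "*"];
for (const n of [0, 1, 7, 42]) {
  console.log(n, selectByChoice(n, fileKeys));
}
```

Because keys are tried in order, 0 and 1 are caught by their exact keys before `<10` is considered; reordering the variants would change the result, which is one reason this style is debatable.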

mihnita commented 12 months ago

I think your summary is right on:

(a) there is a default selector and it is :string (or whatever name)
(b) the annotation is required directly in the selector expression
(c) the annotation is required directly in the selector expression or indirectly via a declaration

I think (c) is harder (impossible) to enforce in syntax, would require a linter.

If it is not in syntax, it is also harder to detect / highlight as a problem in editors that offer "real time error detection". (Like Visual Studio Code, which works very nicely with the extension that Eemeli created, btw. Thank you!)

So I'm OK with (b), and (b) + (a) (function required, if not then :string / :equal is implied)

With a slight preference for (b) without (a), because (a) means that the syntax / engine rendering MF2 has to know about the special method :equal that is declared in the registry. And my thought was that MF2 should be registry independent.

Solvable, but a bit unclean.

macchiati commented 12 months ago

Unfortunately github doesn't have threads, which would make this easier to follow!

  1. As I've said before
    
    The example I use for this is one you can observe in most driving direction applications:

match {$distanceRemaining :plural} {$distanceRemaining :lt value=10} ...


is a terrible example, because it assumes that all languages measure distances the same way. There are probably examples illustrating :lt, but this isn't a good one to use.

  2. Nothing helps with gender except a gender selector. Everything else will be people hacking something together based on their (probably insufficient) thinking about it.

Firmly agree!
aphillips commented 12 months ago

There are probably examples illustrating :lt, but this isn't a good one to use.

I left out the measure units part of the example to focus on number selection and formatting. FWIW, I didn't recreate the full example here because I thought it would be too baroque (it is also the example of an MF1 "very complicated message" that I used in my IUC presentation calling for the creation of this WG--that version selected independently of locale between SI and customary units)...... 😛

@mihnita

And my thought was that MF2 should be registry independent.

MF2 can be registry independent in terms of well-formed messages, but cannot be fully independent if we normatively require a default registry (will we?). I think it is reasonable in the formatting part of the spec to require:

If an expression in a match_statement does not include an annotation, then make the selector function be the equals function.

I could be okay with (b). My goal in enumerating the options was to ensure that all of them were in one place for a discussion (perhaps Monday).

macchiati commented 12 months ago

The downside of that is that it is fragile. Take the following:

match {$count}
when one {...}
when * {...}

If someone passes in an integer with value 1, this will fail. I think it would be better to require an annotation somewhere, that is, either:

A. match {$count :xxx}

or

B. let $count = {$yyy :xxx}
   match {$count}

stasm commented 12 months ago

Requiring an annotation in the selector (like #431 proposes) doesn't prevent errors caused by using non-matching functions.

# Assume :titlecase is a format-only function.
match {$count :titlecase} when ...

Similarly, we can't prevent errors due to matching functions being used inside placeholders:

# Assume :plural is a match-only function.
{Hello, {$world :plural}.}

We'll need to handle such cases on runtime anyways. I'd like to propose that we design the behavior of formatting and selection around the concept of wrapper interfaces around the passed values. If someone passes in an integer 1, then on runtime the implementation should represent it as interface MessageFormatNumber extends MessageFormatValue, where:

interface MessageFormatValue {
    formatToString(): string;
    formatToParts(): Iterator<MessageFormatPart>;
    match(key: string): boolean; // does this value match the given variant key?
}

Crucially, all functions would be required to return an instance of MessageFormatValue, too.

This way, in all of {{$count}}, {{$count :number}}, match {$count}, and match {$count :plural}, we expect the expressions to resolve to instances of MessageFormatValue. (This is related to #299 and #413.)
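A minimal TypeScript sketch of this wrapper idea. It assumes that `match` receives one variant key at a time and gives `MessageFormatPart` an assumed `{ type, value }` shape; the classes `StringValue`/`NumberValue` and the helpers `wrapArgument`/`selectVariant` are hypothetical. In this sketch a bare number only matches exact numeric keys, which is one possible answer to the question raised in this issue, not a decision.

```typescript
// Assumed shape; the comment above does not define MessageFormatPart.
interface MessageFormatPart {
  type: string;
  value: string;
}

interface MessageFormatValue {
  formatToString(): string;
  formatToParts(): Iterator<MessageFormatPart>;
  match(key: string): boolean; // assumption: called once per variant key
}

// Fallback wrapper for any raw argument: String() formatting and
// plain string-equality matching.
class StringValue implements MessageFormatValue {
  readonly raw: unknown;
  constructor(raw: unknown) {
    this.raw = raw;
  }
  formatToString(): string {
    return String(this.raw);
  }
  formatToParts(): Iterator<MessageFormatPart> {
    return [{ type: "literal", value: this.formatToString() }][Symbol.iterator]();
  }
  match(key: string): boolean {
    return this.formatToString() === key;
  }
}

// Wrapper for raw numbers: locale-aware formatting, but matching is by
// exact numeric value only (no plural categories in this sketch).
class NumberValue implements MessageFormatValue {
  readonly raw: number;
  readonly locale: string;
  constructor(raw: number, locale: string) {
    this.raw = raw;
    this.locale = locale;
  }
  formatToString(): string {
    return new Intl.NumberFormat(this.locale).format(this.raw);
  }
  formatToParts(): Iterator<MessageFormatPart> {
    return new Intl.NumberFormat(this.locale).formatToParts(this.raw)[Symbol.iterator]();
  }
  match(key: string): boolean {
    return key.trim() !== "" && Number(key) === this.raw;
  }
}

// Raw arguments are wrapped before use, so {$count} and match {$count}
// both operate on a MessageFormatValue.
function wrapArgument(raw: unknown, locale: string): MessageFormatValue {
  return typeof raw === "number" ? new NumberValue(raw, locale) : new StringValue(raw);
}

// Pick the first non-catch-all key the value matches, else "*".
function selectVariant(value: MessageFormatValue, keys: readonly string[]): string {
  return keys.find((k) => k !== "*" && value.match(k)) ?? "*";
}
```

Here `wrapArgument(1234, "en-US").formatToString()` gives `1,234`, while selecting a raw 1 against `one` falls through to `*`; giving `NumberValue.match` plural behaviour would be a change in exactly one place.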


The reason why I think it's important to talk about this now is that it seems like we're trying to add some sort of static typing to arguments flowing through messages, but just to selectors, because we also want placeholders to be typed dynamically (and loosely).

And because messages are not compiled, the only place to enforce static typing is during parsing: either as a SyntaxError or a DataModelError. But even that isn't perfect, because we don't know what annotations do before we call them.

On top of that, the presence of local declarations makes static typing via SyntaxError impossible. I'd expect the following message to be valid and work just fine:

let $foo = {$count :plural}
let $bar = {$foo}
let $baz = {$bar}
match {$baz} ...

...while also I'd expect it to not require :plural in the selector.

So we're left with runtime DataModelErrors which, to reiterate my point from the beginning of this comment, won't protect from mis-uses of functions, e.g.

# Assume :titlecase is a format-only function.
let $foo = {$count :titlecase}
let $bar = {$foo}
let $baz = {$bar}
match {$baz} ...

My position then is to:

eemeli commented 12 months ago

@aphillips: No, actually, that's not the same thing. In my example, the message morphs from:

"You have 11 km to go" ... to... "You have 9.6 km to go" <- values under 10 show 10th of a unit

Either you need a "less than 10 message" or you need a "less than 10 selector". Make sense?

Yes. There's more to quibble about here, but it's beside the point.

I think it is important that there be a selector that developers can use to perform selection against data values without going to the registry.

I agree. I just want the developer to feel a little dirty when doing that.

I think I'd like to propose the classical comparison operators for the default registry?

Selector  Description
:eq       Value equality
:lt       Less than
:gt       Greater than
:le       Less than or equal to
:ge       Greater than or equal to
:ne       Not equals

This would encourage packaging logic into messages, rather than the calling code. Do we really want to do that? I would much rather add a :boolean instead.

[...] I think it is better if it is a simple syntax error. Is it that onerous to require the annotation in the expression when it is in a match statement?

Yes, actually. Doing so would strongly guide a user to consider the annotation of a selector as a "select function" rather than as something like a constructor. This is related to what we're asking our users to consider to be the "resolved value" of an expression. For now, a user can consider an expression just by itself, irrespective of where in a message it shows up. If we syntactically require an annotation on every selector, then they'll need to consider where an expression is before considering its meaning.

To use :number as an example, my preferred way of thinking about it is that an expression with that annotation will give me something like a MessageNumber object which encapsulates a numerical value and a bag of options. This MessageNumber may then be used either as a selector or as a placeholder. As a selector, it lets me ask for each variant, "Does this exact value or plural category match this number, when formatted with these options?". When used as a placeholder, formatting it answers the question, "What does this number look like when formatted with these options?".
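Read that way, a `MessageNumber` could be sketched in TypeScript as below. This only illustrates the described behaviour under stated assumptions: formatting and plural selection share one options bag (restricted here to an assumed `DigitOptions` subset that both `Intl.NumberFormat` and `Intl.PluralRules` accept), and an exact-value key wins over a plural-category key. The class name comes from the comment above; its members and the preference rule are assumptions.

```typescript
// Assumed subset of options that both formatting and plural selection accept.
interface DigitOptions {
  minimumFractionDigits?: number;
  maximumFractionDigits?: number;
  minimumSignificantDigits?: number;
  maximumSignificantDigits?: number;
}

// Sketch of a "MessageNumber": a numeric value plus a bag of options,
// usable either as a placeholder or as a selector.
class MessageNumber {
  readonly value: number;
  readonly locale: string;
  readonly options: DigitOptions;

  constructor(value: number, locale: string, options: DigitOptions = {}) {
    this.value = value;
    this.locale = locale;
    this.options = options;
  }

  // Placeholder: what does this number look like with these options?
  format(): string {
    return new Intl.NumberFormat(this.locale, this.options).format(this.value);
  }

  // Selector: does an exact value, or else the plural category, match?
  // Assumptions: exact numeric keys win over category keys, and exact
  // keys are compared against the raw (unrounded) value.
  select(keys: readonly string[]): string {
    const exact = keys.find((k) => k !== "*" && k.trim() !== "" && Number(k) === this.value);
    if (exact !== undefined) return exact;
    const category = new Intl.PluralRules(this.locale, this.options).select(this.value);
    return keys.includes(category) ? category : "*";
  }
}
```

With `minimumFractionDigits: 1`, the value 1 formats as `1.0` and, consistently, no longer selects `one` in English. For message (e) from the opening post, this sketch picks the `1234` key, since `|1,234|` is not a numeric literal.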

@mihnita:

@eemeli:

5 : closed
6 : closed
* : open

Is this an appropriate use of match that we should be encouraging? It's certainly possible, but are you actually presenting this as an exemplary message that we should make sure is well presentable in MF2?

I think it should be easy to implement a custom function doing that. Which would be some kind of "numeric select". Think of the messages as "the store is open / closed"

My concern here is that this message lets a localiser control whether the shop is "open" or "closed" on a Saturday. I would think that we'd prefer something like:

match {$isOpen :boolean}
when true {open}
when * {closed}

or even two entirely separate messages.

But the main point of the argument is: there are reasonable non-plural selector functions that work on numbers.

Yes, they need to be supported as non-default selectors. My point is that when a number is used as a selector, it is reasonable to assume by default that it's being used for plural selection.

I think I'm still missing something here. What do you mean by "the visible, public signatures"?

The stuff that I see as a developer: not the implementation, but the message that I read or write. And when I see match {$count :number} and then You deleted {$count :number}... I scratch my head and I don't know what kind of a thing :number is. Almost like seeing:

switch (foo) {
    case 1: .... switch( bar );
    break;
}

The two switches are conceptually different. And we confuse people by calling them the same.

With Fluent, this has not appeared as a concern. In that format, a NUMBER() may be used as either a selector or a placeholder, and our developers or translators don't appear to have any issue with that.

There, $count is explicitly declared by a :number expression and no inference is required.

Yes, but that only moves the requirement to always have function in selector (what I suggest) to always declare a local variable, and always have a function in that declaration. Which is harder (impossible?) to enforce in syntax.

In my PR #431, I propose it as a data model error. We already have a few, comparing the selector and variant key counts. This has comparable complexity with those.
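For a sense of that complexity, here is a TypeScript sketch of such a data-model check over a deliberately simplified, hypothetical data model (not the one in the spec or in #431): a selector passes if it is annotated directly, or if it names a local variable whose chain of declarations reaches an annotation. Whether chains like `let $bar = {$foo}` should count is part of what is being debated; this sketch assumes they do and ignores declaration order.

```typescript
// Hypothetical, simplified data model; not the one in the MF2 spec.
interface Expression {
  variable: string; // operand name without "$"
  annotation?: string; // function name without ":"
}

interface Declaration {
  name: string;
  value: Expression;
}

// True if the selector is annotated directly, or refers to a local
// variable whose chain of declarations reaches an annotation.
// Simplification: declaration order and re-declaration are ignored.
function selectorIsAnnotated(selector: Expression, declarations: readonly Declaration[]): boolean {
  const seen = new Set<string>();
  let current: Expression | undefined = selector;
  while (current !== undefined) {
    if (current.annotation !== undefined) return true;
    if (seen.has(current.variable)) return false; // e.g. let $foo = {$foo}
    seen.add(current.variable);
    const name = current.variable;
    current = declarations.find((d) => d.name === name)?.value;
  }
  return false; // reached an external argument with no annotation
}

// The chain from the thread: let $foo = {$count :plural}, $bar = {$foo}, $baz = {$bar}
const chain: Declaration[] = [
  { name: "foo", value: { variable: "count", annotation: "plural" } },
  { name: "bar", value: { variable: "foo" } },
  { name: "baz", value: { variable: "bar" } },
];

console.log(selectorIsAnnotated({ variable: "baz" }, chain)); // prints true
console.log(selectorIsAnnotated({ variable: "count" }, chain)); // prints false
```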

This feels like too much "ceremony" to get the same result:

let $count = {$inputCount :number maximumFractionDigits=0}
match $count
  when one {You have {$count} book}
  when * {You have {$count} books}

What would be your preferred alternative for the above? With our current syntax, it ought to have match {$count}, but otherwise it's exactly how I would think this message should be structured. How could we reduce the "ceremony" here?

mihnita commented 12 months ago

My concern here is that this message lets a localiser control whether the shop is "open" or "closed" on a Saturday. I would think that we'd prefer something like:

We might allow localization to add options beyond the ones from the source. Or not.

But if you don't trust your localizer, you have bigger problems than that. The source language can say "open" and they can translate as "closed".

We try to help localization (with validation, lints, etc) by preventing non-intentional mistakes.

My point is that when a number is used as a selector, it is reasonable to assume by default that it's being used for plural selection.

I get that. And my take is that number is not something to use as a selector. The selector is a function that looks at a number to make a decision.

let $count = {$inputCount :number maximumFractionDigits=0}
match $count
  when one {You have {$count} book}
  when * {You have {$count} books}

What would be your preferred alternative for the above?

I've shown it more than once:

let $count = {$inputCount :number maximumFractionDigits=0}
match {$count :plural}
  when one {You have {$count} book}
  when * {You have {$count} books}

It is a plural selector, taking a number formatter as an operand.

And it can look at its parameters and input, but does not need to invoke it, while {You have {$count} books} has to invoke it (resolve it?). So it is quite different.
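Under that reading, `:plural` is a separate function that receives the operand built by `:number` and inspects it without formatting it. A minimal TypeScript sketch, assuming a hypothetical `NumberOperand` shape that exposes the raw input and the declaration's digit options; `Intl.PluralRules` accepts those same digit options, so options such as `maximumFractionDigits=0` in the declaration can influence the chosen category.

```typescript
// Assumed shape of the operand produced by {$inputCount :number ...}:
// the raw input plus the options given in the declaration.
interface NumberOperand {
  input: number;
  options: {
    minimumFractionDigits?: number;
    maximumFractionDigits?: number;
  };
}

// Hypothetical standalone :plural selector. It reads the operand's input
// and options to choose a CLDR plural category, but never formats it.
function pluralSelector(operand: NumberOperand, locale: string, keys: readonly string[]): string {
  const category = new Intl.PluralRules(locale, operand.options).select(operand.input);
  return keys.includes(category) ? category : "*";
}

const bookKeys = ["one", "*"];
console.log(pluralSelector({ input: 1, options: { maximumFractionDigits: 0 } }, "en", bookKeys)); // prints one
console.log(pluralSelector({ input: 1, options: { minimumFractionDigits: 1 } }, "en", bookKeys)); // prints *
```

The second call shows why the selector needs the operand's options: "1.0" is not in the English `one` category.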

This also allows one to add selectors like match {$count :choice} that also work on numbers. It is confusing that :number was a formatter, that we are kind of making it a type now, and that we also want to make it a selector.

If we think about {$foo :datetime}, that is not a type, it is a formatter only. It might format (if we think Java types) Date, Calendar, long (epoch time), the new stuff in java.time. And not all of these inputs are even "date-like" (see the long there).

By trying to make :number a type, not a function, it means we can't really describe it in the function registry. And it is not consistent with :datetime or other things that look the same, and are used the same in placeholders and local variables, but are not used the same as selectors. Or types.

stasm commented 12 months ago

The way to look at it is: :number is a constructor function which returns an instance of some runtime type, which wraps around the raw value of $count, stores options, and can format and match. The exact interface of this runtime type is implementation-specific, but I think it will help in our discussions if we admit that it exists.

The function is part of the public API; its signature is defined in the registry. The runtime type is private and is not described in the registry. In fact, implementations don't even have to use any runtime types, as long as the behavior is the same as specified.

stasm commented 12 months ago

@mihnita gives an example:

let $count = {$inputCount :number maximumFractionDigits=0}
match {$count :plural}
  when one {You have {$count} book}
  when * {You have {$count} books}

This looks great, and I think it should be how we recommend messages are built. I like it that it demonstrates the need for :plural to inspect the options of $count. This was my goal that I described in #299.

I also believe that using a naked selector should be possible, i.e. the following message should produce neither a syntax nor a data model error:

let $count = {$inputCount :number maximumFractionDigits=0}
match {$count}
  when one {You have {$count} book}
  when * {You have {$count} books}

If the :number function provides a matching signature, then this message should produce You have 1 book.

If the :number function is formatting-only, we still need to do something about this message on runtime. The instance of the runtime type that wraps $count should be able to be used as a selector, and most likely should just implement simple string-equality matching. In which case, the output would be You have 1 books.

You have 1 books isn't proper English, so such a situation should be detectable statically. And it is: linting can notice that:

aphillips commented 12 months ago

@eemeli

I mostly agree with your reply above, but...

Yes, they need to be supported as non-default selectors. My point is that when a number is used as a selector, it is reasonable to assume by default that it's being used for plural selection.

I am concerned about having type-based selection in a type-less system:

match {$count}
when * {You have {$count} items}

The above is only a plural selector if $count is a number. I don't know if my translation tool should explode the plural to generate few and many variants unless I know it is a number. The name count doesn't help (translation tools can't read and the name might not be an obvious number anyway). Similarly, there can be selectors that use the keywords reserved by other selectors: when one is not owned by plural exclusively.
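The dilemma can be made concrete with a TypeScript sketch of the decision a translation tool has to make. Everything here is an assumption for illustration: `requiredKeys` is a hypothetical helper, and treating only `:plural` and `:number` as plural-capable is one possible policy. `Intl.PluralRules.prototype.resolvedOptions().pluralCategories` is a standard ECMA-402 API that lists the categories a locale uses.

```typescript
// Annotations this hypothetical tool treats as plural-capable.
const PLURAL_ANNOTATIONS = new Set(["plural", "number"]);

// Which variant keys should a translation tool ask for in the target
// locale? Without an explicit plural-capable annotation it cannot tell,
// so it only keeps the catch-all.
function requiredKeys(annotation: string | undefined, targetLocale: string): string[] {
  if (annotation === undefined || !PLURAL_ANNOTATIONS.has(annotation)) {
    return ["*"];
  }
  const categories = new Intl.PluralRules(targetLocale).resolvedOptions().pluralCategories;
  // "other" is covered by the catch-all "*" variant.
  return [...categories.filter((c) => c !== "other"), "*"];
}

console.log(requiredKeys(undefined, "pl")); // unannotated: only "*"
console.log(requiredKeys("plural", "pl")); // Polish categories (order may vary) plus "*"
```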

In MF1, the selector is always named. Formatters can be inferred from types, but never selectors. This does not mean that we have to require the selector. But how do I know that my variants will work as intended?

This would encourage packaging logic into messages, rather than the calling code. Do we really want to do that? I would much rather add a :boolean instead.

Message selection logic? Absolutely I would want message selection logic in messages. I agree about booleans, but I have found that enumerated sets are super common. When a specific message varies content based on a data value, developers often want to use an enum (which they already have) or dataset to power it:

match {$deviceType}
when phone      {Your phone is ready to use.}
when tv         {Your TV is ready to use.}
when tablet     {Your tablet is ready to use.}
when headphones {Your headphones are ready to use.}
when *          {Your device is ready to use.}

This message avoids the temptation to use MessageFormat to do string concatenation (I won't explain why these are bad messages):

{Your {$deviceType} is ready to use.}
{Your {$deviceType :getDisplayName} is ready to use.}
macchiati commented 12 months ago

The way to look at it is: :number is a constructor function which returns an instance of some runtime type, which wraps around the raw value of $count, stores options, and can format and match.

I agree; that's the way I think of it also. Moreover, I think that instance is what you are passing in to a further 'let'. So the following works just fine.

let $foo = {$count :number}
let $bar = {$foo}
let $baz = {$bar}
match {$baz} ...
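A sketch of the "constructor returning a runtime instance" model, assuming a hypothetical `NumberValue` type: :number builds one instance that carries the raw value and its options, and each further let binds that same resolved instance rather than re-reading $count. The English "one" rule is deliberately simplified and no locale formatting (such as grouping) is performed.

```rust
use std::collections::HashMap;

/// Hypothetical runtime type produced by `:number`: it wraps the raw value,
/// stores the options, and can both format and match.
#[derive(Clone, Debug, PartialEq)]
struct NumberValue {
    raw: f64,
    maximum_fraction_digits: usize,
}

impl NumberValue {
    /// Formats with the stored option (no locale data or grouping here).
    fn format(&self) -> String {
        format!("{:.*}", self.maximum_fraction_digits, self.raw)
    }

    /// Simplified English rule: "one" matches when the formatted value is
    /// exactly "1"; numeric keys match by value. Real code would use CLDR.
    fn matches(&self, key: &str) -> bool {
        match key {
            "one" => self.format() == "1",
            _ => key.parse::<f64>().map_or(false, |n| n == self.raw),
        }
    }
}

/// `{$x :number maximumFractionDigits=N}` viewed as a constructor call.
fn number(raw: f64, maximum_fraction_digits: usize) -> NumberValue {
    NumberValue { raw, maximum_fraction_digits }
}

/// `let $new = {$old}` binds the already-resolved instance, options and all.
/// Panics if `$old` is unbound (a real implementation would report an error).
fn bind_alias(env: &mut HashMap<String, NumberValue>, new_name: &str, old_name: &str) {
    let value = env[old_name].clone();
    env.insert(new_name.to_string(), value);
}
```

Because the options travel with the instance, $baz in the chain above still formats and matches exactly as $foo does.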

On Thu, Jul 20, 2023 at 8:35 AM Stanisław Małolepszy wrote:

The way to look at it is: :number is a constructor function which returns an instance of some runtime type, which wraps around the raw value of $count, stores options, and can format and match. The exact interface of this runtime type is implementation-specific, but I think it will help in our discussions if we admit that it exists.

The function is part of the public API; its signature is defined in the registry. The runtime type is private and is not described in the registry. In fact, implementations don't even have to use any runtime types, as long as the behavior is the same as specified.


macchiati commented 12 months ago

I am concerned about having type-based selection in a type-less system:

I think that is a valid concern. So the question is what the behavior of the following should be:

match {$count}
when few {...}
when * {...}

The fundamental question is: can an API for a language with reflection (where the API can determine datatypes) interpret this as matching the "few" case when the value of $count is the integer 3? Taking C as a paradigmatic unreflective language:

int x = 3;
formatMessage(patternString, x);

There are a few possibilities:

  1. Yes, it can, and so the results will just be different for Java than for C (for example). Pro: simpler for reflection languages. Con: no interop between reflection and non-reflection languages.

  2. No, it can't. It must interpret it (since there is no :number) as a string match, and will pick the "*" case. Pro: interop between reflection and non-reflection languages. Con: slightly more complicated for reflection languages; you need to have the :number on $count somewhere.

  3. We require that the unreflective APIs for formatting MF2 specify datatypes in their argument lists. Pro: simpler for reflection languages; maintains interop. Con: Hmmm, this needs more thought. This is just a rough stream-of-consciousness take:

    At first I was thinking that this might be an excessive burden for unreflective languages. But C is already pretty painful: printf needs special format specifiers (%s, %d, %f, ...) for variable arguments to work at all, and a mismatch fails badly. A C API is going to need some help here anyway, because the code that handles :number will need to determine whether a parameter is a double or an integer or ... And I don't think we want to build that into the way that message format works. It will also need some way to associate the parameter IDs in the message with particular arguments, so C APIs are going to need help with that as well. My C is very rusty, but it might be something like the following.

Args args;
MFaddInt(args, "$count", 3);
MFaddString(args, "$deviceType", "phone");
formatMessage(patternString, args);
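The C sketch above maps naturally onto a tagged value type. Here is the same idea in Rust, with hypothetical names (`ArgValue`, `Args`, `add_int`, `add_string`) mirroring MFaddInt and MFaddString; it is not any existing library's API. The helper `string_form` shows what possibility 2 would compare for a naked selector: only the string form, whatever type the caller declared.

```rust
use std::collections::HashMap;

/// Hypothetical typed argument value: the caller states the type, so even a
/// language without reflection hands the formatter enough information.
#[derive(Debug, Clone, PartialEq)]
enum ArgValue {
    Int(i64),
    Str(String),
}

#[derive(Default)]
struct Args {
    values: HashMap<String, ArgValue>,
}

impl Args {
    fn add_int(&mut self, name: &str, value: i64) {
        self.values.insert(name.to_string(), ArgValue::Int(value));
    }

    fn add_string(&mut self, name: &str, value: &str) {
        self.values.insert(name.to_string(), ArgValue::Str(value.to_string()));
    }

    fn get(&self, name: &str) -> Option<&ArgValue> {
        self.values.get(name)
    }
}

/// Under possibility 2, a naked selector compares string forms only, so the
/// declared type does not change which variant is chosen.
fn string_form(value: &ArgValue) -> String {
    match value {
        ArgValue::Int(n) => n.to_string(),
        ArgValue::Str(s) => s.clone(),
    }
}
```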


stasm commented 12 months ago

FYI, (3) is exactly what the Rust implementation of Fluent does:

let mut args = FluentArgs::new();
args.set("name", FluentValue::from("John"));

// (snip)
let value = bundle.format_pattern(&pattern, Some(&args), &mut errors);
stasm commented 12 months ago

That said, I think that in your example:

match {$count}
when few {...}
when * {...}

…even when the implementation learns about $count being an integer, we should not select the “few” case.

That would be a job for the :plural selector function. Without it, I’d like $count to resolve to an instance of an implementation-specific runtime type for integers, which can match against literal keys that look like other integers.

Matching against the “few” key requires plural selection, and I agree with the position that @mihnita expressed before, that it should be possible to implement plurals outside the core engine of the implementation. Consequently, we shouldn’t make plural handling the default for naked selectors.
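A sketch of the behavior described above, assuming a hypothetical `IntegerValue` runtime type for an unannotated integer: it matches only keys that parse as the same integer, so a plural category name like few can never match and the * variant is chosen.

```rust
/// Hypothetical runtime type for an integer argument with no annotation.
struct IntegerValue(i64);

impl IntegerValue {
    /// Matches only keys that parse as integers equal to the value; plural
    /// category names such as "few" never parse, so they never match.
    fn matches(&self, key: &str) -> bool {
        key.parse::<i64>().map_or(false, |n| n == self.0)
    }
}

/// Pick the first matching key, falling back to the catch-all `*`.
fn select<'a>(value: &IntegerValue, keys: &[&'a str]) -> &'a str {
    keys.iter()
        .copied()
        .find(|k| *k != "*" && value.matches(k))
        .unwrap_or("*")
}
```

Plural selection would then live entirely in a separate :plural function, which is what keeps it implementable outside the core engine.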

macchiati commented 12 months ago

Sorry, I omitted that it is being formatted for a specific locale (e.g. Czech).


aphillips commented 12 months ago

@macchiati

The problem is not with the runtime of reflective vs. unreflective languages. The problem is with the translation time when the argument type cannot be known without some sort of assertion present in the body of the message itself. It also does not matter what the calling code does because the calling code is invisible to the message.

Inference of plural for a selector cannot depend solely on the use of plural's reserved keywords unless plural is going to be baked into the specification (and no other selectors allowed to use zero/one/two/few/many as their keywords). Numbers are important, to be sure, but they are not the only types that can be selected against.

macchiati commented 12 months ago

On Thu, Jul 20, 2023, 11:49 Addison Phillips wrote:

@macchiati

The problem is not with the runtime of reflective vs. unreflective languages. The problem is with the translation time when the argument type cannot be known without some sort of assertion present in the body of the message itself. It also does not matter what the calling code does because the calling code is invisible to the message.

Good point. That also means inference for a formatter, tho.

Inference of plural for a selector cannot depend solely on the use of plural's reserved keywords unless plural is going to be baked into the specification (and no other formatters allowed to use zero/one/two/few/many as their outputs). Numbers are important, to be sure, but they are not the only types that can be selected against.

I never held that position.


aphillips commented 12 months ago

Good point. That also means inference for a formatter, tho.

It does, although formatters generally don't require translation tools to do something different with the placeholder. And implementations are more forgiving of unspecified formatters (for example, reflecting the type).

I never held that position.

No, you didn't. But some readers might think that this is implied by inspecting the variant list. I wanted to call it out.

mihnita commented 12 months ago

Requiring an annotation in the selector (like https://github.com/unicode-org/message-format-wg/pull/431 proposes) doesn't prevent errors caused by using non-matching functions.

// Assume :titlecase is a format-only function.
match {$count :titlecase}
when ...

Similarly, we can't prevent errors due to matching functions being used inside placeholders:

// Assume :plural is a match-only function.
{Hello, {$world :plural}.}

I agree that these kinds of errors can't be detected from syntax alone, not least because the syntax knows nothing about :titlecase or :plural or :mihai. That information is available in the registry.

But a linter has access to the registry and can detect the errors you describe. There is no need to wait until runtime to catch them.
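A sketch of such a registry-aware lint, assuming a hypothetical registry that records, per function, whether it can format and whether it can select. The `Signature`, `Position` and `LintError` types are illustrative only; the registry's real shape was still undecided at this point in the discussion.

```rust
use std::collections::HashMap;

/// Hypothetical registry entry: what a function supports.
#[derive(Clone, Copy)]
struct Signature {
    formats: bool,
    selects: bool,
}

#[derive(Debug, PartialEq)]
enum LintError {
    UnknownFunction(String),
    NotASelector(String),
    NotAFormatter(String),
}

/// Where a function annotation appears in the message.
enum Position {
    Selector,
    Placeholder,
}

/// Check one annotation against the registry without running the message.
fn lint(registry: &HashMap<&str, Signature>, function: &str, position: Position) -> Result<(), LintError> {
    let sig = registry
        .get(function)
        .ok_or_else(|| LintError::UnknownFunction(function.to_string()))?;
    match position {
        Position::Selector if !sig.selects => Err(LintError::NotASelector(function.to_string())),
        Position::Placeholder if !sig.formats => Err(LintError::NotAFormatter(function.to_string())),
        _ => Ok(()),
    }
}
```

Both examples above are flagged statically: :titlecase used in a selector and :plural used in a placeholder.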

We'll need to handle such cases on runtime anyways

Sure, by throwing (or whatever error-reporting mechanism is most natural for the implementation). But there is a good chance to detect them statically, on the message alone, without access to the code.

mihnita commented 12 months ago

About C++ (no runtime type) and Rust (args.set("name", FluentValue::from("John"))):

ICU4C handles this very similarly to Rust: MessageFormat takes Formattable values, which package the type (Formattable::kDate, Formattable::kDouble, Formattable::kString, ...) together with the value.

And the values are built with the constructors: Formattable(double(3456)), Formattable("Disk"), Formattable((UDate)1000000000L, Formattable::kIsDate), etc.

mihnita commented 12 months ago

In fact I would be more worried about JavaScript and types.

It is relatively easy to mess up the type in JavaScript and pass a string when you think it is an integer, or the other way around.

The chance that $count is expected to be a number but somehow becomes a string at runtime by mistake is a lot higher than in C/C++/Java/Rust/etc.

So this would return "1 files" if $count is "1" (string) instead of 1 (number):

match {$count}
  when one {{$count} file}
  when * {{$count} files}

This is because the selector function that actually gets applied depends only on the runtime type.
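A sketch of that pitfall, assuming a hypothetical `Value` enum standing in for a dynamically typed JavaScript argument, and a deliberately simplified English plural rule: when the selection method is chosen by runtime type, 1 gets plural selection but "1" gets exact string matching, so the same message picks different variants.

```rust
/// Hypothetical dynamically typed argument, as a JavaScript caller might pass.
enum Value {
    Number(f64),
    Str(String),
}

/// Simplified English cardinal rule; real implementations use CLDR data.
fn english_plural_category(n: f64) -> &'static str {
    if n == 1.0 { "one" } else { "other" }
}

/// If selection is chosen by runtime type, a number gets plural selection
/// while a string gets exact string matching; `*` is the fallback.
fn select<'a>(value: &Value, keys: &[&'a str]) -> &'a str {
    let found = match value {
        Value::Number(n) => {
            let category = english_plural_category(*n);
            keys.iter().copied().find(|k| *k == category)
        }
        Value::Str(s) => keys.iter().copied().find(|k| *k == s.as_str()),
    };
    found.unwrap_or("*")
}
```

With the keys one and *, the number 1 selects one ("1 file"), while the string "1" falls through to * ("1 files").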

stasm commented 12 months ago

@mihnita writes:

I agree that these kinds of errors can't be detected from syntax alone, not least because the syntax knows nothing about :titlecase or :plural or :mihai. That information is available in the registry.

But a linter has access to the registry and can detect the errors you describe. There is no need to wait until runtime to catch them.

Linting is not required. While we will strongly recommend it and make an effort to make it powerful, we must also accept that some users may not want or be able to perform it. Consequently, the runtime must handle any message that didn't produce syntax or data model errors.

My point above is that making a missing selector annotation a data model error doesn't help in the absence of proper registry-enabled linting. Therefore I claim that naked selectors should be considered well-formed and valid, but we should recommend against them by giving compelling reasons:

aphillips commented 11 months ago

If I'm reading this thread correctly (as I try to summarize for the 2023-08-14 telecon agenda), I would make the following observations:

  1. We now require annotations for all selectors. This means that reflection or type-casting of selectors can't happen? That part of this issue can thus be closed.
  2. Much of the above thread concerns the default selector for a given (default registry) function, notably whether :number defaults to a :plural selector. I think this can be turned into a proposal for the default function registry (see sublist below), which should be an independent and focused task:
    1. enumerate which functions will be in the default registry (this includes solving #433)
    2. for formatting functions, enumerate whether they are also selectors
    3. for formatting functions that are also selectors, enumerate what the default selector is and any subsidiary (:equals) selectors
    4. enumerate which convenience functions (if any) we will provide. For example, will we provide :plural if :number can also do plural selection?

Given the above, is that enough to close this issue in favor of default registry tasks? Is there any specification or syntax change that needs to occur if functions handle this?

aphillips commented 10 months ago

Marking resolve-candidate due to last comment. I think the discussion in this issue is extremely valuable, but want to break things into discrete tasks.