zbraniecki / message-format-2.0-rs

MessageFormat 2.0 Prototype in Rust
https://github.com/unicode-org/message-format-wg/issues/93

Uneven branches in multi-variants #9

Open zbraniecki opened 4 years ago

zbraniecki commented 4 years ago

As per @stasm's request in https://github.com/zbraniecki/message-format-2.0-rs/issues/6, I encoded the AST in my proposal using Option 2.

It handles multi-variant messages like "Anne published 2 pictures.", where in Polish we'll need both a gender and a plural selector.

The issue I see with Option 2 is that I'm not sure how to resolve uneven selectors. For example, if we'd like to extend the example to handle "Anne and John published 2 pictures." as well as "Anne published 2 pictures.", in Polish we'll have to handle the fact that the set of genders depends on the plural form of the subject: masculine, feminine, and neuter in the singular, but masculine-personal and non-masculine-personal in the plural.

In Fluent's proposal we would handle that via nesting:

// John arrived.
// John and Amy arrived.
key = { PLURAL($userNames) ->
    [one] { GENDER($users) ->
        [masculine] { LIST($userNames) } przyszedł
        [feminine] { LIST($userNames) } przyszła
       *[neuter] { LIST($userNames) } przyszło
    }
   *[other] { GENDER($users) ->
        [masculine-personal] { LIST($userNames) } przyszli
       *[non-masculine-personal] { LIST($userNames) } przyszły
    }
}
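A minimal Rust sketch of how a nested message like the one above could be resolved. The `Node` type, the `resolve` and `polish_arrived` functions, and the "plural"/"gender" argument names are hypothetical; they do not come from Fluent or from this repository's AST. `LIST($userNames)` is left out, so each leaf holds only the verb. Because every level of nesting carries its own default, the uneven branches need no special handling.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified model of a nested select expression.
enum Node {
    Text(&'static str),
    Select {
        selector: &'static str,
        variants: Vec<(&'static str, Node)>,
        default: usize, // index of the `*`-marked variant at this level
    },
}

/// Walk down the tree: at each level pick the variant whose key equals the
/// selector's value, or that level's own default if nothing matches.
fn resolve(node: &Node, args: &HashMap<&str, &str>) -> &'static str {
    match node {
        Node::Text(text) => *text,
        Node::Select { selector, variants, default } => {
            let value = args.get(selector).copied();
            let (_, chosen) = variants
                .iter()
                .find(|(key, _)| Some(*key) == value)
                .unwrap_or(&variants[*default]);
            resolve(chosen, args)
        }
    }
}

/// The Polish "arrived" message from above, without the LIST placeable.
fn polish_arrived() -> Node {
    Node::Select {
        selector: "plural",
        variants: vec![
            ("one", Node::Select {
                selector: "gender",
                variants: vec![
                    ("masculine", Node::Text("przyszedł")),
                    ("feminine", Node::Text("przyszła")),
                    ("neuter", Node::Text("przyszło")),
                ],
                default: 2,
            }),
            ("other", Node::Select {
                selector: "gender",
                variants: vec![
                    ("masculine-personal", Node::Text("przyszli")),
                    ("non-masculine-personal", Node::Text("przyszły")),
                ],
                default: 1,
            }),
        ],
        default: 1,
    }
}
```

For example, with plural set to "one" and an unrecognized gender, the inner default of the `[one]` branch picks przyszło.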

As you can see, nesting makes it fairly easy to encode the idea of "default" variants and uneven branches.

With Option 2, it becomes trickier:

key = { PLURAL($userNames), GENDER($users) ->
    [one, masculine] { LIST($userNames) } przyszedł
    [one, feminine] { LIST($userNames) } przyszła
    [one, neuter] { LIST($userNames) } przyszło
    [other, masculine-personal] { LIST($userNames) } przyszli
    [other, non-masculine-personal] { LIST($userNames) } przyszły
}

We can encode it with a single "default":

key = { PLURAL($userNames), GENDER($users) ->
    [one, masculine] { LIST($userNames) } przyszedł
    [one, feminine] { LIST($userNames) } przyszła
    [one, neuter] { LIST($userNames) } przyszło
    [other, masculine-personal] { LIST($userNames) } przyszli
   *[other, non-masculine-personal] { LIST($userNames) } przyszły
}

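This is limiting, because we may resolve the plural perfectly and only struggle with the gender: if the subject is singular but its gender matches none of the keys, the whole message falls back to the plural form przyszły.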
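To make the limitation concrete, here is a hypothetical Rust sketch of Option 2 lookup with one message-wide default: a row is chosen only if every key matches, and anything else falls through to the last row. `FlatVariant`, `ARRIVED`, and `resolve_flat` are illustrative names, not an existing API.

```rust
/// One row of a hypothetical flat (Option 2) variant table:
/// [plural key, gender key] and the text it selects.
struct FlatVariant {
    keys: [&'static str; 2],
    text: &'static str,
}

/// The Option 2 table with a single, message-wide default (the last row).
const ARRIVED: [FlatVariant; 5] = [
    FlatVariant { keys: ["one", "masculine"], text: "przyszedł" },
    FlatVariant { keys: ["one", "feminine"], text: "przyszła" },
    FlatVariant { keys: ["one", "neuter"], text: "przyszło" },
    FlatVariant { keys: ["other", "masculine-personal"], text: "przyszli" },
    FlatVariant { keys: ["other", "non-masculine-personal"], text: "przyszły" },
];
const ARRIVED_DEFAULT: usize = 4;

/// Exact match on every key, otherwise the single default row.
fn resolve_flat(variants: &[FlatVariant], default: usize, values: [&str; 2]) -> &'static str {
    variants
        .iter()
        .find(|v| v.keys == values)
        .map_or(variants[default].text, |v| v.text)
}
```

Calling `resolve_flat(&ARRIVED, ARRIVED_DEFAULT, ["one", "unknown"])` returns przyszły, the plural form, even though the plural category itself was resolved correctly.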

Alternatively, we may have a default per selector:

key = { PLURAL($userNames), GENDER($users) ->
    [one, masculine] { LIST($userNames) } przyszedł
    [one, feminine] { LIST($userNames) } przyszła
    [one, *neuter] { LIST($userNames) } przyszło
    [*other, masculine-personal] { LIST($userNames) } przyszli
    [*other, *non-masculine-personal] { LIST($userNames) } przyszły
}

but that looks clunky.
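One possible reading of the per-selector `*` markers, sketched in Rust under the assumption that selectors are applied left to right: for each selector, keep the rows whose key matches exactly, or, if none do, the rows whose key for that selector is starred. `StarKey`, `StarVariant`, `key`, `star`, and `resolve_per_selector` are hypothetical names.

```rust
/// A key in a hypothetical flat variant table.
struct StarKey {
    name: &'static str,
    star: bool, // true if the key was written with a leading `*`
}

const fn key(name: &'static str) -> StarKey {
    StarKey { name, star: false }
}

const fn star(name: &'static str) -> StarKey {
    StarKey { name, star: true }
}

struct StarVariant {
    keys: [StarKey; 2], // [plural key, gender key]
    text: &'static str,
}

/// The per-selector-default table from the message above.
const ARRIVED_STARRED: [StarVariant; 5] = [
    StarVariant { keys: [key("one"), key("masculine")], text: "przyszedł" },
    StarVariant { keys: [key("one"), key("feminine")], text: "przyszła" },
    StarVariant { keys: [key("one"), star("neuter")], text: "przyszło" },
    StarVariant { keys: [star("other"), key("masculine-personal")], text: "przyszli" },
    StarVariant { keys: [star("other"), star("non-masculine-personal")], text: "przyszły" },
];

/// Narrow the candidates one selector at a time: keep exact matches for that
/// selector, or, if there are none, keep the rows whose key is starred.
fn resolve_per_selector(variants: &[StarVariant], values: [&str; 2]) -> Option<&'static str> {
    let mut candidates: Vec<&StarVariant> = variants.iter().collect();
    for (i, value) in values.iter().enumerate() {
        let exact: Vec<&StarVariant> = candidates
            .iter()
            .copied()
            .filter(|v| v.keys[i].name == *value)
            .collect();
        candidates = if exact.is_empty() {
            candidates.into_iter().filter(|v| v.keys[i].star).collect()
        } else {
            exact
        };
    }
    candidates.first().map(|v| v.text)
}
```

With this rule, ["one", "unknown"] resolves to przyszło and ["other", "feminine"] to przyszły. In general, the outcome of this kind of narrowing can depend on the order in which selectors are applied, which the syntax would also have to pin down.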

There may be some other way to encode the defaults, such as denoting them separately, but such approaches seem increasingly clunky to express in a human-readable and consistent way.

I'm opening this issue with three thoughts:

mihnita commented 4 years ago

In MessageFormat, "other" means default. The thing is, this is not limited to MessageFormat; it goes all the way down to the plural rules in CLDR: https://github.com/unicode-org/cldr/blob/master/common/supplemental/plurals.xml

You can see that the "other" entries only have examples, no rules. That's because "if none of the rules apply, then return other".
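For reference, here is the Polish cardinal rule set from plurals.xml transcribed by hand into Rust. Only one, few, and many have conditions; "other" is simply the final `else`. The sketch takes the CLDR operands `i` (integer digits) and `v` (number of visible fraction digits) as separate arguments instead of parsing a number, which is a simplification; `PluralCategory` and `polish_plural` are illustrative names.

```rust
/// CLDR plural categories used by Polish.
#[derive(Debug, PartialEq, Clone, Copy)]
enum PluralCategory {
    One,
    Few,
    Many,
    Other,
}

/// Polish cardinal plural rules, hand-transcribed from CLDR plurals.xml.
/// `i` = integer digits, `v` = count of visible fraction digits.
fn polish_plural(i: u64, v: u32) -> PluralCategory {
    let (m10, m100) = (i % 10, i % 100);
    if v == 0 && i == 1 {
        PluralCategory::One
    } else if v == 0 && (2..=4).contains(&m10) && !(12..=14).contains(&m100) {
        PluralCategory::Few
    } else if v == 0
        && ((i != 1 && (0..=1).contains(&m10))
            || (5..=9).contains(&m10)
            || (12..=14).contains(&m100))
    {
        PluralCategory::Many
    } else {
        // "other" has no condition of its own: it is whatever is left over.
        PluralCategory::Other
    }
}
```

Every integer lands in one, few, or many, so for Polish "other" is reached only by decimals such as 1.5 (i = 1, v = 1).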


So if we think of this as a switch:

switch (getPlural(locale, count)) {
   case one: ...
   case few: ...
   ...
   default: ... // this is the same as case "other"
}
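The same shape in Rust: a `match` over the category name, where the wildcard arm is the default and therefore also covers "other". The `pictures` function and its English strings are made up for illustration.

```rust
/// Pick the English form of "N pictures" for a CLDR plural category name.
/// The `_` arm plays the role of `default:`, which is what "other" means.
fn pictures(category: &str, n: u64) -> String {
    match category {
        "one" => format!("{n} picture"),
        // "few", "many", "other", or any unexpected name all land here.
        _ => format!("{n} pictures"),
    }
}
```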

I think it is a good thing to have one and only one value as the fallback, and that value should be as generic as possible (covering all options).

  1. Using * would mean that translators have to be the ones moving it around, depending on what their language prefers (some languages default to neuter, some to masculine, etc.). That can confuse localization tools and leveraging, and it adds extra complexity for translators.

  2. It does not match the "mental model" a programmer has of "the world":

    switch (PLURAL($userNames)) {
    [one] ...
    [few] ...
    *[many] ... // WAT? https://www.destroyallsoftware.com/talks/wat :-)
    [other] ...
    }

    Which one is the default now? "many", because the * says so, or "other", because CLDR (which is Unicode) says so? Remember: "other" in CLDR plural rules means the same thing as "default" in a programming language's switch statement.


TLDR: I'm trying to make a case for:

key = { PLURAL($userNames), GENDER($users) ->
    [one, masculine] ...
    [one, feminine] ...
    [one, other] ...
    [other, masculine] ...
    [other, other] ... // This is the default, and the only default
}
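A hypothetical Rust sketch of resolution under this proposal: selectors are applied left to right, and whenever the resolved value has no matching key among the remaining rows, the literal key "other" is used instead, so no `*` marker is needed. The keys follow the proposal; the texts are borrowed from the Polish example earlier in the thread, which is an assumption, since the proposal leaves them elided. `OtherRow`, `ARRIVED_OTHER`, and `resolve_with_other` are illustrative names.

```rust
/// A row of a hypothetical flat variant table in which the literal key
/// "other" is the only fallback marker.
struct OtherRow {
    keys: [&'static str; 2], // [plural key, gender key]
    text: &'static str,
}

/// Keys from the proposal; texts borrowed from the earlier Polish example.
const ARRIVED_OTHER: [OtherRow; 5] = [
    OtherRow { keys: ["one", "masculine"], text: "przyszedł" },
    OtherRow { keys: ["one", "feminine"], text: "przyszła" },
    OtherRow { keys: ["one", "other"], text: "przyszło" },
    OtherRow { keys: ["other", "masculine"], text: "przyszli" },
    OtherRow { keys: ["other", "other"], text: "przyszły" },
];

/// For each selector in turn, keep the rows whose key equals the resolved
/// value; if no remaining row has that key, keep the rows keyed "other".
fn resolve_with_other(rows: &[OtherRow], values: [&str; 2]) -> Option<&'static str> {
    let mut candidates: Vec<&OtherRow> = rows.iter().collect();
    for (i, &value) in values.iter().enumerate() {
        let wanted = if candidates.iter().any(|r| r.keys[i] == value) {
            value
        } else {
            "other"
        };
        candidates.retain(|r| r.keys[i] == wanted);
    }
    candidates.first().map(|r| r.text)
}
```

Here ["one", "neuter"] resolves to przyszło through [one, other], and an unexpected plural category such as "few" falls back to the rows keyed other.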