zbraniecki / message-format-2.0-rs

MessageFormat 2.0 Prototype in Rust
https://github.com/unicode-org/message-format-wg/issues/93

Selector vs Placeholder #6

Open zbraniecki opened 4 years ago

zbraniecki commented 4 years ago

In https://github.com/zbraniecki/message-format-2.0-rs/issues/2#issuecomment-647538040 @stasm laid out a choice we have about how to handle selection.

In MF1.0 and Fluent, we treat a selector as an Expression, which can be encoded in a Pattern as a Placeable.

That allows us to express the following logic (Fluent syntax):

You have { PLURAL($unreadEmails) ->
    [one] one unread email.
   *[other] { $unreadEmails } unread emails.
}

This model is very flexible, but it is less readable than:

{ PLURAL($unreadEmails) ->
    [one] You have one unread email.
   *[other] You have { $unreadEmails } unread emails.
}

and experience from both MF1.0 and Fluent indicates that we want to incentivize and promote complete sentences rather than fragments. The cost is duplication, but we're willing to pay it for higher readability of the message.

This information allows us to explore a particular angle - we can decide to encode variant selection not as part of the Pattern but rather as a top-level Node which contains Pattern elements.

Choices

In practice, based on #2, here are two approaches to the AST we can take:

1) Select Expression in Placeable (current Fluent)

pub struct Message {
    pub value: Pattern,
    pub comment: Option<String>,
}

pub struct Pattern {
    pub elements: Vec<PatternElement>,
}

pub enum PatternElement {
    TextElement(String),
    Placeable(Expression),
}

pub struct Variant {
    pub key: VariantKey,
    pub value: Pattern,
    pub default: bool,
}

pub enum VariantKey {
    Identifier(Identifier),
    NumberLiteral(String),
}

pub enum InlineExpression {
    StringLiteral {
        value: String,
    },
    NumberLiteral {
        value: String,
    },
    FunctionReference {
        id: String,
        argument: Option<Identifier>,
    },
    VariableReference {
        id: Identifier,
    },
}

pub struct Identifier {
    pub name: String,
}

pub enum Expression {
    InlineExpression(InlineExpression),
    SelectExpression {
        selector: InlineExpression,
        variants: Vec<Variant>,
    },
}

2) Top-level Select (Disallows Select in Placeable)

pub struct Message {
    pub value: MessageValue,
    pub comment: Option<String>,
}

pub enum MessageValue {
    Single(Pattern),
    Multi(Select),
}

pub struct Select {
    pub selector: Option<InlineExpression>,
    pub variants: Vec<Variant>,
}

pub struct Pattern {
    pub elements: Vec<PatternElement>,
}

pub enum PatternElement {
    TextElement(String),
    Placeable(InlineExpression),
}

pub struct Variant {
    pub key: VariantKey,
    pub value: Pattern,
    pub default: bool,
}

pub enum VariantKey {
    Identifier(Identifier),
    NumberLiteral(String),
}

pub enum InlineExpression {
    StringLiteral {
        value: String,
    },
    NumberLiteral {
        value: String,
    },
    FunctionReference {
        id: String,
        argument: Option<Identifier>,
    },
    VariableReference {
        id: Identifier,
    },
}

pub struct Identifier {
    pub name: String,
}
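
To get a feel for how option 2 would be evaluated, here is a minimal sketch of a resolver for a top-level select. It is not part of the prototype. The types are trimmed-down stand-ins for the ones above (string keys, a required selector that names a variable), and `plural_category` is a hypothetical, English-only placeholder for real plural rules.

```rust
use std::collections::HashMap;

// Simplified stand-ins for the option 2 types (assumption: string keys only).
pub enum PatternElement {
    TextElement(String),
    Variable(String), // stands in for Placeable(InlineExpression::VariableReference)
}

pub struct Pattern {
    pub elements: Vec<PatternElement>,
}

pub struct Variant {
    pub key: String,
    pub value: Pattern,
    pub default: bool,
}

pub struct Select {
    pub selector: String, // name of the variable passed to PLURAL()
    pub variants: Vec<Variant>,
}

pub enum MessageValue {
    Single(Pattern),
    Multi(Select),
}

// Hypothetical, English-only placeholder for a real plural rules implementation.
fn plural_category(n: i64) -> &'static str {
    if n == 1 { "one" } else { "other" }
}

fn format_pattern(pattern: &Pattern, args: &HashMap<&str, i64>) -> String {
    let mut out = String::new();
    for element in &pattern.elements {
        match element {
            PatternElement::TextElement(text) => out.push_str(text),
            PatternElement::Variable(name) => match args.get(name.as_str()) {
                Some(value) => out.push_str(&value.to_string()),
                // Unknown variables are echoed back as `$name`.
                None => {
                    out.push('$');
                    out.push_str(name);
                }
            },
        }
    }
    out
}

// Selection happens exactly once, at the top level; patterns never branch.
pub fn format_message(value: &MessageValue, args: &HashMap<&str, i64>) -> Option<String> {
    match value {
        MessageValue::Single(pattern) => Some(format_pattern(pattern, args)),
        MessageValue::Multi(select) => {
            let n = *args.get(select.selector.as_str())?;
            let category = plural_category(n);
            // Prefer an exact key match, then fall back to the default variant.
            let variant = select
                .variants
                .iter()
                .find(|v| v.key == category)
                .or_else(|| select.variants.iter().find(|v| v.default))?;
            Some(format_pattern(&variant.value, args))
        }
    }
}

// The "unread emails" message from the top of this issue.
pub fn unread_emails() -> MessageValue {
    let text = |s: &str| PatternElement::TextElement(s.to_string());
    MessageValue::Multi(Select {
        selector: "unreadEmails".to_string(),
        variants: vec![
            Variant {
                key: "one".to_string(),
                value: Pattern { elements: vec![text("You have one unread email.")] },
                default: false,
            },
            Variant {
                key: "other".to_string(),
                value: Pattern {
                    elements: vec![
                        text("You have "),
                        PatternElement::Variable("unreadEmails".to_string()),
                        text(" unread emails."),
                    ],
                },
                default: true,
            },
        ],
    })
}
```

Because a Pattern cannot contain a select in this model, `format_pattern` needs no recursion, which is most of the simplicity argument for option 2.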

3) A mix of the two above

There's also a third option, which combines the two:

pub struct Message {
    pub value: MessageValue,
    pub comment: Option<String>,
}

pub enum MessageValue {
    Single(Pattern),
    Multi(SelectExpression),
}

pub struct Pattern {
    pub elements: Vec<PatternElement>,
}

pub enum PatternElement {
    TextElement(String),
    Placeable(InlineExpression), // Or `Expression`
}

pub struct Variant {
    pub key: VariantKey,
    pub value: Pattern,
    pub default: bool,
}

pub enum VariantKey {
    Identifier(Identifier),
    NumberLiteral(String),
}

pub enum InlineExpression {
    StringLiteral {
        value: String,
    },
    NumberLiteral {
        value: String,
    },
    FunctionReference {
        id: String,
        argument: Option<Identifier>,
    },
    VariableReference {
        id: Identifier,
    },
}

pub struct Identifier {
    pub name: String,
}

pub enum Expression {
    Inline(InlineExpression),
    Select(SelectExpression),
}

pub struct SelectExpression {
    pub selector: InlineExpression,
    pub variants: Vec<Variant>,
}

The third model encodes the SelectExpression as a MessageValue, while leaving room to widen Placeable from InlineExpression to the full Expression. That would let us add support for selects in placeables relatively easily later on.
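
If Placeable is widened to `Expression`, the same message can be encoded in two ways, and tooling could normalize one into the other. Below is a rough sketch of that idea, under my own assumptions: `hoist` is a hypothetical helper (not something the prototype has), `Expression::Inline` holds a plain string instead of an InlineExpression, and only a pattern with exactly one select is rewritten.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Expression {
    Inline(String), // stands in for InlineExpression
    Select(SelectExpression),
}

#[derive(Debug, Clone, PartialEq)]
pub struct SelectExpression {
    pub selector: String,
    pub variants: Vec<Variant>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Variant {
    pub key: String,
    pub value: Pattern,
    pub default: bool,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Pattern {
    pub elements: Vec<PatternElement>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum PatternElement {
    TextElement(String),
    Placeable(Expression), // the widened form from option 3
}

#[derive(Debug, Clone, PartialEq)]
pub enum MessageValue {
    Single(Pattern),
    Multi(SelectExpression),
}

fn is_select(element: &PatternElement) -> bool {
    matches!(element, PatternElement::Placeable(Expression::Select(_)))
}

/// Hypothetical helper: lifts the only select placeable of a pattern to the
/// top level by copying the surrounding elements into every variant.
/// Patterns with zero or several selects are returned unchanged.
/// Adjacent text elements are not merged, to keep the sketch short.
pub fn hoist(pattern: Pattern) -> MessageValue {
    let select_count = pattern.elements.iter().filter(|e| is_select(e)).count();
    if select_count != 1 {
        return MessageValue::Single(pattern);
    }
    let index = pattern.elements.iter().position(is_select).unwrap();

    // Split into prefix, the select itself, and suffix.
    let mut prefix = pattern.elements;
    let suffix = prefix.split_off(index + 1);
    let select = match prefix.pop() {
        Some(PatternElement::Placeable(Expression::Select(select))) => select,
        _ => unreachable!("index points at a select placeable"),
    };

    let variants = select
        .variants
        .into_iter()
        .map(|variant| {
            let mut elements = prefix.clone();
            elements.extend(variant.value.elements);
            elements.extend(suffix.clone());
            Variant { key: variant.key, value: Pattern { elements }, default: variant.default }
        })
        .collect();

    MessageValue::Multi(SelectExpression { selector: select.selector, variants })
}
```

The duplication of prefix and suffix in every variant is exactly the cost mentioned at the top of this issue, paid automatically instead of by the localizer.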

Questions

I have two questions about this model:

1) Do we want to lock ourselves out of the ability to have select expressions in patterns?

There's a subtle, but important, difference between best practice and enforcing something via AST.

We have many ways to encourage, and even enforce best practice on the higher level, without designing an AST that limits us to some choice.

In particular, the current Fluent AST can be thought of as a superset of what the best practice recommends. It allows us to encode selection at the top level of a message, while leaving a window open to encode a select expression inside a pattern as well.

The third option I listed here would allow us to focus on the best practice model, and leave us the ability to handle the more flexible model if we find a use for it.

We could even encode it in the AST already, and reject it at runtime or in tooling, with options that let each project decide what it wants to allow and disallow.
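
As a sketch of what "allow in the AST, reject in tooling" could look like: the check below walks a message and reports select expressions found inside patterns, unless the project opted in. `LintOptions`, `allow_select_in_pattern` and `check` are names I made up for illustration, and the types are simplified versions of the option 3 AST.

```rust
pub enum Expression {
    Inline(String), // stands in for InlineExpression
    Select(SelectExpression),
}

pub struct SelectExpression {
    pub selector: String,
    pub variants: Vec<Variant>,
}

pub struct Variant {
    pub key: String,
    pub value: Pattern,
    pub default: bool,
}

pub struct Pattern {
    pub elements: Vec<PatternElement>,
}

pub enum PatternElement {
    TextElement(String),
    Placeable(Expression),
}

pub enum MessageValue {
    Single(Pattern),
    Multi(SelectExpression),
}

/// Hypothetical per-project policy.
#[derive(Clone, Copy)]
pub struct LintOptions {
    pub allow_select_in_pattern: bool,
}

/// Counts select expressions nested anywhere inside a pattern.
fn count_inline_selects(pattern: &Pattern) -> usize {
    pattern
        .elements
        .iter()
        .map(|element| match element {
            PatternElement::Placeable(Expression::Select(select)) => {
                1 + select
                    .variants
                    .iter()
                    .map(|v| count_inline_selects(&v.value))
                    .sum::<usize>()
            }
            _ => 0,
        })
        .sum()
}

/// A top-level select is always fine; selects inside patterns are only
/// accepted when the options allow them.
pub fn check(value: &MessageValue, options: LintOptions) -> Result<(), String> {
    if options.allow_select_in_pattern {
        return Ok(());
    }
    let found = match value {
        MessageValue::Single(pattern) => count_inline_selects(pattern),
        MessageValue::Multi(select) => select
            .variants
            .iter()
            .map(|v| count_inline_selects(&v.value))
            .sum::<usize>(),
    };
    if found == 0 {
        Ok(())
    } else {
        Err(format!("{} select expression(s) inside a pattern", found))
    }
}
```

With this split, the AST stays a superset and the best practice lives in one small function that a project can switch off.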

2) How to handle multi-variants

One of the main reasons model (1) is tempting is that it handles multiple variants well. We know multi-variants are rare, but it's hard to tell whether they're rare because UIs just don't need them, or because localization systems and tooling haven't yet risen to the challenge of making them practical to use. In particular, it's hard to reason about what UIs will look like 5-10 years from now, and making decisions today about the needs of tomorrow is always risky.

So, while handling a single-variant message pattern is relatively simple in both models (1) and (2) (and, of course, (3)), things become much trickier when we encounter a message with multiple selectors, like [PLURAL, PLURAL, GENDER]. They can be nested or sequenced, and flattening is not trivial, especially if variants don't overlap:

Example: 10 friends from 2 countries liked her profile.

{ PLURAL($friendsNum) ->
    [one] { $friendsNum } friend
   *[other] { $friendsNum } friends
} from { PLURAL($countriesNum) ->
    [one] { $countriesNum } country
   *[other] { $countriesNum } countries
} liked { GENDER($user) ->
    [masculine] his
    [feminine] her
   *[other] their
} profile.
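
For the sequenced case, flattening into a single top-level select amounts to a cartesian product of the variants. Here is a minimal sketch under simplifying assumptions of mine: variant values are plain strings (placeables are kept as literal text), and a flattened row counts as the default only if every one of its keys was a default. `Part`, `FlatVariant` and `flatten` are illustrative names.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Variant {
    pub key: String,
    pub text: String,
    pub default: bool,
}

/// A sequenced message: literal text interleaved with selects.
pub enum Part {
    Text(String),
    Select(Vec<Variant>),
}

#[derive(Debug, Clone, PartialEq)]
pub struct FlatVariant {
    pub keys: Vec<String>, // one key per selector, in source order
    pub text: String,
    pub default: bool,
}

/// Builds the cartesian product of all selects, left to right.
pub fn flatten(parts: &[Part]) -> Vec<FlatVariant> {
    let mut rows = vec![FlatVariant { keys: Vec::new(), text: String::new(), default: true }];
    for part in parts {
        match part {
            Part::Text(text) => {
                for row in &mut rows {
                    row.text.push_str(text);
                }
            }
            Part::Select(variants) => {
                let mut next = Vec::with_capacity(rows.len() * variants.len());
                for row in &rows {
                    for variant in variants {
                        let mut keys = row.keys.clone();
                        keys.push(variant.key.clone());
                        next.push(FlatVariant {
                            keys,
                            text: format!("{}{}", row.text, variant.text),
                            default: row.default && variant.default,
                        });
                    }
                }
                rows = next;
            }
        }
    }
    rows
}

/// The "friends from countries" message above.
pub fn friends_example() -> Vec<Part> {
    let v = |key: &str, text: &str, default: bool| Variant {
        key: key.to_string(),
        text: text.to_string(),
        default,
    };
    vec![
        Part::Select(vec![
            v("one", "{ $friendsNum } friend", false),
            v("other", "{ $friendsNum } friends", true),
        ]),
        Part::Text(" from ".to_string()),
        Part::Select(vec![
            v("one", "{ $countriesNum } country", false),
            v("other", "{ $countriesNum } countries", true),
        ]),
        Part::Text(" liked ".to_string()),
        Part::Select(vec![
            v("masculine", "his", false),
            v("feminine", "her", false),
            v("other", "their", true),
        ]),
        Part::Text(" profile.".to_string()),
    ]
}
```

For this message the product has 2 × 2 × 3 = 12 rows, which gives a sense of how quickly the duplication grows with each extra selector.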

Example: 5 barbarians ran away.

In a game with 100 creature types, we will need to adapt "ran away" to the gender of the creature, and Polish has different genders for singular and plural forms.

{ PLURAL($count) ->
    [one] { $count } { $creature } { GENDER($creature) ->
        [masculine] uciekł.
        [feminine] uciekła.
       *[other] uciekło.
    }
    [few] { $count } { $creature } { GENDER($creature) ->
        [masculine] uciekli.
       *[other] uciekły.
    }
    [many] { $count } { $creature } { GENDER($creature) ->
        [masculine] uciekli.
       *[other] uciekły.
    }
   *[other] { $count } { $creature } { GENDER($creature) ->
        [masculine] uciekło.
       *[other] uciekły.
    }
}

We were conceptualizing flattening that in https://github.com/projectfluent/fluent/issues/4 but it's a non-trivial design task, especially in scenarios where the number of sub-variants differs between top-level variants, and where "default" may be marked once for all variants or once per level.
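
To make the nested case concrete, here is one possible way to flatten it while keeping a default marker per level. This is only a sketch of one design, not what the Fluent issue settled on: each flattened row carries `(key, is_default)` pairs, and resolution narrows the rows level by level, preferring exact key matches over defaults. The tree is reduced to the verb forms of two branches of the Polish example, and all names are illustrative.

```rust
pub enum Node {
    Leaf(String),
    Select(Vec<Variant>),
}

pub struct Variant {
    pub key: String,
    pub default: bool,
    pub value: Node,
}

#[derive(Debug, PartialEq)]
pub struct Row {
    pub keys: Vec<(String, bool)>, // (key, marked as default at this level)
    pub text: String,
}

/// Flattens nested selects into rows; branches may have unequal row counts.
pub fn flatten(node: &Node) -> Vec<Row> {
    match node {
        Node::Leaf(text) => vec![Row { keys: Vec::new(), text: text.clone() }],
        Node::Select(variants) => {
            let mut rows = Vec::new();
            for variant in variants {
                for mut row in flatten(&variant.value) {
                    row.keys.insert(0, (variant.key.clone(), variant.default));
                    rows.push(row);
                }
            }
            rows
        }
    }
}

/// Narrows the rows one level at a time: exact key matches win, otherwise
/// the rows marked as default at that level are kept.
pub fn resolve<'a>(rows: &'a [Row], selected: &[&str]) -> Option<&'a str> {
    let mut candidates: Vec<&'a Row> = rows.iter().collect();
    for (level, want) in selected.iter().enumerate() {
        let exact: Vec<&'a Row> = candidates
            .iter()
            .copied()
            .filter(|row| row.keys.get(level).map_or(false, |(key, _)| key.as_str() == *want))
            .collect();
        candidates = if exact.is_empty() {
            candidates
                .into_iter()
                .filter(|row| row.keys.get(level).map_or(false, |(_, default)| *default))
                .collect()
        } else {
            exact
        };
    }
    candidates.first().copied().map(|row| row.text.as_str())
}

fn variant(key: &str, default: bool, value: Node) -> Variant {
    Variant { key: key.to_string(), default, value }
}

fn leaf(text: &str) -> Node {
    Node::Leaf(text.to_string())
}

/// The [one] and *[other] branches of the example above, verbs only.
pub fn ran_away() -> Node {
    Node::Select(vec![
        variant("one", false, Node::Select(vec![
            variant("masculine", false, leaf("uciekł.")),
            variant("feminine", false, leaf("uciekła.")),
            variant("other", true, leaf("uciekło.")),
        ])),
        variant("other", true, Node::Select(vec![
            variant("masculine", false, leaf("uciekło.")),
            variant("other", true, leaf("uciekły.")),
        ])),
    ])
}
```

Note that the flat table has 3 + 2 = 5 rows rather than a full 2 × 3 grid, so a flat syntax would need to either tolerate missing combinations or force authors to spell them out.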

Thoughts?

stasm commented 4 years ago

The cost is duplication, but we're willing to pay it for higher readability of the message.

That's a great way to put it, thanks.

1) Do we want to lock ourselves out of the ability to have select expressions in patterns?

I'll reference the goal of rapid prototyping again. We're not locking ourselves out of anything by rapid prototyping. If we do, then we backtrack and choose the other option :) There's no obligation to stick to a decision in this phase.

Since we already have some experience with option 1 from MessageFormat and Fluent, I'd be interested in exploring option 2, and trying to answer the questions you asked in your comment. It might also be a good idea to do a series of different experiments, each dedicated to one of the options, and evaluate them against a few examples.

zbraniecki commented 4 years ago

Since we already have some experience with option 1 from MessageFormat and Fluent, I'd be interested in exploring option 2, and trying to answer the questions you asked in your comment.

Does it mean you are not interested in Option 3, or just that it's a failsafe from Option 2?

I'm personally interested in Option 3 tbh :)

mihnita commented 4 years ago

I am more on the option 2 side, "Top-level Select (Disallows Select in Placeable)"

Consistent with "Ban selection inside the message" in my (ancient) "MessageFormat-like functionality - Random thoughts" :-)

stasm commented 4 years ago

I think we should start by defining a single way of encoding the data into the data model. Thus, my preference would be to stick to Option 2 right now. Option 3 gives us two ways of expressing the same data and I don't see much value in having that discussion right now. It would be more of a distraction.

stasm commented 4 years ago

Rereading my comment, I think I was being too terse. Sorry about that. I'd like to point out that the question of whether we should prefer the "only one way to do something" vs. "multiple valid ways to do the same thing" is a great topic for a design principle discussion! And for the rapid prototyping phase, I'd recommend choosing the simpler model; I suspect we'll have enough things to discuss about the early prototype anyways.