unicode-org / message-format-wg

Developing a standard for localizable message strings

Dynamic References #130

Closed zbraniecki closed 3 weeks ago

zbraniecki commented 4 years ago

Is your feature request related to a problem? Please describe.

There are scenarios where a translation message needs to reference another message which is not statically known.

To illustrate, let's start with a pair of messages and a message that references them:

board-name = Board
dashboard-name = Dashboard

remove-board = Remove { board-name }
remove-dashboard = Remove { dashboard-name }
let msg = api.get(semaphor ? "remove-board" : "remove-dashboard");

This is already possible and well within the scope of regular message references, but it doesn't scale well to scenarios where the referencing is more nested and/or the number of items grows.

For example a computer game may have 10, 100 or 200 monsters and a lot of messages want to reference any of them.

Having to write 3 messages to nest 3 levels deep, or having to write a remove-X message for each X is not sustainable and blows up payload size, maintenance complexity etc.

Describe the solution you'd like

Dynamic references is a concept that allows for the message ID which is to be referenced to be decided at runtime:

board-name = Board
dashboard-name = Dashboard

remove-item = Remove { $item }
let msg = api.get("remove-item", {
  item: MSG_REF(semaphor ? "board-name" : "dashboard-name")
});

or:

monster-dinosaur = Dinosaur
monster-elephant = Elephant
monster-ogre = Ogre

killed-notice = You've been killed by a { $monster }
let msg = api.get("killed-notice", {
  monster: MSG_REF(validatedMonsterName)
});
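To make the idea concrete, here is a minimal sketch of how a formatter could resolve such a reference at format time. Everything in it is hypothetical: `MSG_REF`, `format`, and the `{ $name }` placeholder handling are toy stand-ins for illustration, not an existing API.

```javascript
// Hypothetical sketch: none of these names come from a real library.
const messages = new Map([
  ["monster-dinosaur", "Dinosaur"],
  ["monster-elephant", "Elephant"],
  ["monster-ogre", "Ogre"],
  ["killed-notice", "You've been killed by a { $monster }"],
]);

// Wraps a message ID so the formatter can tell it apart from a plain string.
function MSG_REF(id) {
  return { type: "msgref", id };
}

// Formats a message, resolving any message-reference arguments at runtime.
function format(id, args = {}) {
  const pattern = messages.get(id);
  if (pattern === undefined) {
    return `{${id}}`; // fallback for a missing message
  }
  return pattern.replace(/\{\s*\$([\w-]+)\s*\}/g, (match, name) => {
    const value = args[name];
    if (value === undefined) return match; // leave unknown placeholders as-is
    if (value !== null && typeof value === "object" && value.type === "msgref") {
      return format(value.id); // the referenced ID is only known at runtime
    }
    return String(value);
  });
}

const msg = format("killed-notice", { monster: MSG_REF("monster-ogre") });
console.log(msg);
```

The point of the wrapper object is that the caller never resolves the referenced message itself; it only names it, and the formatter decides when and in which locale to look it up.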

Describe why your solution should shape the standard

I believe that the use cases where the dynamic references are needed are very badly served by workarounds, and if we don't provide a good API for it, users will develop data and code for handling such cases that will be inherently hard to maintain and costly to clean up.

Additional context or examples

Our decision on this feature has implications for other facets we're considering.

The common workaround an engineer may do today is:

let item = api.get(semaphor ? "board-name" : "dashboard-name");

let msg = api.get("remove-item", {
  item
});

which has multiple issues with it.

Future GUI bindings impact

Lastly, this paradigm is particularly painful for GUI bindings (#118). I understand that we consider #118 to be out of scope in some ways, but I think this issue is a good testbed for how far ahead we want to think and design for.

Assuming l10n bindings for GUI will want to use declarative bindings resolved asynchronously for animation frame, cases where dynamic references are needed are particularly painful, because the user cannot declare ID and MSG_REF as an argument on the UI widget and let the l10n system resolve it.

The user has to fetch the referenced message, resolve it, and then declaratively define the ID and resolved message as a String argument in the binding.

If we want to allow for localization to be asynchronous, without dynamic references we'd do:

let monster = await api.get(semaphor ? "monster-elephant" : "monster-ogre"); // String

element.setL10n({id: "killed-notice", args: { monster }});

which in case of HTML for example may lead to:

<p l10n-id="killed-notice" l10n-args="{monster: 'Elephant'}"/>

Now not only did it complicate the async code, but it also hardcoded Elephant in the binding as if it was a hardcoded String.

If the system needs to retranslate this UI widget because the user changed locale, it will be able to pull up the new killed-notice but will pass the pre-resolved Elephant in the old language. To de-hardcode it, we'd need to write a helper callback to be called on each localization cycle, like this:

let monster = await api.get(semaphor ? "monster-elephant" : "monster-ogre"); // String

element.setL10n({id: "killed-notice", args: { monster }});
element.onBeforeL10n(async () => {
  let monster = await api.get(semaphor ? "monster-elephant" : "monster-ogre");
  element.setL10nArgs({monster});
});

This will allow the system to update the element's argument before updating the main message leading to proper translation on change, but is pretty complicated to maintain and I'd say a bad developer experience.

With Dynamic References, the user can just do:

let monster = MSG_REF(semaphor ? "monster-elephant" : "monster-ogre");

element.setL10n({id: "killed-notice", args: { monster }});

which in case of HTML for example may lead to:

<p l10n-id="killed-notice" l10n-args="{monster: {type: 'msgref', id: 'monster-elephant'}}"/>

and now the DOM contains all the information to retranslate as needed without any additional information, the declarations are synchronous, the localization can be asynchronous, the state is preserved and the separation of concerns between translation/retranslation and updates is easy.
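The difference between the two bindings can be sketched with a toy formatter. This is only an illustration under assumptions: the `bundles` object, the `translate` helper, and the Polish strings are made up for the example, and the binding is modeled as plain data rather than real DOM attributes.

```javascript
// Hypothetical per-locale bundles; the strings are illustrative only.
const bundles = {
  en: {
    "monster-elephant": "Elephant",
    "killed-notice": "You've been killed by a { $monster }",
  },
  pl: {
    "monster-elephant": "Słoń",
    "killed-notice": "Zabił cię { $monster }",
  },
};

// Binding state as plain data, as it could be serialized into l10n-args.
const withReference = {
  id: "killed-notice",
  args: { monster: { type: "msgref", id: "monster-elephant" } },
};
const preResolved = {
  id: "killed-notice",
  args: { monster: "Elephant" },
};

// Translates a binding in the given locale, resolving references in that locale.
function translate(locale, { id, args = {} }) {
  const bundle = bundles[locale];
  return bundle[id].replace(/\{\s*\$([\w-]+)\s*\}/g, (match, name) => {
    const value = args[name];
    if (value && value.type === "msgref") {
      return bundle[value.id] ?? `{${value.id}}`;
    }
    return value === undefined ? match : String(value);
  });
}

// The reference is re-resolved in the new locale ...
console.log(translate("pl", withReference));
// ... while the pre-resolved string stays in the old language.
console.log(translate("pl", preResolved));
```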

There is more background info in the Fluent issue.

grhoten commented 4 years ago

For us, we have multiple solutions. One of them is yours where you're almost using a hash table. It's highly problematic because you can't pass any context.

We also have the ability to statically reference the message id. You can still have a selector to choose which one you want if there are multiple ones that you want to use. Through statically referencing it, we also have the ability to pass a predefined variable called "context". It has all of the language specific semantic features that you want.

Before we decided against it, the old way we did this was as follows.

...
this.case('genitive').otherMessage(true, variable)
...

Within the "otherMessage", you could do it the following way.

...
this.context.case == 'genitive' ? variable + '\'s': variable
...

We decided against this approach in our latest restructuring, since it became too much like programming for the translators. Though we're rethinking this approach because developers would basically work around such restrictions and eliminate all context, which is worse.

My hope is that a solution is made that is flexible enough for developers while not being too much like a programming language that it confuses translators.

eemeli commented 4 years ago

Here's an interesting thought: If we reach a consensus on top-level-only selectors, we will also need to agree to support selectors that take in more than one variable as input. Presuming then that we're going to allow for selectors that match the select style of MessageFormat 1, this implies that:

  1. We choose to allow for hierarchical message structures.
  2. We choose to support dynamic references.
  3. There is no real difference between selector cases and message keys.

Therefore, this issue is really about whether we admit to the above, and make sure that the data model supports rather than hinders these implications.

To see what I mean, consider one of the examples given by @zbraniecki (reformatted as YAML for syntax highlighting, and ignoring the a/an articles):

monster-dinosaur: Dinosaur
monster-elephant: Elephant
monster-ogre: Ogre
killed-notice: You've been killed by a { $monster }

That could be transformed into this single message, which would provide effectively the same API:

message: |
  { $key ->
    [monster-dinosaur] Dinosaur
    [monster-elephant] Elephant
    [monster-ogre] Ogre
    [killed-notice] You've been killed by a { $message(key: $monster) }
    [other] Error: Message not found
  }

The key here is the support for multiple inputs for the selector, because that allows for a process such as the above to be applied repeatedly, instead of just once.

Furthermore, it's pretty clear that monster- is being used as a crutch to namespace messages in a nominally flat structure. Allowing for a proper hierarchy would make it significantly easier for the tooling to be aware what the proper keys are for e.g. dynamic messages, and for missing values to be dealt with better.

To see what I mean by that, consider a reformatting of the same structure, adding a fallback case and [] syntax:

monster-name:
  dinosaur: Dinosaur
  elephant: Elephant
  ogre: Ogre
  other: Monster
killed-notice: You've been killed by a { $monster-name[$monster] }

That's pretty much equivalent to this:

monster-name: |
  { $monster ->
    [dinosaur] Dinosaur
    [elephant] Elephant
    [ogre] Ogre
    [other] Monster
  }
killed-notice: You've been killed by a { $monster-name(monster: $monster) }

Which may again be collapsed into a single message, but this time with a bit more clarity on what sorts of things one might be killed by:

message: |
  { $key, $monster ->
    [monster-name, dinosaur] Dinosaur
    [monster-name, elephant] Elephant
    [monster-name, ogre] Ogre
    [monster-name, other] Monster
    [killed-notice, other] You've been killed by a { $message(key: 'monster-name', monster: $monster) }
    [other, other] Error: Message not found
  }

The conclusion that I at least draw from the above is that the message data model should be a hierarchy of objects, where at each level there's a set of options to choose between, and an optional mapping function like plural() to determine how to transform the input value when selecting between those options.
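As a rough illustration of that conclusion, the hierarchy and its per-level fallback could look like the following. This is a sketch under assumptions: the `resource` shape and the `select` helper are hypothetical, the `other` key is used as the fallback as in the YAML above, and the optional mapping function (such as plural()) is left out.

```javascript
// Hypothetical data model: each level is a set of keyed options,
// with "other" acting as the fallback option at that level.
const resource = {
  "monster-name": {
    dinosaur: "Dinosaur",
    elephant: "Elephant",
    ogre: "Ogre",
    other: "Monster",
  },
  "killed-notice": "You've been killed by a { $monster-name[$monster] }",
};

// Walks the hierarchy along `path`, falling back to "other" at each level.
// Returns undefined if no string is found at the end of the path.
function select(node, path) {
  for (const key of path) {
    if (typeof node !== "object" || node === null) break;
    node = Object.prototype.hasOwnProperty.call(node, key)
      ? node[key]
      : node.other;
  }
  return typeof node === "string" ? node : undefined;
}

console.log(select(resource, ["monster-name", "ogre"]));
console.log(select(resource, ["monster-name", "unicorn"])); // falls back to "other"
```

With an explicit hierarchy, tooling can enumerate the valid keys under `monster-name` instead of guessing them from a `monster-` prefix.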


ps. Writing this reply was a bit of a wild ride. Version 1 had me arguing against dynamic references, version 2 was all about the equivalence between selector cases and message keys; this is version 3. I think I managed to change my own mind pretty fundamentally at least twice during this process.

mihnita commented 3 years ago

I've been digging to understand what you mean by "Dynamic References", since I say I did it, and you say that no, I did not.

Reading the sample code and the comments in this thread I think I found where the mismatch in understanding is. And the code here: https://github.com/unicode-org/message-format-wg/blob/experiments/experiments/data_model/ts_eemeli/data-model-examples.ts

And this is it:

There is no real difference between selector cases and message keys.

I don't think that is correct.

A selector can be completely resolved by the MessageFormat "rendering" part (the call of format) without calling any external component (a "resource manager", or whatever you want to call it)

And translation (in "classical" translation workflows) does not add strings (message keys), it changes existing strings (adding variants to the selector cases)

There is a (big) difference between "render the message foo, with the selector gender=feminine" and "render the message foo/.../feminine"

And it shows in the examples (in that they are wrong :-)

For example the plural in the monsters example is broken.

It mixes "indefinite" and "plural", which are independent concepts. And then uses these two already inconsistent things to build a "real plural" like 'You have killed {monster-count} Elephants in {dungeon-count} dungeons.'

It does not work because the thing you use for "1 Elephant" (or "one Elephant") is not the same as the one you use for "an Elephant". And the one you use for "elephants" (a generic plural where you don't know the number) is different from the one you use for "4 elephants" or "21 elephants" (which can themselves differ from each other). The example uses a singular indefinite and a generic plural to build an explicit plural.


I don't think these mistakes are accidental. I think they are encouraged, or at least not discouraged by the data model and by this idea that "There is no real difference between selector cases and message keys"

A real select works like a switch:

switch (color) {
   case red: ....
   case green: ....
   default: ....
}

The message keys work like a sequence of if(s)

if (color == red) ...
else if (color == green) ....
else ....

The second does not give you any safety. You can mix and match things as you want:

if (color == red) ...
else if (color == large) ....
else if (size == windy) ....
else ....

So what I implemented was the solution that correctly solves the linguistic issues: load a message with the ID stored in a variable, and pass it the proper selector info to do the right thing when rendering that indirect message.

The distinction is also true in Fluent (https://hacks.mozilla.org/2019/04/fluent-1-0-a-localization-system-for-natural-sounding-translations):

-sync-brand-name = {$case ->
   *[nominative] Konto Firefox
    [genitive] Konta Firefox
    [accusative] Kontem Firefox
}

That is one message with 3 selectors, not 3 independent messages. There are good reasons for that design.
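A small sketch of that design, assuming a toy data model: the message ID comes from a variable, and the selector value is passed along as an option rather than encoded into the ID. The `messages` shape and `formatRef` are hypothetical and are not taken from the linked Java code.

```javascript
// Hypothetical model: one message with selector variants, not three messages.
// The variant labels are copied from the Fluent example above.
const messages = {
  "sync-brand-name": {
    selector: "case",
    variants: {
      nominative: "Konto Firefox",
      genitive: "Konta Firefox",
      accusative: "Kontem Firefox",
    },
    default: "nominative",
  },
};

// Resolves the message whose ID is stored in a variable, forwarding the
// selector options so that the referenced message picks its own variant.
function formatRef(id, options = {}) {
  const message = messages[id];
  if (!message) return `{${id}}`; // fallback for an unknown ID
  const key = options[message.selector];
  const variants = message.variants;
  return Object.prototype.hasOwnProperty.call(variants, key)
    ? variants[key]
    : variants[message.default];
}

const brandId = "sync-brand-name"; // only known at runtime
console.log(formatRef(brandId, { case: "genitive" }));
console.log(formatRef(brandId)); // no selector info: default variant
```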

So I would claim that I've implemented the dynamic message + selector, and what Fluent does today: https://github.com/unicode-org/message-format-wg/blob/experiments/experiments/data_model/java_mihai/src/test/java/com/mihnita/mf2/messageformat/DynamicMessageReferenceTest.java

I did not implement the /.../$variableRef1/.../$variableRef2/... Not because it is not possible, but because I misunderstood the requirement, and implemented something that addresses the real use case, not imitating (an incorrect) implementation.

And I argue that the separate message construct is a bad solution to the real use case. I can implement it as a proof that it can be done, though.

zbraniecki commented 2 years ago

As the data model and syntax of MF2 are solidifying, I took on, with help from Eemeli, the task of piercing vertically through the layers to attempt to produce a mock of how dynamic references could be implemented in the current MF2 approach.

I'd like to verify with the stakeholders that this approach matches their mental model on runtime relations.

Here's what I see as the most likely way to implement it:

let ctx = {};
let mf = new Intl.MessageFormat(locale, {
  ctx,
});

ctx.monsters = {};

for (let msg of monstersResource) {
  ctx.monsters[msg.id] = msg;
}

let result = mf.format("$monster :message opt=value ...", {
  monster: "monsters.dragon",
});

// or
let result2 = mf.format("$monster", {
  monster: new Intl.MF.VariableTypes.MessageReference("monsters.dragon", opts), // partially resolved value
});

let resMsgMap = {};

// Needed for DOM API
function addResource(res, groupName = null) {
  let root = ctx;
  if (groupName) {
    // Create the group on first use so that root is never undefined.
    root = ctx[groupName] ??= {};
  }
  for (let msg of res) {
    root[msg.id] = msg;
  }

  resMsgMap[res.id] = res.map((msg) => msg.id);
}

// Needed for DOM API
function removeResource(resId) {
  //XXX: handle groups
  for (let id of resMsgMap[resId]) {
    delete ctx[id];
  }
}

This requires a live ctx map to be passed to the constructor, and the MessageReference function to get runtime access to the MF2 instance's ctx map.

Does that seem like the plausible experience in line with everyone's thinking?

@eemeli, @stasm, @mihnita , @echeran , @aphillips ?

eemeli commented 2 years ago

While working on the JS implementation, I've also included a compatibility package for Fluent support which maps Fluent term and message references to a custom MESSAGE function that works rather similarly to the above. While Fluent itself doesn't support dynamic references, the runtime function may be used with MF2 syntax to achieve dynamic references:

import { getFluentRuntime } from '@messageformat/fluent'
import { MessageFormat } from 'messageformat'

const res = new Map()
const runtime = getFluentRuntime(res)

res.set('dragon', new MessageFormat('{Dragon}', 'en', { runtime }))
res.set('lion', new MessageFormat('{Lion}', 'en', { runtime }))

const msg = new MessageFormat('{You see the {$creature :MESSAGE}.}', 'en', { runtime })
msg.resolveMessage({ creature: 'dragon' }).toString()
// 'You see the Dragon.'

The slightly tricky part there is that the MESSAGE handler needs to be provided a way to find other messages while being potentially itself used by those messages; hence the creation and use of res in the definition of runtime, before actually adding the messages to it.
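The ordering constraint can be shown without the real packages. The following is a sketch only: `createRuntime` and `ToyMessage` are hypothetical stand-ins, and the `{$name :MESSAGE}` handling is a simple string replacement rather than an MF2 parser.

```javascript
// Hypothetical stand-ins; this does not use the real messageformat packages.
function createRuntime(res) {
  return {
    // Looks up the referenced message lazily, at format time,
    // so the map may still be empty when the runtime is created.
    MESSAGE(id, args) {
      const target = res.get(id);
      return target ? target.format(args) : `{${id}}`;
    },
  };
}

class ToyMessage {
  constructor(pattern, runtime) {
    this.pattern = pattern;
    this.runtime = runtime;
  }
  format(args = {}) {
    // Replace `{$name :MESSAGE}` with the message whose ID is args[name].
    return this.pattern.replace(/\{\$(\w+) :MESSAGE\}/g, (_, name) =>
      this.runtime.MESSAGE(args[name], args)
    );
  }
}

const res = new Map();
const runtime = createRuntime(res); // res is still empty here

res.set("dragon", new ToyMessage("Dragon", runtime));
res.set("greeting", new ToyMessage("You see the {$creature :MESSAGE}.", runtime));
const seen = res.get("greeting").format({ creature: "dragon" });
console.log(seen);

// A message added later is visible too, because the runtime holds the live map.
res.set("lion", new ToyMessage("Lion", runtime));
const seenLater = res.get("greeting").format({ creature: "lion" });
console.log(seenLater);
```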

zbraniecki commented 2 years ago

I've been pondering over that as I work on Rust impl of MF2.

The particularly uncommon API design concept here is that you pass a map to getFluentRuntime that is live and can change over time. The closest analogy that comes to my mind is a live NodeList. Is everyone ok with that analogy and design choice?

The reason this is important to be live is that it will be the mechanism powering DOM lifecycle of adding/removing MF2 resources, analogous to how one can dynamically inject/remove CSS links in a live document.

The second piece that I'd like to call out is that { creature: 'dragon' } is opaque. There's no way to indicate that dragon here is an ID of a message to be referenced, vs a standalone string. I'm a bit iffy on that choice. I think in areas like this it's easy to make mistakes that are hard to debug, and ideally I would prefer:

const msg = new Intl.MessageFormat('{You see the {$creature :MESSAGE}.}', 'en', { runtime })
msg.resolveMessage({ creature: Intl.MessageReference('dragon') }).toString()

to make the type explicit, akin to Fluent's Partially Resolved Variables. This also allows for additional options:

const msg = new Intl.MessageFormat('{You see the {$creature :MESSAGE}.}', 'en', { runtime })
msg.resolveMessage({
  creature: Intl.MessageReference('dragon', { onFailed: "creature", mode: "mandatory" })
}).toString()

etc.

For that to work, we need to recognize that arguments have types in host language (JS, Rust, C++ etc.) and that those types have to map onto internal types.

So, passing { emails: 5 } is the same as { emails: Intl.MF2.Number(5) } and we need to decide what types we want to support. Fluent supports strings, numbers and datetime (although badly), and I think we should consider MessageReference, Boolean and possibly others - RelativeTime, List, etc.
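One way to picture that mapping is a small function that tags each host value with an internal type. This is a sketch under assumptions: the `MessageReference` class, the `toInternal` helper, and the tag names are hypothetical and are not part of Intl or of any MF2 implementation.

```javascript
// Hypothetical explicit type; the name is illustrative, not part of Intl.
class MessageReference {
  constructor(id, options = {}) {
    this.id = id;
    this.options = options;
  }
}

// Maps a host-language (JS) value onto a tagged internal value.
function toInternal(value) {
  if (value instanceof MessageReference) {
    return { type: "msgref", id: value.id, options: value.options };
  }
  if (value instanceof Date) {
    return { type: "datetime", value };
  }
  switch (typeof value) {
    case "number":
      return { type: "number", value };
    case "boolean":
      return { type: "boolean", value };
    case "string":
      return { type: "string", value };
    default:
      throw new TypeError(`Unsupported argument type: ${typeof value}`);
  }
}

// A bare string stays a string; only the explicit wrapper becomes a reference.
console.log(toInternal(5).type);
console.log(toInternal("dragon").type);
console.log(toInternal(new MessageReference("dragon")).type);
```

With a mapping like this, a bare `'dragon'` can never be mistaken for a message ID, which is the debugging concern raised above.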

eemeli commented 2 years ago

A couple of things come to mind on this, on a few different levels:

  1. With the way we've ended up defining things, this isn't actually an MF2 spec question, but either an implementation question, or a question about a default set of functions (like, say, number) which we would like to define in the spec or in some attached document. So where should we be talking about this? Is it appropriate for this group to also host MF2 implementation discussions, or should we have some separate "MF2 implementers" forum?
  2. I think we need to be careful about the mechanism that we introduce to allow for a MESSAGE formatter as defined above to differentiate between a String('dragon') depending on its origin, i.e. whether it's coming from an explicit literal {(dragon) :MESSAGE}, or if it's coming from a variable reference {$creature :MESSAGE}. Do we really need to have that information be available, or does it count as a nice-to-have? Could we leave it out of the runtime, and complain instead from a linter/validator about the data type?
  3. I rather like the idea of something like Intl.MessageReference('dragon'). To me, this raises the question of whether {$creature} should resolve to the same as {$creature :MESSAGE}; presumably yes? The parallel here would be how {$emails} and {$emails :number} may resolve to the same message, if emails is a number.

aphillips commented 2 months ago

This appears to be out of scope for the official release, save where users can accomplish tasks like this using selectors and local variables or by using custom functions.

Should we keep this alive with Future? Or close?

eemeli commented 2 months ago

This should be closed as out-of-scope because we don't support any message references within the MF2 spec.