tc39 / ecma402

Status, process, and documents for ECMA 402
https://tc39.es/ecma402/

Merge PluralRules into NumberFormat (formatSelect) #397

Open sffc opened 4 years ago

sffc commented 4 years ago

Time and time again, programmers are confused about how to use Intl.PluralRules, especially where the rendered digits matter, such as how to select the plural form of 1 versus 1.00 versus 1K.

In the ICU implementation, to solve this problem, we allow users to pass a FormattedNumber, the output of NumberFormatter, into PluralRules.

Here's a draft of how this could look in ECMAScript:

const fmt = new Intl.NumberFormat("fr-FR", {
    notation: "compact"
});
const { string, pluralForm } = fmt.formatSelect(2.5e6);
console.log(string, pluralForm);
// "2,5 M" many

Intl.PluralRules would still be useful for the case where you don't care about the rendered output, but the new API on Intl.NumberFormat would help clarify how to get the effective plural form for a formatted number.

The new APIs:

Thoughts?

@zbraniecki @echeran @longlho

zbraniecki commented 4 years ago

Could we instead offer an ability to pass a NumberFormat instance to PluralRules.select?

const nf = new Intl.NumberFormat("fr-FR", {
    notation: "compact"
});
const pr = new Intl.PluralRules("fr-FR");
pr.select(2.5e6, nf); // select using the number formatted with `nf` options?

sffc commented 4 years ago

The current pattern for doing this is to pass the NumberFormat options into the PluralRules constructor:

const nf = new Intl.NumberFormat("fr-FR", {
    notation: "compact"
});
const pr = new Intl.PluralRules("fr-FR", nf.resolvedOptions());

I was thinking that putting sugar methods on NumberFormat might make this easier to use. It would also be more efficient because, implementation-wise, you format the number only once and compute both the string and the plural form from that single result.

echeran commented 4 years ago

In ICU, NumberFormat first returns a FormattedNumber as an intermediate output; we then get its string representation and/or use it to select the plural form. Is the motivation that ES lacks that intermediate object (it only returns a string), so we want to reduce the cognitive overhead of using two separate APIs?

If so, I think the idea makes sense. We don't really create custom plural rules; we take whatever CLDR provides by default, which means selecting a plural category needs the same input data as formatting a number. And I assume this proposal only addresses the case where you want both; otherwise, you can keep using the existing APIs.

On closer inspection, the current way to create plural rules does seem a little wonky compared to the ICU way of doing things. But how much that matters depends on how common plural selection alone (without formatting) is, relative to formatting alone (without plural selection) or both together.
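
ES has no FormattedNumber object, but formatToParts is arguably the closest intermediate output it exposes. The sketch below is a hypothetical helper (pluralFromParts is not part of ECMA-402) that counts the fraction digits actually rendered and feeds that count to Intl.PluralRules. It assumes rounding only affects the fraction part, so it ignores the compact exponent (the fr-FR compact case would still come back as "other", not "many") and significant-digit rounding of the integer part.

```javascript
// Hypothetical helper (not part of ECMA-402): select the plural form from the
// digits NumberFormat actually rendered, using formatToParts as a stand-in for
// ICU's FormattedNumber.
function pluralFromParts(locale, options, value) {
  const nf = new Intl.NumberFormat(locale, options);
  const parts = nf.formatToParts(value);

  // Count the fraction digits that appear in the rendered string.
  const fractionDigits = parts
    .filter((part) => part.type === "fraction")
    .reduce((total, part) => total + part.value.length, 0);

  // Ask PluralRules about the number with exactly that many fraction digits.
  // Assumption: rounding only touches the fraction part; the compact exponent
  // and significant-digit rounding of the integer part are not carried over.
  const pr = new Intl.PluralRules(locale, {
    minimumFractionDigits: fractionDigits,
    maximumFractionDigits: fractionDigits,
  });

  return {
    string: parts.map((part) => part.value).join(""),
    pluralForm: pr.select(value),
  };
}

const { string, pluralForm } = pluralFromParts("en-US", { minimumFractionDigits: 1 }, 1);
console.log(string, pluralForm); // 1.0 other
```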

rxaviers commented 4 years ago

To clarify the issue for readers, there are two problems:

1:

new Intl.PluralRules("mk").select(1)
// > "one"
new Intl.PluralRules("mk", {minimumFractionDigits: 1}).select(1)
// > "other"

sffc commented 4 years ago

2.3e6 in fr-FR: "2 300 000 vues" (plural form "other")

But when compact notation is used: "2,3 millions de vues" (plural form "many")
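
The second problem can be reproduced with existing APIs. Intl.PluralRules only ever sees the number 2300000, never the compact string, so it reports "other" even though, as noted above, the compact rendering takes "many". The exact grouping spaces in the standard output depend on the engine's CLDR data, so the comments only show the visible characters.

```javascript
// Same value, two renderings: PluralRules only ever sees the number 2300000.
const value = 2.3e6;
const standard = new Intl.NumberFormat("fr-FR").format(value);
const compact = new Intl.NumberFormat("fr-FR", {
  notation: "compact",
  compactDisplay: "long",
}).format(value);
const category = new Intl.PluralRules("fr-FR").select(value);

console.log(standard); // 2 300 000 (grouping separators are non-breaking spaces)
console.log(compact); // 2,3 millions
console.log(category); // other
```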

longlho commented 4 years ago

I'm trying to figure out the use case for this; so far, off the top of my head, the only one I can think of is debugging. What use cases do you anticipate?

I think the confusion right now, at least for me, comes primarily from the implicit resolution of fraction/significant digits within NumberFormat, e.g. ILD currency digits info that changes the default fraction digits.

Another thing to consider is plural selection within ICU MessageFormat, e.g.

{count, plural, one{# book} other{# books}}

With this API, it seems the signal is to use NumberFormat.formatSelect so that the plural form is consistent with the number rendered in #. But then if we have

{count, plural, one{book} other{books}}

(no #, so no rendered number), then what should we do in that scenario?
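
A small sketch can make the `#` question concrete. formatPlural below is a hypothetical, heavily simplified stand-in for one ICU-style plural argument (no offset, no =N exact matches, no nesting); it is not how ICU MessageFormat is implemented. It drives both the `#` rendering and the category selection from one set of digit options, and when the pattern has no `#`, it still selects with those options, which is only one possible answer to the question above.

```javascript
// Hypothetical mini-formatter for one ICU-style plural argument such as
// {count, plural, one{# book} other{# books}}. Simplified: no offset, no
// =N exact matches, no nested arguments.
function formatPlural(locale, count, variants, digitOptions = {}) {
  // The same digit options drive both the "#" rendering and the selection,
  // so the displayed number and the chosen category cannot disagree.
  const nf = new Intl.NumberFormat(locale, digitOptions);
  const pr = new Intl.PluralRules(locale, digitOptions);
  const pattern = variants[pr.select(count)] ?? variants.other;
  const formatted = nf.format(count);
  // A replacer function avoids "$" sequences (e.g. from currency output)
  // being interpreted as replacement patterns.
  return pattern.replaceAll("#", () => formatted);
}

const books = { one: "# book", other: "# books" };
console.log(formatPlural("en-US", 1, books)); // 1 book
console.log(formatPlural("en-US", 1, books, { minimumFractionDigits: 1 })); // 1.0 books
```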

sffc commented 4 years ago

formatSelect does not add any new functionality; it just makes the existing functionality more discoverable, understandable, and efficient. Use cases are not a consideration.

sffc commented 4 years ago

I plan to address this as part of my new proposal Intl.NumberFormat V3.

https://github.com/sffc/proposal-intl-numberformat-v3

zbraniecki commented 4 years ago

@sffc I still don't see any references to libraries or software that would need this feature. It seems insufficiently justified so far. Can you point to sources showing who would need it and why?

sffc commented 4 years ago

This isn't a feature; it's a refactoring of an existing feature. You can refer back to the PluralRules proposal for the full list of use cases.

In message formatting, you generally want both the number and the plural form of the number. Right now you have to use two different Intl classes, which is unintuitive, clunky, and inefficient. (Do you need justification on those three adjectives?) This proposal means you can get both the formatted number and the plural form in one function call, which I claim is more ergonomic and efficient.

zbraniecki commented 4 years ago

> Do you need justification on those three adjectives?

I would like to see an example of a library or software where this problem is exemplified.

I am a co-author of a localization system that uses both Intl.PluralRules and Intl.NumberFormat and I have not observed that problem nor do I see how it would apply to my system.

Therefore I'm curious what other cases exist that exemplify the problem you're addressing. Saying "very often engineers encounter..." or "time and time again users are confused..." is only valuable if you can point to examples of where they were confused or what they encountered.

My issue is that I have not seen anything that would validate those claims.

sffc commented 4 years ago

Unintuitive: Previous discussions regarding confusion over Intl.PluralRules behavior: #373, #365, https://github.com/tc39/proposal-unified-intl-numberformat/issues/86. I have also seen users simply unaware that fraction digit settings need to be passed to Intl.PluralRules in order to get correct behavior (which led to issues such as ICU-20617). For example, the following code is incorrect, even in English, but to most non-i18n experts, it looks perfectly plausible:

function howManyStars(locale, count, strings) {
  const nf = new Intl.NumberFormat(locale, {
    minimumFractionDigits: 1,
    maximumFractionDigits: 1,
  });
  const pr = new Intl.PluralRules(locale);
  return `${nf.format(count)} ${strings[pr.select(count)]}`;
}

howManyStars("en-US", 2, { one: "star", other: "stars" });
// Correct: "2.0 stars"

howManyStars("en-US", 1, { one: "star", other: "stars" });
// Incorrect: "1.0 star"
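
For comparison, one way to fix howManyStars with today's APIs and no new methods is to hand the same digit options to both constructors. This is a variant of the resolvedOptions() pattern mentioned earlier in the thread, written so the two option sets cannot drift apart; it still requires the caller to know that PluralRules needs those options at all.

```javascript
// One way to fix howManyStars today: pass one shared options object to both
// constructors so the rendered digits and the plural selection agree.
function howManyStars(locale, count, strings) {
  const digitOptions = { minimumFractionDigits: 1, maximumFractionDigits: 1 };
  const nf = new Intl.NumberFormat(locale, digitOptions);
  const pr = new Intl.PluralRules(locale, digitOptions);
  return `${nf.format(count)} ${strings[pr.select(count)]}`;
}

console.log(howManyStars("en-US", 1, { one: "star", other: "stars" })); // 1.0 stars
```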

Also, the following doesn't work either, since .select() converts its argument to a Number, so the trailing zero in "1.0" is lost:

const pr = new Intl.PluralRules("en-US");
pr.select("1.0");  // "one", but should be "other"
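
A userland workaround is to recover the visible fraction digits from the string and pass them as options. selectDecimalString below is a hypothetical helper, not an Intl API; it assumes plain decimal strings such as "1.0" (no exponent or grouping separators), and very long fractions may exceed the allowed fraction-digit range and throw a RangeError.

```javascript
// Hypothetical workaround (not part of ECMA-402): select() converts its
// argument to a Number, so "1.0" becomes 1. Count the visible fraction digits
// in the string and pass them as options instead.
// Assumes a plain decimal string such as "1.0" (no exponent, no grouping).
function selectDecimalString(locale, decimalString) {
  const dot = decimalString.indexOf(".");
  const fractionDigits = dot === -1 ? 0 : decimalString.length - dot - 1;
  const pr = new Intl.PluralRules(locale, {
    minimumFractionDigits: fractionDigits,
    maximumFractionDigits: fractionDigits,
  });
  return pr.select(Number(decimalString));
}

console.log(selectDecimalString("en-US", "1.0")); // other
console.log(selectDecimalString("en-US", "1")); // one
```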

Clunky and Inefficient: The above function could be re-implemented in a safer, more efficient way by using formatSelect, as follows:

function howManyStars(locale, count, strings) {
  const nf = new Intl.NumberFormat(locale, {
    minimumFractionDigits: 1,
    maximumFractionDigits: 1,
  });
  const result = nf.formatSelect(count);
  return `${result.string} ${strings[result.pluralForm]}`;
}

I see the plural form as being fundamentally tied to the formatted string. In my opinion, as an i18n engineer who has worked with clients trying to implement plural selection correctly, the model of plural selection having its own class that neither accepts nor produces a formatted string is simply wrong, and it leads to bugs such as the ones listed above.


All that said, I appreciate the criticism from other i18n experts in this thread. It could be that my mental model of plural selection isn't correct. I am fine pulling formatSelect from my NumberFormat v3 proposal if we don't have consensus on it.

longlho commented 4 years ago

I agree with @zbraniecki. I'm not sure this is needed as a top-level API; rather, PluralRules and NumberFormat could simply share more underlying abstract operations.

zbraniecki commented 4 years ago

@sffc what would you say to a selectPluralCategory method on NumberFormat instead? That way, the only increase in API surface is that NF could be used to get the plural category, just as PluralRules can.

sffc commented 4 years ago

Bikeshed:

  1. Intl.NumberFormat.prototype.formatSelect returning { string, pluralForm }
    • Pros: All features in one place; easy to use correctly; works nicely with formatRange
    • Cons: Doubles number of terminal methods, from 4 to 8 (including formatToParts and formatRange); return value should be a value type, but Records are still only Stage 1
  2. Intl.NumberFormat.prototype.selectPluralCategory returning a string pluralForm
    • Pros: Simple, straightforward addition
    • Cons: Two function calls, reducing potential performance benefit of a single call
  3. Intl.NumberFormat.prototype.getPluralRules returning an Intl.PluralRules
    • Pros: Clean separation of functionality; Intl.PluralRules remains a first-class construction
    • Cons: No performance benefit over the status quo
  4. Intl.PluralRules.from taking an Intl.NumberFormat as an argument
    • Pros/Cons: Same as above
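
To compare the options concretely, here is a hypothetical userland approximation of each; none of these functions exist in ECMA-402. It derives the PluralRules from resolvedOptions() as suggested earlier in the thread and caches it per NumberFormat in a WeakMap. Caveat: this only carries over options that Intl.PluralRules understands, so it may not reproduce the compact plural form (the fr-FR "many" case).

```javascript
// Hypothetical userland versions of the bikeshed options; none of these
// functions exist on Intl.NumberFormat or Intl.PluralRules.
const pluralRulesCache = new WeakMap();

// Options 3/4: a PluralRules derived once from nf's resolved options.
function pluralRulesFor(nf) {
  let pr = pluralRulesCache.get(nf);
  if (pr === undefined) {
    const options = nf.resolvedOptions();
    pr = new Intl.PluralRules(options.locale, options);
    pluralRulesCache.set(nf, pr);
  }
  return pr;
}

// Option 2: category only; callers still call nf.format() separately.
function selectPluralCategory(nf, value) {
  return pluralRulesFor(nf).select(value);
}

// Option 1: string and category from one call. Internally this still formats
// twice, so it shows the call shape, not the efficiency gain.
function formatSelect(nf, value) {
  return { string: nf.format(value), pluralForm: selectPluralCategory(nf, value) };
}

const nf = new Intl.NumberFormat("en-US", { minimumFractionDigits: 1, maximumFractionDigits: 1 });
console.log(formatSelect(nf, 1)); // { string: '1.0', pluralForm: 'other' }
```

The efficiency argument for option 1 depends on the engine computing the string and the plural form from a single formatting pass, which a wrapper like this cannot provide.
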
sffc commented 4 years ago

We decided in the 2020-04-23 meeting to table this issue, because none of the proposed options solve the problem completely. We will still require documentation, even if we add new methods. I filed a ticket to follow up on the documentation:

https://github.com/tc39/ecma402-mdn/issues/13