Intl.Locale minimize/maximize extension keys

sffc commented 4 years ago

The ability to get at user preferences is an old, recurring feature request (#6, #38, #68, https://github.com/tc39/proposal-intl-locale/issues/3). It's also a feature that will be important in a Temporal world, when we start to give developers more tools for building custom calendar apps.

I was wondering if we can do this with the maximize() function. Currently, that function only fills in language, script, and region. Is there a reason it can't fill in extension keys, too? It would be slow to fill in everything, but maybe you could request which extension keys you want:

let locale = new Intl.Locale("en");
console.log(locale.calendar);  // undefined
console.log(locale.toString());  // "en"

locale = locale.maximize(["calendar"]);
console.log(locale.calendar);  // "gregory"
console.log(locale.toString());  // "en-Latn-US-u-ca-gregory"

locale = locale.minimize();
console.log(locale.calendar);  // "gregory"
console.log(locale.toString());  // "en-u-ca-gregory"

locale = locale.minimize(["calendar"]);
console.log(locale.calendar);  // undefined
console.log(locale.toString());  // "en"
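
For comparison, here is a minimal sketch of what maximize() and minimize() do today, without the array argument proposed above. As far as I can tell, they only add or remove likely script and region subtags and pass any existing extension keywords through untouched:

```javascript
// Shipped behavior: no arguments, only language/script/region are affected.
const en = new Intl.Locale("en");
console.log(en.maximize().toString());             // "en-Latn-US"
console.log(en.maximize().minimize().toString());  // "en"

// Extension keywords that are already present survive both operations.
const withCalendar = new Intl.Locale("en-u-ca-gregory");
console.log(withCalendar.maximize().toString());   // "en-Latn-US-u-ca-gregory"
console.log(withCalendar.minimize().toString());   // "en-u-ca-gregory"
```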

The pattern to get the default calendar in a browser environment then becomes:

function getDefaultCalendarV1() {
    // TODO: How do we pick the best language tag out of navigator.languages?
    let locale = new Intl.Locale(navigator.language);
    if (!locale.calendar) {
        locale = locale.maximize(["calendar"]);
    }
    return locale.calendar;
}

Slightly shorter, but could cause extra work to maximize subtags you don't need:

function getDefaultCalendarV2() {
    return new Intl.Locale(navigator.language).maximize(["calendar"]).calendar;
}

Maybe in addition, we can add a flag to populate this in the option bag:

function getDefaultCalendarV3() {
    return new Intl.Locale(navigator.language, { calendar: true }).calendar;
}

@zbraniecki @littledan

anba commented 4 years ago

Dup of #390?

sffc commented 4 years ago

Dup of #390?

Indeed; this issue has more activity though so I'll close #390 and point it here. Thanks!

sffc commented 4 years ago

I think it's safe to say that this issue is pending a resolution to #416.

littledan commented 4 years ago

Do you have a use case in mind that wouldn't be met by an idiom like new Intl.DateTimeFormat("en").resolvedOptions().calendar? Or is this idiom considered too messy?
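
A minimal sketch of that idiom, using only the existing Intl.DateTimeFormat API. The helper name defaultCalendarFor is an assumption for illustration, not part of any spec:

```javascript
// Hypothetical helper: ask a formatter which calendar it resolved for a tag.
function defaultCalendarFor(tag) {
  return new Intl.DateTimeFormat(tag).resolvedOptions().calendar;
}

console.log(defaultCalendarFor("en"));  // "gregory"

// An explicit -u-ca- keyword is honored when the implementation supports
// that calendar; the result depends on the engine's locale data.
console.log(defaultCalendarFor("en-u-ca-buddhist"));
```

This works for anything a formatter resolves (calendar, numbering system, hour cycle), but it requires constructing a formatter just to read one value.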

sffc commented 4 years ago

Do you have a use case in mind that wouldn't be met by an idiom like new Intl.DateTimeFormat("en").resolvedOptions().calendar? Or is this idiom considered too messy?

How do you propose handling user preferences that aren't covered by any particular Intl API, such as the first day of the week (among others; that's just one example)?

littledan commented 4 years ago

@sffc Ah, good question. Maybe we should/could add these to resolvedOptions? (I'm not opposed to the proposal in the OP, just making suggestions.)

littledan commented 4 years ago

Oh, heh, I rejected the resolvedOptions approach in https://github.com/tc39/ecma402/issues/6#issuecomment-273331899 years ago :)

littledan commented 4 years ago

A suggestion: Rather than framing this necessarily in terms of extension keys (which would only be able to explain a fraction of what's in CLDR), what if we added methods to Intl.Locale to get the preferred calendar, first day of the week, etc? We could add individual methods for each piece of data that we want to make accessible.

sffc commented 4 years ago

A suggestion: Rather than framing this necessarily in terms of extension keys (which would only be able to explain a fraction of what's in CLDR), what if we added methods to Intl.Locale to get the preferred calendar, first day of the week, etc? We could add individual methods for each piece of data that we want to make accessible.

I think the key question here is: how do we agree on the set of preferences that we want to make accessible? @litherum brought up an example of a hypothetical user preference, whether you prefer cats or dogs. Does TC39-TG2 want to own the decision of making a value judgement on whether a certain setting is worth adding?

Unicode has already gone through the process of vetting the set of user preferences to include in the form of extension tags, and that schema is well-specified and understood around the industry. Why reinvent the wheel when Unicode already went through the trouble of solving this problem for us?

littledan commented 4 years ago

How does this relate to user preferences? I thought the minimize/maximize API was about querying standard locale data, and we'd cordon off all user preferences to navigator.locale or other APIs.

littledan commented 4 years ago

Unicode has already gone through the process of vetting the set of user preferences to include in the form of extension tags, and that schema is well-specified and understood around the industry. Why reinvent the wheel when Unicode already went through the trouble of solving this problem for us?

My understanding from a discussion with @aphillips was that the BCP 47 tags don't form a list of all the user preferences we might care about, and it might not make sense to extend them to this. (I'm not enough of an expert in this area to evaluate that myself, though, and it's possible I misunderstood him.)

I'm also wondering if there's further data that CLDR has about locales that we might want to expose, which isn't even considered a "user preference".

sffc commented 4 years ago

BCP 47 tags don't form a list of all the user preferences we might care about, and it might not make sense to extend them to this

That's one of the problems: how do we go about deciding what user preferences to include?

The list @zbraniecki posted in https://github.com/tc39/ecma402/issues/6#issuecomment-153934594 is the closest I've seen of a cohesive list of exactly what preferences we want:

firstDayOfTheWeek
- Available: -u-fw-
weekendStarts, weekendEnds
- Not currently available, but CLDR would probably add this
direction (ltr/rtl)
- N/A: Not a user preference
calendar type
- Available: -u-ca-
ordered list of currencies
- Available as -u-cu-, and CLDR would probably be flexible allowing a list as well
ordered list of timezones
- Same as above: -u-tz-
ordered list of scripts
- N/A: Not really a user preference
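
To make the mapping above concrete, here is a rough sketch that reads -u- keywords out of a language tag. The helper unicodeKeywords is hypothetical and written only for illustration; it relies on Intl.getCanonicalLocales for canonicalization and does not special-case private-use (-x-) subtags:

```javascript
// Hypothetical helper: collect the Unicode extension keywords of a tag
// into a Map of key -> value, e.g. "fw" -> "mon".
function unicodeKeywords(tag) {
  const subtags = Intl.getCanonicalLocales(tag)[0].split("-");
  const keywords = new Map();
  const start = subtags.indexOf("u");
  if (start === -1) return keywords;

  let key = null;
  for (const subtag of subtags.slice(start + 1)) {
    if (subtag.length === 1) break;        // next singleton ends the -u- extension
    if (subtag.length === 2) {             // two-character subtags are keys
      key = subtag;
      keywords.set(key, "");
    } else if (key !== null) {             // longer subtags extend the current value
      const previous = keywords.get(key);
      keywords.set(key, previous ? `${previous}-${subtag}` : subtag);
    }
  }
  return keywords;
}

const prefs = unicodeKeywords("en-u-fw-mon-ca-gregory-tz-uslax");
console.log(prefs.get("fw"));  // "mon"
console.log(prefs.get("ca"));  // "gregory"
console.log(prefs.get("tz"));  // "uslax"
```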

Basically, the policy I would propose is that we build on top of the closest thing there is to an industry standard on user preferences, and defer to that spec for not only the list of preferences, but also the schema and identifiers associated with them. If we standardize a list of valid user preferences (and the choices for those preferences) in ECMA-402, then we become the de-facto standard that others might cite, and this is something that I truly think should be out of scope for us.

If there's a setting we think is legitimate and worthwhile including as a Locale extension keyword, we bring that to CLDR and let them decide. I think in most cases, if the request is legitimate, they will be happy to accept it.

I'm also wondering if there's further data that CLDR has about locales that we might want to expose, which isn't even considered a "user preference".

Yes. Here are some examples:

Official languages for a region
Population of a region
Region containment graph
Exemplar characters for a language
Script direction (rtl/ltr)

I see these as fundamentally different from user preferences. First, there is usually only one right answer at a given point in time. Second, this data is often represented by more complex structures, like a set of Unicode code points (exemplar characters) or a DAG (region containment), not a string enumeration as is the case with user preferences. Third, these are more similar to Unicode properties (https://github.com/srl295/es-unicode-properties) and Display Names (https://github.com/tc39/proposal-intl-displaynames) than to user preferences.

I think if we wanted to add APIs to get at any of this supplemental CLDR data, we could add them on a case-by-case basis, and that decision doesn't influence what we do with user preferences.

littledan commented 4 years ago

I'm still confused about where these are supposed to come from. Are they implied by the locale? Are they preferences indicated in the OS? A mix of them? I'm in favor of adding APIs for both needs (exposing locale data and exposing OS/user preferences).

I'm proposing that we use different APIs for them (e.g., navigator.locales for OS preferences and something like an Intl.Locale method or expanded resolvedOptions for locale-provided data).

Can we agree that any OS/user preferences aren't exposed from calls to the Intl.Locale constructor itself but rather sorted into a special API like navigator.locales (which would be an Array of Intl.Locales, and the only way to get something populated with user/OS preferences)? Or are these deeply intermingled in some way I can't understand?

littledan commented 4 years ago

Note that the separation of navigator.locales (for getting current user/OS preferences) from Intl.Locale (for representing, querying and manipulating preferences) is parallel to the separation of Temporal.now from the classes in Temporal. This helps permit mocking, and use across different contexts (e.g., the server side where it's inappropriate to use ambient global locales or timezones floating around in classes that you need to use to manipulate things).

This is why I'm opposed to some kind of Intl.Locale.prototype.maximize method as a way of getting access to user/OS preferences, just as I opposed new Intl.DateTimeFormat("en", { dateStyle: "short" }) as a way to opt into settings-specific patterns.

sffc commented 4 years ago

I'm not intending to conflate Intl.Locale.prototype.maximize with navigator.locales. I also like the clear separation of environment-dependent user preferences, and I would also oppose any additions that made Intl objects more environment-dependent.

What I'm proposing is that if there are any user preferences, they show up in navigator.locales. Intl.Locale.prototype.maximize should only load defaults from locale data, similar to how it already loads defaults from locale data for the script and region subtags.

In other words, in a world with both of these features, if you wanted to get the user's preferred calendar, the fundamental operations (which we could sugar up if necessary) would be:

// Step 1: get the user's locale (TODO: pick one based on your site's l10n support)
let contentLocale = navigator.locales[0];

// Step 2: populate the `-u-ca-` subtag if it isn't already present
contentLocale = contentLocale.maximize(["calendar"]);

// Step 3: get the calendar identifier
return contentLocale.calendar;

There are two cases here:

Case 1: navigator.locales[0] has a -u-ca- keyword. In this case, .maximize(["calendar"]) is a no-op, and the value of that keyword gets returned on the third line.

Case 2: navigator.locales[0] does not have a -u-ca- keyword. In this case, .maximize(["calendar"]) queries CLDR to look up what the default calendar should be, based on the other subtags like language and region, but not based on other environment-dependent information.
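
The two cases can be sketched with APIs that exist today. Everything here is an assumption for illustration: maximizeKeys is a hypothetical stand-in for the proposed .maximize([...]), the formatters' resolvedOptions() approximates the CLDR lookup, and, unlike the proposal, this sketch does not also add script and region subtags:

```javascript
// Hypothetical per-key lookups; resolvedOptions() stands in for a CLDR query.
const RESOLVERS = new Map([
  ["calendar", (tag) => new Intl.DateTimeFormat(tag).resolvedOptions().calendar],
  ["numberingSystem", (tag) => new Intl.NumberFormat(tag).resolvedOptions().numberingSystem],
]);

// Hypothetical stand-in for locale.maximize(keys).
function maximizeKeys(locale, keys) {
  const options = {};
  for (const key of keys) {
    const resolve = RESOLVERS.get(key);
    if (!resolve) throw new RangeError(`Unsupported key: ${key}`);
    // Case 1: keyword already present, keep it.
    // Case 2: keyword absent, fill it in from locale data.
    options[key] = locale[key] ?? resolve(locale.toString());
  }
  return new Intl.Locale(locale, options);
}

const explicit = maximizeKeys(new Intl.Locale("en-u-ca-buddhist"), ["calendar"]);
console.log(explicit.calendar);  // "buddhist" (case 1)

const filled = maximizeKeys(new Intl.Locale("en"), ["calendar"]);
console.log(filled.toString());  // "en-u-ca-gregory" (case 2)
```

Note that the function takes the locale as a parameter instead of reading navigator.locales, so nothing in it depends on the environment beyond the engine's locale data.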

Does that help clarify?

littledan commented 4 years ago

Yes, seems like our understanding of the separation between Intl.Locale and navigator.locales coincides; thanks for bearing with me.

Now, to bikeshed: do we want to have a general .maximize(attributes) method taking an array of these attributes? Another design would be to have getters to look this up, such as get Intl.Locale.prototype.preferredCalendar, which would get the calendar if present, and if absent get it from the locale data.

I think getters for each thing that we want to maximize would be more ergonomic. It would also permit feature testing (though arguably that's not a requirement, as this could be considered a sort of best-effort API). Are there any reasons, ergonomic or otherwise, to prefer the array-of-properties approach to getters per thing to query?

(These getters would also leave the door open for exposing data that is not expressed in BCP 47, as I mentioned in https://github.com/tc39/ecma402/issues/409#issuecomment-622442748, though I think the initial set of things to include would be from BCP 47 (a subset of things that are in BCP 47 seems highest priority), and it's not clear whether we'd ever expand outside of that set.)
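
A rough sketch of the getter design, written as a subclass so it does not patch the built-in prototype. The class name LocaleWithDefaults is hypothetical, preferredCalendar is only the name floated in this thread, and the formatter lookup is an assumed stand-in for reading locale data directly:

```javascript
// Hypothetical subclass illustrating a per-preference getter.
class LocaleWithDefaults extends Intl.Locale {
  get preferredCalendar() {
    // Use the explicit -u-ca- keyword if present; otherwise fall back to
    // the calendar a formatter resolves for this locale.
    return this.calendar ??
      new Intl.DateTimeFormat(this.toString()).resolvedOptions().calendar;
  }
}

console.log(new LocaleWithDefaults("en").preferredCalendar);              // "gregory"
console.log(new LocaleWithDefaults("en-u-ca-hebrew").preferredCalendar);  // "hebrew"

// Feature testing is a plain property check.
console.log("preferredCalendar" in LocaleWithDefaults.prototype);         // true
```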

sffc commented 4 years ago

Another design would be to have methods to look this up, such as get Intl.Locale.prototype.preferredCalendar, which would get the calendar if present, and if absent get it from the locale data.

That does indeed have nicer ergonomics:

return navigator.locales[0].preferredCalendar;

If you wanted to save the result for later, you could do,

let maximized = navigator.locales[0];
maximized.calendar = maximized.preferredCalendar;

Are there any reasons, ergonomic or otherwise, to prefer the array-of-properties approach to getters per thing to query?

The .maximize() function already accesses locale data as part of its contract. I was thinking that a polyfill or other library basing itself off 402 could choose to make .maximize() be async in order to load the data from a file or service somewhere. It would be weird for getters to be data-dependent.

(These getters would also leave the door open for exposing data that is not expressed in BCP 47, as I mentioned in #409 (comment), though I think the initial set of things to include would be from BCP 47 (a subset of things that are in BCP 47 seems highest priority), and it's not clear whether we'd ever expand outside of that set.)

I'm not sure if Intl.Locale is the right place to think about adding these other getters, but we can talk about that in the future.

littledan commented 4 years ago

maximized.calendar = maximized.preferredCalendar;

Locales are immutable; I guess we'd do something like this:

maximized = new Intl.Locale(maximized, {calendar: maximized.preferredCalendar})

I hope that that's good enough ergonomics, but if you have ideas for improving it, let's discuss them.
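
A small sketch of why the assignment does not work and what the constructor form produces. The helper tryAssignCalendar is hypothetical and exists only to demonstrate the failure:

```javascript
// Hypothetical helper: attempt the assignment in strict mode and report
// whether it succeeded. Intl.Locale exposes calendar as a getter only.
function tryAssignCalendar(locale, value) {
  "use strict";
  try {
    locale.calendar = value;
    return true;
  } catch (error) {
    return false;  // TypeError: the accessor has no setter
  }
}

const base = new Intl.Locale("en");
console.log(tryAssignCalendar(base, "gregory"));  // false
console.log(base.calendar);                       // undefined

// Building a new locale from the old one plus an options bag works today.
const withCalendar = new Intl.Locale(base, { calendar: "gregory" });
console.log(withCalendar.toString());             // "en-u-ca-gregory"
console.log(base.toString());                     // "en"
```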

I was thinking that a polyfill or other library basing itself off 402 could choose to make .maximize() be async in order to load the data from a file or service somewhere.

This topic of making things async seems really important, but I see it as a large and complicated separate issue. I think the data for this will be rather small, but supporting a larger set of locales, or of strings in Intl.DisplayNames, than would be practical to ship in engines (Emoji descriptions anyone??) could be really useful. Maybe we should discuss this separately.

I'm not sure if Intl.Locale is the right place to think about adding these other getters, but we can talk about that in the future.

Yes, I share this uncertainty, and agree that we can put off this discussion. Actually, this uncertainty is a big reason that we've held off on navigator.locales so far. But, at this point, it seems like the highest priority things do fit into BCP 47 and Intl.Locale, so it seems to make sense to go ahead with this organization.

littledan commented 4 years ago

Let's follow up on async Intl APIs in https://github.com/tc39/ecma402/issues/434.

sffc commented 4 years ago

How about,

navigator.locales[0].getLikelyCalendar()
navigator.locales[0].getLikelyNumberingSystem()
navigator.locales[0].getLikelyHourCycle()

where

Intl.Locale.prototype.getLikelyCalendar = function() {
  if (this.calendar) {
    return this.calendar;
  } else {
    // return best-guess calendar from locale data
  }
}
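
One way the "best-guess" branch could be approximated today, written as standalone functions instead of prototype methods. This is a sketch under assumptions: the formatters' resolvedOptions() stands in for the locale-data lookup, and the function signatures are illustrative, not proposed spec text:

```javascript
// Each function prefers the explicit keyword on the locale and otherwise
// falls back to what an existing formatter resolves for that locale.
function getLikelyCalendar(locale) {
  return locale.calendar ??
    new Intl.DateTimeFormat(locale.toString()).resolvedOptions().calendar;
}

function getLikelyNumberingSystem(locale) {
  return locale.numberingSystem ??
    new Intl.NumberFormat(locale.toString()).resolvedOptions().numberingSystem;
}

function getLikelyHourCycle(locale) {
  // hourCycle is only resolved when the format includes an hour field.
  return locale.hourCycle ??
    new Intl.DateTimeFormat(locale.toString(), { hour: "numeric" })
      .resolvedOptions().hourCycle;
}

const enUS = new Intl.Locale("en-US");
console.log(getLikelyCalendar(enUS));         // "gregory"
console.log(getLikelyNumberingSystem(enUS));  // "latn"
console.log(getLikelyHourCycle(enUS));        // "h12"
console.log(getLikelyHourCycle(new Intl.Locale("en-US-u-hc-h23")));  // "h23"
```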

littledan commented 4 years ago

@sffc I'm with you on semantics, but why get methods, and not getters, like navigator.locales[0].likelyCalendar? Also, I'm wondering whether we should call these methods/getters "likely" or "preferred" or something else.

sffc commented 4 years ago

@sffc I'm with you on semantics, but why get methods, and not getters, like navigator.locales[0].likelyCalendar? Also, I'm wondering whether we should call these methods/getters "likely" or "preferred" or something else.

Because of what I said earlier about these methods being more "heavyweight" and accessing locale data. It's weird for a getter to do potentially a lot of work.

littledan commented 4 years ago

Oh sorry, I see you wrote:

It would be weird for getters to be data-dependent.

I think some data-dependency is fine here. These are pretty small data tables. In general, I think these getters qualify as "acting as data properties", as described in the W3C design principles doc. We're not necessarily bound by that doc in TC39, but it's a nice point of reference IMO.

Anyway, I think accessor vs method is a small bikeshedding question that we could resolve between Stage 2 and 3 (or the equivalent for this as a PR).

Seems like we're zeroing in on an API here. Should we pursue this as a staged proposal, or a PR? Does someone want to be a "champion" for this effort, writing a document with the consolidated motivation and spec text?

tc39 / ecma402

Intl.Locale minimize/maximize extension keys #409