tc39 / ecma402

Status, process, and documents for ECMA 402
https://tc39.es/ecma402/

Why is there no `Intl.Locale.prototype.variants`? #900

Open nnmrts opened 5 months ago

nnmrts commented 5 months ago

Why is there no Intl.Locale.prototype.variants? There are getters for language, region, and script, but I found no explanation of why variants is missing or why it shouldn't exist as well.
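For illustration, here is a minimal sketch of the getters the question refers to. The tag is only an example; the point is that language, script, and region are each exposed, while (as of this thread) the variant subtags have no corresponding getter.

```javascript
// Illustrative tag: language "sl", script "Latn", region "IT",
// followed by three variant subtags.
const locale = new Intl.Locale("sl-Latn-IT-rozaj-biske-1994");

console.log(locale.language); // "sl"
console.log(locale.script);   // "Latn"
console.log(locale.region);   // "IT"

// The variant subtags (rozaj, biske, 1994) are the part this issue
// asks about: at the time of the thread they can only be recovered
// by parsing the string form of the locale.
console.log(locale.toString());
```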

sffc commented 4 months ago

@zbraniecki Thoughts on this?

sffc commented 1 week ago

TG2 discussion: https://github.com/tc39/ecma402/blob/main/meetings/notes-2024-11-25.md#why-is-there-no-intllocaleprototypevariants-900

There were questions about motivation (most use cases for variants are better served by a corresponding Unicode extension keyword), as well as the shape of this getter (does it return a list? is the list sorted? or does it return a string with multiple subtags?)

nnmrts commented 1 week ago

I see, thanks for having the discussion!

Since when is the variants part in CLDR "deprecated" though?

I sadly can't remember my exact use-case and should have included it in my original post, but I think it was about two things:

The latter got added to the IANA language subtag registry in March this year. I know that isn't CLDR, but I was under the impression that this file is the "source of truth" for registered language subtags used in CLDR and everything else.

I also don't see any kind of "deprecation" of variants here: https://www.unicode.org/reports/tr35

Regarding the type of an eventual variants, I don't see any issue with using an array or even a set here.

I don't know what "The Japanese one from one to two, it’s complicated" is referencing, and I don't see the difficulty of parsing variants. They can only ever be alphanumeric strings of 5-8 characters (or 4 characters starting with a digit, such as 1994), and they can only be followed by extensions and private-use tags, so what's wrong with .split("-")? 😆
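A rough sketch of the .split("-") idea, for illustration only. `extractVariants` is a hypothetical helper, not part of any spec or library. It assumes a well-formed tag of the shape language[-script][-region][-variants...] and ignores extlang and grandfathered tags. The variant pattern follows the BCP 47 grammar: 5 to 8 alphanumerics, or 4 characters starting with a digit.

```javascript
// Hypothetical helper: collects variant subtags in the order supplied.
function extractVariants(tag) {
  const subtags = tag.split("-");
  const isScript = (s) => /^[a-z]{4}$/i.test(s);
  const isRegion = (s) => /^(?:[a-z]{2}|[0-9]{3})$/i.test(s);
  const isVariant = (s) => /^(?:[0-9a-z]{5,8}|[0-9][0-9a-z]{3})$/i.test(s);

  // Index 0 is the language subtag; skip it.
  let i = 1;
  // Optional script subtag (four letters).
  if (i < subtags.length && isScript(subtags[i])) i++;
  // Optional region subtag (two letters or three digits).
  if (i < subtags.length && isRegion(subtags[i])) i++;

  // Variants run until the first non-variant subtag, which in a
  // well-formed tag is an extension or private-use singleton.
  const variants = [];
  while (i < subtags.length && isVariant(subtags[i])) {
    variants.push(subtags[i]);
    i++;
  }
  return variants;
}

console.log(extractVariants("sl-IT-rozaj-biske-1994")); // ["rozaj", "biske", "1994"]
console.log(extractVariants("el-polyton"));             // ["polyton"]
console.log(extractVariants("de-Latn-DE-u-co-phonebk")); // []
```

Stopping at the first singleton matters: extension values such as phonebk have the same shape as a variant, so a plain filter over all subtags would misclassify them.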

While we're at it, I don't see a reason why extensions and private-uses also aren't getters, but I guess that's a different story.

sffc commented 6 days ago

A little more context: I think people on the call were referring to variants as "legacy" or "deprecated" because of the following issues:

  1. Some variants are better represented as extension keywords.
  2. LDML says that variants are supposed to be in alphabetical order, which doesn't make sense with certain IANA subtags.
    • Example: "sl-IT-rozaj-biske-1994" would be canonicalized to something like "sl-IT-1994-biske-rozaj" even though the IANA subtag registry says it should be "rozaj-biske".
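The ordering concern in item 2 can be checked with the existing Intl.getCanonicalLocales API. This is only a sketch: the exact order of the output depends on the engine's canonicalization, so no expected output is shown. The slice(2) assumes the canonical form keeps the language and region as its first two subtags.

```javascript
// Ask the engine for the canonical form of the example tag.
const [canonical] = Intl.getCanonicalLocales("sl-IT-rozaj-biske-1994");
console.log(canonical);

// Drop the language and region subtags to look at the variants alone.
const canonicalVariants = canonical.split("-").slice(2);
console.log(canonicalVariants);
```

If the engine applies the alphabetical rule, the variants come back reordered relative to the input, which is exactly the mismatch with the IANA registry's prefix ordering described above.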

In other words, the comments from the discussion were based on the point of view that variants are basically a grab bag of things that would be better expressed as more specific locale extensions.

Personally, I still think variants are motivated because they remain the standard way of expressing orthographies. Something like el-polyton is a good, modern language identifier that I don't believe has another representation with extension keywords.

Regarding the type of an eventual variants, I don't see any issue with using an array or even a set here.

Returning an ECMAScript Set is an interesting proposition since it avoids the point of contention on whose ordering to use (IANA's or Unicode's).

gibson042 commented 16 hours ago

Returning an ECMAScript Set is an interesting proposition since it avoids the point of contention on whose ordering to use (IANA's or Unicode's).

I don't think it would, because ECMAScript Set instances are deterministically ordered.
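To illustrate: a Set iterates in insertion order, so two Sets holding the same variants still expose whichever order they were built in. A minimal sketch:

```javascript
// Same three variants, inserted in two different orders.
const ianaOrder = new Set(["rozaj", "biske", "1994"]);
const alphabeticalOrder = new Set(["1994", "biske", "rozaj"]);

// Iteration follows insertion order, so the two Sets are observably different.
console.log([...ianaOrder]);         // ["rozaj", "biske", "1994"]
console.log([...alphabeticalOrder]); // ["1994", "biske", "rozaj"]
```

Whoever constructs the Set therefore still has to pick an ordering, and that choice is visible to any code that iterates over it.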

nnmrts commented 2 hours ago

Yeah, I mainly suggested the Set because it follows the other rule of variants (or any "multi-tag"): uniqueness. But honestly, for most users it would probably be unexpected to get a Set here in comparison to the rest of JS.

nnmrts commented 2 hours ago

Regarding the ordering of variants: I don't really think the array or set needs to be ordered in any specific way other than "the same as supplied". Both CLDR and IANA, if I understand correctly, just define a recommended or canonical way to order them in the context of language subtags, not in the context of JavaScript arrays. And AFAIK implementers need to be able to understand any ordering.

One could even argue that it's more expected if the ordering is the same as the user specified it, even if it's "wrong".

So in general, the ordering, of all things, shouldn't be a blocker here.