Closed FrankYFTang closed 3 years ago
My intuitive response is that if we are returning an ordered list of preferences, then we should not restrict that list, but extend and order it based on the signals. The only signal we get from the U extension is the "most preferred", and as a result it should be at the front, as it's the strongest, most specific signal we have. The second is the list of ordered calendars from the lang/reg pair, and it should follow.
The only signal we get from U extension is the "most preferred"
That is the question: does it represent "most preferred" or "the only ONE preferred"? Is it really true that it represents "the most preferred" rather than "THE only preferred"? If you look at the ar-SA vs ar case (I am not talking about ar-u-ca-persian vs ar-SA-u-ca-persian, I am talking about ar vs ar-SA), you will see where I am coming from. Notice that
ar has ["gregory", "coptic", "islamic", "islamic-civil", "islamic-tbla"]
but ar-SA has ["islamic-umalqura", "gregory", "islamic", "islamic-rgsa"]
and NOT ["islamic-umalqura", "gregory", "coptic", "islamic", "islamic-rgsa"]
"coptic" is not in that list, so ar-SA means the user "ONLY PREFERS" what is used in Arabic in Saudi Arabia, not "most prefers" what is used in Arabic in Saudi Arabia. That is why "coptic" is no longer in that list, right? If it carried a "most preferred" semantic, then how could we justify the removal of "coptic"?
I don't believe that there's ever "the only one preferred" for two reasons:
1) No instrument in Unicode Locale is intended for listing exclusion lists.
2) No fallback can ever really be a single-item. It's an illusion. If you may need a fallback, and you carry a single item, then you implicitly build a two-item fallback list of ["the-item", "last-fallback"]. Once you accept that, all we can do is improve on that two-item list so that there's something to fall back on before the last-fallback. In this case I believe that the list of items in the middle, which are neither "the-item" nor "last-fallback", improves the user experience.
So now look at the hc case:
en is ["h12"]
en-GB is ["h23"]
What should en-GB-u-hc-h11 return? If I follow what you said, it should be ["h11", "h23"], right? But why shouldn't it be ["h11", "h23", "h12"] if I follow your logic?
- No instrument in Unicode Locale is intended for listing exclusion lists
Agreed, but they all carry "more specific" information, right? When they express more specific information, it should "narrow down" the possible choices, right?
- No fallback can ever really be a single-item
Sorry, I do not understand the relationship between "fallback" and the issue we are discussing here. Could you express your link explicitly?
But why shouldn't it be ["h11", "h23", "h12"]
I see no reason why it shouldn't be.
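To make the extended reading concrete, here is a minimal sketch of how such a list could be assembled. It is an illustration only, not anything specified by the proposal: the `hourCyclesByLocale` table and the helpers `parentChain` and `extendedHourCycles` are hypothetical names, the data is just the two values quoted in this thread, and parent lookup is simplified to dropping trailing subtags rather than real CLDR inheritance. Only `Intl.Locale`, `hourCycle`, and `baseName` are existing APIs.

```javascript
// Hypothetical data, copied from the values quoted in the thread.
const hourCyclesByLocale = {
  "en": ["h12"],
  "en-GB": ["h23"],
};

// Simplified parent chain: "en-GB" -> ["en-GB", "en"].
// Real locale inheritance is more involved; this is an assumption.
function parentChain(baseName) {
  const parts = baseName.split("-");
  const chain = [];
  while (parts.length > 0) {
    chain.push(parts.join("-"));
    parts.pop();
  }
  return chain;
}

// Explicit -u-hc- value first, then the locale's values, then its parents'.
function extendedHourCycles(tag) {
  const locale = new Intl.Locale(tag);
  const ordered = [];
  if (locale.hourCycle) ordered.push(locale.hourCycle);
  for (const name of parentChain(locale.baseName)) {
    ordered.push(...(hourCyclesByLocale[name] ?? []));
  }
  // A Set keeps insertion order and drops duplicates.
  return [...new Set(ordered)];
}

console.log(extendedHourCycles("en-GB-u-hc-h11")); // ["h11", "h23", "h12"]
```

Stopping the loop after the first entry of the chain would give the shorter ["h11", "h23"] instead, which is the other reading being debated here.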
Agreed, but they all carry "more specific" information, right? When they express more specific information, it should "narrow down" the possible choices, right?
I think it should set the preference.
Sorry, I do not understand the relationship between "fallback" and the issue we are discussing here. Could you express your link explicitly?
Since Intl API design is inherently "best effort", it means that we roughly collect all the possible signals, prioritize them, and come back with the best possible answer. That always involves fallbacks because there's always a "what if the best option is not available". In most cases the "what if" is hypothetical (what if we don't have western arabic numerals? if we don't have them, we have a much bigger problem than this single format output ;)), but in most of the long tail, this is a reasonable problem: the user tells us that they like "h23", and they use a locale that prefers "h12". Cool, let's try "h23", and fall back on "h12" if the former "doesn't work" for some reason. And if those two don't work, then we can either randomly select one that works, or try to prioritize: we can use the parent locale of the locale the user gave us, or the second best locale that the user gave us if they gave us a fallback chain, and maybe this one has some value that we can support before we fall back on the "last resort fallback". The "hourCycle" in my narrative is just an example; this applies to every setting and data key in Intl as far as I can see.
What I'm saying is that any single value is not guaranteed. You ask for "en-US"? Maybe we have it, maybe not. You ask for "u-hc-h12"? Maybe we have it, maybe not. You ask for "gregorian"? Maybe we have it, maybe not. The only real question is what we'll do if we don't.
And my take is that we always end up encoding a "last resort" fallback, and the whole API design game is seeing whether we can interpret a signal that will allow us to fall back on something better than "last resort".
So, to loop it back to your first post: ar-SA-u-ca-persian - the user gave us an explicit signal that they want the persian calendar. So that's the strongest signal we have, and as a result it should be at the top of the ordered list.
What then? If you returned ["persian"] and we don't have persian, then what happens next? Well, the only thing that can happen is that the system will jump directly to "last resort", which in the case of ICU is likely und and gregorian.
But is it possible that we can do better? Well, yes we can! The user also told us that they want ar-SA, so maybe we can use that signal to give them a fallback that is better than last resort!
So, here it is: ["persian", "islamic-umalqura", "gregory", "islamic", "islamic-rgsa"].
You just provided a better fallback scenario than you would allow for if the return list was ["persian"].
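The difference between the two return values only shows up when the first choice is unavailable. A minimal sketch of that case follows; the `resolveCalendar` helper, the `available` set, and the choice of "gregory" as the last resort are all assumptions made for illustration, not a description of how ICU or any engine resolves calendars.

```javascript
// Pick the first candidate that is available; otherwise use the last resort.
function resolveCalendar(candidates, available, lastResort) {
  return candidates.find((c) => available.has(c)) ?? lastResort;
}

// Hypothetical environment that ships no Persian calendar data.
const available = new Set(["gregory", "islamic-umalqura", "islamic"]);

const explicitOnly = ["persian"];
const withLocaleSignal = [
  "persian", "islamic-umalqura", "gregory", "islamic", "islamic-rgsa",
];

console.log(resolveCalendar(explicitOnly, available, "gregory"));     // "gregory"
console.log(resolveCalendar(withLocaleSignal, available, "gregory")); // "islamic-umalqura"
```

With the single-element list the consumer drops straight to the last resort; with the longer list it lands on the calendar that ar-SA ranks first.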
But then you could argue that the user also told us they want "ar", and "coptic" is also used in "ar", so should we add "coptic" to that list too, even though "coptic" is not part of ar-SA?
Soo, I would :) But I believe that it's a part of the game, and I believe others may have other, valid opinions, because eventually, if you drag my position ad extremum, you get to "take all possible values and order them". So the sweet spot we're looking for is getting as many ordered meaningful values as possible without including meaningless ones.
What is meaningful and meaningless is subjective and hard to evaluate (the "weights" in the language negotiation model are an attempt that I don't think works as well as it was intended to).
For coptic, you get diminishing returns (kind of like EMA, exponentially) the further down the chain you go - first value is critical, first fallback after it is very important, second much less, third very little, and so on, until "last fallback", which is very important again, because it is a catch all for the long tail.
So my take would be that added complexity of ordering from parent locale chain is likely not valuable, but I can see an argument that my position is subjective and arbitrary and I agree with it :)
The bottom line is that I'd love to underspec it and allow for "best effort" to produce such list and allow us to iteratively improve it over time as the fallback logic improves.
At first, I was in the camp that we should prepend the preference to the front of the list, but now I feel that we should return an array of length 1 if the field is explicitly specified in the locale. Here's why:
ar-u-ca-coptic, given that "coptic" is already in the list of likely calendars? Do we reorder the list and pull "coptic" to the front? Do we have two entries for it?
Do we reorder the list and pull "coptic" to the front?
Reorder. It should really be a Set.
I am strongly against treating them as a Set; right now it is an ordered list, and that is what it really is.
I apologize. I meant a set as "deduplicated", not "unordered". I believe it should be ordered.
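For what it's worth, "ordered and deduplicated" is exactly what a JavaScript `Set` gives you, since it iterates in insertion order. A small sketch of the reorder for ar-u-ca-coptic, assuming the ar calendar list quoted earlier in this thread (the `promote` helper is a hypothetical name):

```javascript
// Put the explicitly requested value first, then the locale's list,
// dropping the later duplicate. Set iteration follows insertion order.
function promote(preferred, calendars) {
  return [...new Set([preferred, ...calendars])];
}

const arCalendars = ["gregory", "coptic", "islamic", "islamic-civil", "islamic-tbla"];

console.log(promote("coptic", arCalendars));
// ["coptic", "gregory", "islamic", "islamic-civil", "islamic-tbla"]
```

The remaining entries keep their relative order, so nothing about the locale's own ranking is lost.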
Another reason I dislike the "adding" semantic is that it assumes the user is trying to communicate "I also know X and I prefer X" instead of "I only know X and nothing else". For example, ar-SA has ["islamic-umalqura", "gregory", "islamic", "islamic-rgsa"].
How can a user express "I only know about the 'gregory' calendar, I know nothing about the "islamic-umalqura", "islamic", and "islamic-rgsa" calendars, so please don't bother showing them to me"?
I'll repeat my position from https://github.com/tc39/proposal-intl-locale-info/issues/12#issuecomment-799648371
1) Unicode does not have exclude-lists. There is no way to say "I can't read h12" or "I don't understand German". You only say "I prefer X" or "I prefer Y and if not Y, then Z". All signals in Unicode Locale are positive signals that are meant to bring more value to the ordering of the fallback.
But, ultimately, and much more importantly, such exclusion would give us nothing.
In a scenario of a fallback of [A, B, C], the only case in which B matters is if we don't have A.
And if we don't have A, and the user told us that they only understand A, then we still have to fall back, and in the absence of any better signal we'll fall back on last resort.
Which brings me back to the notion that all we're doing is trying to improve over the last resort. And in general, deriving the calendar system from the locale communicated to us is imho much better than falling back on last resort.
To rephrase my position again: the idea that you can return a single element and somehow that single element is meaningful on its own is an illusion, because in practice I have never seen the consumer of such input do anything other than remodel [A] into [A, last_resort] and fall back through the ordered list. So in every scenario where you can improve on it by providing [A, B] to allow for [A, B, last_resort], you are improving the UX in the error scenario.
By my recollection, it is in scope for UTS 35 locale identifiers to eventually add support for multiple preferences if it doesn't support it already. For example, in the future, something like en-US-u-ca-hebrew-ca-gregory could mean that your preference list is ["hebrew", "gregory"]. But as of now, UTS 35 supports only one preference at a time, and I think we should reflect that. We should work with Unicode to add multi-preferences (as well as improvements to things like the measurement system preference) to UTS 35.
[[[ END OF APPROACH 1]]]
[[[ BEGIN OF APPROACH 2]]]
[[[ END OF APPROACH 2]]]
So in the next ECMA402 we can just decide which version we would like to take and maybe make a simple name change.
@zbraniecki - are you willing to serve as one of the two Stage 3 reviewers (#9), so I can close that issue?
I'm concerned about my capacity to pick it up at the moment. I may end up slowing you down :(
Maybe if you can't find anyone else in a couple weeks I can serve as a fallback reviewer?
During the 2021-04-08 ECMA402 meeting, we decided to go with the shorter output, which means returning the restricted version.
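As a rough illustration of what the restricted version means in practice, here is a minimal sketch. It is not the proposal's API or spec text: `restrictedCalendars` and the `commonCalendars` table are hypothetical, with data copied from the zh example in this issue. Only `Intl.Locale`, `calendar`, and `baseName` are existing APIs.

```javascript
// Hypothetical table of commonly used calendars, from the zh example in this issue.
const commonCalendars = {
  "zh": ["gregory", "chinese"],
  "zh-TW": ["gregory", "roc", "chinese"],
};

// Restricted version: an explicit -u-ca- value replaces the locale's list
// instead of being placed in front of it.
function restrictedCalendars(tag) {
  const locale = new Intl.Locale(tag);
  if (locale.calendar) return [locale.calendar];
  return commonCalendars[locale.baseName] ?? [];
}

console.log(restrictedCalendars("zh-TW-u-ca-japanese")); // ["japanese"]
console.log(restrictedCalendars("zh-TW"));               // ["gregory", "roc", "chinese"]
```

Callers that want a longer fallback chain can still build one themselves by looking up the base locale separately.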
We need to decide the semantics of the u extension with respect to defaults.
Let me use the zh locale to explain:
zh has ["gregory", "chinese"] as commonly used calendars
zh-TW has ["gregory", "roc", "chinese"] as commonly used calendars
Now, what are the commonly used calendars for the "zh-TW-u-ca-japanese" locale? There are two possible answers:
A. ["japanese", "gregory", "roc", "chinese"]
B. ["japanese"]
Answer B means that, since zh-TW-u-ca-japanese means Chinese in Taiwan using the Japanese calendar system, the only commonly used calendar is japanese, excluding ["gregory", "roc", "chinese"], because ca-japanese already restricts the calendar to japanese in the locale.
Another example to consider:
ar has ["gregory", "coptic", "islamic", "islamic-civil", "islamic-tbla"] as commonly used calendars
ar-SA has ["islamic-umalqura", "gregory", "islamic", "islamic-rgsa"] as commonly used calendars
ar-EG has ["gregory", "coptic", "islamic", "islamic-civil", "islamic-tbla"] as commonly used calendars
Now, what should defaults.calendars be for ar-u-ca-persian, ar-SA-u-ca-persian, and ar-EG-u-ca-persian?
A
(new Intl.Locale("ar-u-ca-persian")).defaults.calendars returns ["persian", "gregory", "coptic", "islamic", "islamic-civil", "islamic-tbla"]
(new Intl.Locale("ar-SA-u-ca-persian")).defaults.calendars returns ["persian", "islamic-umalqura", "gregory", "islamic", "islamic-rgsa"]
(new Intl.Locale("ar-EG-u-ca-persian")).defaults.calendars returns ["persian", "gregory", "coptic", "islamic", "islamic-civil", "islamic-tbla"]
B
(new Intl.Locale("ar-u-ca-persian")).defaults.calendars returns ["persian"]
(new Intl.Locale("ar-SA-u-ca-persian")).defaults.calendars returns ["persian"]
(new Intl.Locale("ar-EG-u-ca-persian")).defaults.calendars returns ["persian"]
@sffc