tc39 / proposal-intl-locale-info

An API to expose locale information, such as week data (first day of the week, weekend start, weekend end), hour cycle, measurement system, commonly used calendar, etc.
MIT License

Define if "ca" Unicode extensions have an effect on Intl.Locale.prototype.getWeekInfo() #30

Open anba opened 3 years ago

anba commented 3 years ago

When passing a language tag that includes the "ca" Unicode extension to ICU, the calendar is taken into account when retrieving the week-info attributes from ICU's UCalendar. More concretely, it's taken into account if and only if the calendar is "iso8601".

Because this only makes a difference when the calendar is "iso8601", and even then only changes firstDay to Monday and minimalDays to 4, I don't think this feature provides any real value, and we should make it clear in the specification that "ca" Unicode extensions shouldn't be taken into account.

Example when "iso8601" is passed to ICU:

js> new Intl.Locale("en-US").weekInfo
({firstDay:7, weekendStart:6, weekendEnd:7, minimalDays:1})
js> new Intl.Locale("en-US-u-ca-iso8601").weekInfo
({firstDay:1, weekendStart:6, weekendEnd:7, minimalDays:4})
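A minimal sketch of the ICU behaviour described above, modelled as an override applied on top of region-derived week data. The function name applyCalendarOverride is hypothetical and this is not ICU's code; it special-cases only "iso8601", mirroring the observation that ICU then changes only firstDay (to Monday) and minimalDays (to 4) and leaves the weekend alone.

```javascript
// Hypothetical model of the ICU special case described above (not ICU's code).
// Week info uses the proposal's numbering: 1 = Monday ... 7 = Sunday.
function applyCalendarOverride(regionWeekInfo, calendar) {
  if (calendar !== "iso8601") {
    // Every other calendar leaves the region-derived data untouched.
    return { ...regionWeekInfo };
  }
  // ISO 8601 weeks start on Monday and need at least 4 days in the first week;
  // the weekend is still whatever the region defines.
  return { ...regionWeekInfo, firstDay: 1, minimalDays: 4 };
}

// Region data for US, copied from the REPL output above.
const usWeekInfo = { firstDay: 7, weekendStart: 6, weekendEnd: 7, minimalDays: 1 };

console.log(applyCalendarOverride(usWeekInfo, "gregory"));
console.log(applyCalendarOverride(usWeekInfo, "iso8601"));
```

Under this model "persian" or any other calendar value is a no-op, which is exactly the asymmetry questioned later in the thread.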
FrankYFTang commented 3 years ago

we should make it clear in the specification that "ca" Unicode extensions shouldn't be taken into account.

I disagree with this. What you state could be true now in the ICU implementation. But in the future we may need to represent two different locales, both Arabic locales for the same region but following two different calendars, whose weekends may differ.

I think we should keep the spec text AS IS so one day we may be able to provide such information from the locale data.

@sffc @ryzokuken @zbraniecki - what do you think?

FrankYFTang commented 3 years ago

I think the current spec text does not rule out the impact of any Unicode extension, so it is clear that any Unicode extension may have an impact on weekInfo; I think this is by design. For example, some subdivisions may observe a different weekend. The fact that CLDR and ICU currently do not have that data should not be a reason to restrict the spec from allowing it one day. As you point out, some states/provinces in some countries observe a different weekend practice, and that would need to come from the rg or sd Unicode extension, or one day from the ca extension.

anba commented 3 years ago

If any Unicode extension (the relevant ones are: "ca", "rg", "sd", or "fw") should have an effect, we should explicitly mention that instead of leaving this unspecified, because that way we can more easily guarantee compatible behaviour across different implementations.

Is the "u-ca-iso8601" special case specific to ICU? I don't see it mentioned anywhere in https://unicode.org/reports/tr35/tr35-dates.html#Week_Data. Are there any other calendars where a user would expect different week-info data when a calendar is explicitly set through "u-ca"? For example, if "en-US-u-ca-iso8601" changes firstDay from Sunday to Monday, why should "en-US-u-ca-persian" not also change firstDay to Saturday, given that Saturday is the first day of the week in the solar Hijri calendar?

anba commented 3 years ago

Reading the relevant ICU source code to check for additional issues, I've found that ICU respects the rg Unicode extension to override the region. That can lead to user confusion, because ECMA-402 doesn't yet support rg.

For example:

> new Intl.DateTimeFormat("en-u-rg-afzzzz").resolvedOptions().calendar
"gregory"
> new Intl.Locale("en-US-u-rg-afzzzz").weekInfo
({firstDay:6, weekendStart:4, weekendEnd:5, minimalDays:1})

Intl.DateTimeFormat ignores rg and instead uses the default calendar for en-US. If the user now wants to find additional information about the calendar for that locale via weekInfo, the calendar data for the solar Hijri calendar (= default calendar for Afghanistan) are returned.
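To see why rg changes the answer, it helps to look at what has to be extracted from the tag. The following is a hypothetical sketch (the helper names unicodeKeywords and regionOverride are invented for illustration, and this is not ICU's parser) that collects the -u- keywords of a tag and turns an rg value such as "afzzzz" into the region it overrides, following the UTS #35 rule that an rg value is a region code suffixed with "zzzz".

```javascript
// Hypothetical helper: collect the keywords of a tag's "-u-" extension as a
// Map from key to type. Simplified: u-extension attributes are skipped, and
// a key without a type maps to "" (meaning "true" in UTS #35).
function unicodeKeywords(tag) {
  const keywords = new Map();
  let inU = false;
  let key = null;
  for (const subtag of tag.toLowerCase().split("-").slice(1)) {
    if (subtag.length === 1) {
      // A singleton starts a new extension (or the private-use section).
      if (inU || subtag === "x") break;
      inU = subtag === "u";
      continue;
    }
    if (!inU) continue;
    if (subtag.length === 2) {
      // Two-character subtags are keys such as "ca", "fw", "rg", "sd".
      key = subtag;
      keywords.set(key, "");
    } else if (key !== null) {
      // Longer subtags are (parts of) the current key's type.
      const prev = keywords.get(key);
      keywords.set(key, prev ? `${prev}-${subtag}` : subtag);
    }
  }
  return keywords;
}

// Hypothetical helper: the region an "rg" keyword overrides, if any.
// Per UTS #35, an rg value is a region code followed by "zzzz", e.g. "afzzzz".
function regionOverride(tag) {
  const rg = unicodeKeywords(tag).get("rg");
  if (!rg || !/^([a-z]{2}|[0-9]{3})zzzz$/.test(rg)) return undefined;
  return rg.slice(0, -4).toUpperCase();
}

console.log(regionOverride("en-US-u-rg-afzzzz")); // AF
```

An implementation that honours this override for weekInfo but not for Intl.DateTimeFormat produces exactly the mismatch shown above.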


Week info data isn't really tied to a language; instead, it is typically derived from a region. Also see Calendar::setWeekData and ulocimp_getRegionForSupplementalData in ICU. This makes the spec proposal somewhat inconsistent, because TimeZonesOfLocale() requires that a region subtag is present, whereas WeekInfoOfLocale() is spec'ed to work even when no region subtag is present. Whether a region subtag must be present is decided by the requirements of the ICU API; in other words, the ICU API leaks through and defines how things got spec'ed. This doesn't seem right to me.
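To make the region dependency concrete, here is a hypothetical helper (regionForWeekData is an invented name, it is not a transcription of ICU's ulocimp_getRegionForSupplementalData, and it deliberately ignores rg) that picks the region week data would be keyed on: the explicit region subtag if there is one, otherwise the likely-subtags region obtained with the standard Intl.Locale.prototype.maximize().

```javascript
// Hypothetical sketch: choose the region that week data would be looked up by.
// Uses the explicit region subtag if present, otherwise the likely-subtags
// region from Intl.Locale.prototype.maximize(). Unicode extensions such as
// "rg" are ignored here.
function regionForWeekData(tag) {
  const locale = new Intl.Locale(tag);
  return locale.region ?? locale.maximize().region;
}

console.log(regionForWeekData("en-GB")); // GB
console.log(regionForWeekData("en"));    // US (via likely subtags)
```

With a fallback like this, requiring a region subtag becomes a spec choice rather than something dictated by an underlying API.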


Should I create separate issues for these two problems?

sffc commented 2 years ago

Discussion from 2021-09-09 TC39-TG2: https://github.com/tc39/ecma402/blob/master/meetings/notes-2021-09-09.md#define-if-ca-unicode-extensions-have-an-effect-on-intllocaleprototypeweekinfo

We did not reach a hard conclusion, except to establish that we do feel it is important to take subtags such as -u-ca into account when determining the week info. Frank is to follow up with Anba on this issue to resolve the remaining open questions.

FrankYFTang commented 2 years ago

why should "en-US-u-ca-persian" not also change firstDay to Saturday, given that Saturday is the first day of the week in the solar Hijri calendar?

I do not believe the specification says it should NOT also change it, and that is the point: the specification should NOT say it won't ever change it.

If ICU does not do so, or an implementation of this spec does not do so, or CLDR contains wrong or missing information needed to implement the correct result, then those issues should be filed against ICU, CLDR, and/or the browser bug tracking systems, instead of restricting this specification from doing the right thing.

FrankYFTang commented 2 years ago

Week info data isn't really tied to a language, but instead is typically derived from a region.

agree - but why is this related to "u-ca"?

anba commented 2 years ago

Week info data isn't really tied to a language, but instead is typically derived from a region.

agree - but why is this related to "u-ca"?

It's not directly related to "u-ca", but instead about the ICU implementation, which in turn is related to "u-ca". And that's why I asked if the questions/issues around the region subtag and "u-rg" should be tracked in a different issue. :smile:

Should I create separate issues for these two problems?

anba commented 2 years ago

We did not reach a hard conclusion, except to establish that we do feel it is important to take subtags such as -u-ca into account when determining the week info.

As mentioned above, any relevant Unicode extension keys should be explicitly documented.

Additionally, if Unicode extension keys have to be taken into account for any of CalendarsOfLocale(), CollationsOfLocale(), HourCyclesOfLocale(), or NumberingSystemsOfLocale(), it should also be explicitly mentioned.

anba commented 2 years ago

I do not believe the specification says it should NOT also change it, and that is the point: the specification should NOT say it won't ever change it.

But neither does the specification say that it should change anything.

If ICU does not do so, or an implementation of this spec does not do so, or CLDR contains wrong or missing information needed to implement the correct result, [...]

But what exactly is the "correct result"? Can you point to a specification which describes this "correct result"?

Louis-Aime commented 2 years ago

IMHO, from the user's point of view, Intl.Locale.prototype.weekInfo has to take the "ca" Unicode extension into account. weekInfo should be different for "en-US-u-ca-gregory" (the default for "en-US") and "en-US-u-ca-iso8601", which should stick to the ISO 8601 standard. Similar behavior should occur for "en-US-u-ca-persian", since the Persian calendar does specify that the first day of the week is Saturday; this should be reflected in weekInfo.

FrankYFTang commented 2 years ago

But what exactly is the "correct result"? Can you point to a specification which describes this "correct result"?

https://tc39.es/proposal-intl-locale-info/#sec-week-info-of-locale

  1. Let locale be loc.[[Locale]].
  2. Assert: locale matches the unicode_locale_id production.
  3. Return a record whose fields are defined by Table 1, with values based on locale.

In UTS35 (https://unicode.org/reports/tr35/#Unicode_locale_identifier), the production is:

    unicode_locale_id = unicode_language_id extensions* pu_extensions? ;

which includes the extensions. So in "Return a record whose fields are defined by Table 1, with values based on locale.", the values are based on the whole locale, extensions included.

And Table 1 describes the first-day field as: "The weekday value indicating which day of the week is considered the 'first' day, for calendar purposes."

FrankYFTang commented 2 years ago

As mentioned above, any relevant Unicode extension keys should be explicitly documented.

It is ALREADY documented for Intl.Locale object https://tc39.es/ecma402/#sec-intl.locale-internal-slots "The value of the [[RelevantExtensionKeys]] internal slot is « "ca", "co", "hc", "kf", "kn", "nu" ». If %Collator%.[[RelevantExtensionKeys]] does not contain "kf", then remove "kf" from %Locale%.[[RelevantExtensionKeys]]. If %Collator%.[[RelevantExtensionKeys]] does not contain "kn", then remove "kn" from %Locale%.[[RelevantExtensionKeys]]."

Since weekInfo is a method on Intl.Locale, [[RelevantExtensionKeys]] is already covered by the construction of the Intl.Locale object.

For example, consider another pre-existing function in Intl.DateTimeFormat: the formatToParts method behaves differently depending on the "ca" extension, right? But that is only mentioned in the Intl.DateTimeFormat constructor, NOT in each method.

The spec text in https://tc39.es/ecma402/#sec-Intl.DateTimeFormat.prototype.formatToParts does not mention that it behaves differently based on the "ca" extension; instead, it depends only on the internal slots of the Intl.DateTimeFormat.

In the same way, Intl.Locale.prototype.weekInfo depends only on the internal slot of the Locale, which is loc.[[Locale]].
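The distinction between the full loc.[[Locale]] and its extension-free base name can be observed with existing, standard Intl.Locale accessors: toString() returns [[Locale]] and baseName omits the Unicode extensions. The snippet below only illustrates that difference; it makes no claim about how weekInfo itself should behave.

```javascript
// The string form of an Intl.Locale (its [[Locale]]) keeps Unicode extensions,
// while baseName drops them. Which of the two week info is computed from is
// what this issue is about.
const loc = new Intl.Locale("en-US-u-ca-iso8601-hc-h23");

console.log(loc.toString()); // en-US-u-ca-iso8601-hc-h23
console.log(loc.baseName);   // en-US
console.log(loc.calendar);   // iso8601  ("ca" is one of Intl.Locale's relevant extension keys)
console.log(loc.hourCycle);  // h23      ("hc" is one of Intl.Locale's relevant extension keys)
```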

dminor commented 1 year ago

I think we're still looking for some clarification on this issue, see this comment on the Firefox bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1693576#c27

dminor commented 1 year ago

@FrankYFTang @anba Is this issue still relevant? This proposal is up for discussion in next week's TC39 plenary and it would be great to sort this out before the meeting rather than discuss it in committee again.

sffc commented 1 year ago

Note that the NumberFormat spec text explicitly states when items are "numbering system dependent" (-u-nu).

The situation here is slightly different, since we're operating on a whole Intl.Locale object.

FrankYFTang commented 1 year ago

Note that the NumberFormat spec text explicitly states when items are "numbering system dependent" (-u-ns).

do you mean "-u-nu" (instead of "-u-ns"?)

sffc commented 1 year ago

Note that the NumberFormat spec text explicitly states when items are "numbering system dependent" (-u-ns).

do you mean "-u-nu" (instead of "-u-ns"?)

Yes

anba commented 1 year ago

As mentioned above, any relevant Unicode extension keys should be explicitly documented.

It is ALREADY documented for Intl.Locale object https://tc39.es/ecma402/#sec-intl.locale-internal-slots [...]

For example, consider another pre-existing function in Intl.DateTimeFormat: the formatToParts method behaves differently depending on the "ca" extension, right? But that is only mentioned in the Intl.DateTimeFormat constructor, NOT in each method.

[...]

These are different [[RelevantExtensionKeys]] slots with different semantics. Intl.DateTimeFormat is an Intl service constructor, so the [[RelevantExtensionKeys]] definition from https://tc39.es/ecma402/#sec-internal-slots is used. Intl.Locale isn't an Intl service constructor, so only the definition from https://tc39.es/ecma402/#sec-intl.locale-internal-slots is used. That definition doesn't give Intl.Locale's [[RelevantExtensionKeys]] any extra semantics, so it's not valid to compare it to how [[RelevantExtensionKeys]] works for Intl.DateTimeFormat.

anba commented 1 year ago

I would still like to see an explicit definition in the spec of which Unicode extensions, if any, are considered when computing weekInfo.

Two options:

  1. Explicitly state that weekInfo is computed from the base-name of the locale. That means Unicode extensions are removed and weekInfo is only computed from the locale's region.
  2. Support Unicode extensions and explicitly state which Unicode extensions are supported and in which order conflicting Unicode extensions are resolved.
    1. ICU currently supports ca and rg when computing weekInfo, but ca only supports the special case of "iso8601".
    2. ICU will soon also support fw, see https://github.com/unicode-org/icu/pull/2293.
    3. When given the locale en-US-u-ca-iso8601-fw-tue-rg-afzzzz, what is the expected value of weekInfo.firstDay? Possible return values are:

       Subtag        First day
       US            7 (Sunday)
       ca-iso8601    1 (Monday)
       fw-tue        2 (Tuesday)
       rg-afzzzz     6 (Saturday)

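Option 1 is the simpler of the two to specify, because the answer no longer depends on how conflicting keywords are resolved. A minimal sketch, assuming a hypothetical region table that contains only the two regions from the table above (firstDayFromBaseName and FIRST_DAY_BY_REGION are invented names, not real CLDR data access):

```javascript
// Hypothetical sketch of option 1: week info computed from the base name only.
// FIRST_DAY_BY_REGION is an assumed excerpt holding just the two regions from
// the table above (1 = Monday ... 7 = Sunday).
const FIRST_DAY_BY_REGION = new Map([["US", 7], ["AF", 6]]);

function firstDayFromBaseName(tag) {
  // baseName drops every Unicode extension, so ca, fw and rg cannot matter.
  const base = new Intl.Locale(new Intl.Locale(tag).baseName);
  // Without a region subtag, fall back to the likely-subtags region.
  const region = base.region ?? base.maximize().region;
  return FIRST_DAY_BY_REGION.get(region);
}

console.log(firstDayFromBaseName("en-US-u-ca-iso8601-fw-tue-rg-afzzzz")); // 7
```

Under option 1 the example tag always yields the "US" row of the table; every other row requires option 2 and an explicit conflict-resolution order.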
FrankYFTang commented 1 year ago

The current spec text says "based on locale." in https://tc39.es/proposal-intl-locale-info/#sec-week-info-of-locale, which means all information in the unicode_locale_id could affect the result. This issue asks us to state clearly, with a specific algorithm, how the result is based on the locale.

FrankYFTang commented 1 year ago

I am now asking Mark Davis (Google) and Peter Edberg (Apple), CLDR / UTS35 co-authors, about this.

FrankYFTang commented 1 year ago

Per anba's comments in https://github.com/tc39/proposal-intl-locale-info/issues/30#issuecomment-1406342637

We need to consider specifying the priority of the information that determines getWeekInfo:

  a. unicode_region_subtag
  b. type of the "ca" key in unicode_locale_extensions (see https://unicode.org/reports/tr35/tr35.html#UnicodeCalendarIdentifier)
  c. type of the "fw" key in unicode_locale_extensions (see https://unicode.org/reports/tr35/tr35.html#UnicodeFirstDayIdentifier)
  d. type of the "rg" key in unicode_locale_extensions (see https://unicode.org/reports/tr35/tr35.html#RegionOverride, https://unicode.org/reports/tr35/tr35-info.html#rgScope and also https://github.com/unicode-org/cldr/blob/ee979ab5ba007af22c544db26db699b92bad90af/common/supplemental/rgScope.xml#L10)
  e. probably also the type of the "sd" key in unicode_locale_extensions (see https://unicode.org/reports/tr35/tr35.html#UnicodeSubdivisionIdentifier)

FrankYFTang commented 1 year ago

ref upstream issue in https://unicode-org.atlassian.net/browse/CLDR-16866

FrankYFTang commented 1 year ago

CLDR-16866 is now considering specifying the algorithm that determines firstDay in weekData from the following information, in this priority order:

fw > rg > ca > sd (if no conflict with explicit region subtag) > region_subtag > likely_subtag region
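The priority order above can be sketched in code. This is a hypothetical illustration, not the CLDR algorithm (which was still being specified): the function names and the tiny data tables are invented, and the sketch assumes that a source with no data falls through to the next one. With this ordering, the earlier example en-US-u-ca-iso8601-fw-tue-rg-afzzzz resolves to 2 (Tuesday), because fw wins.

```javascript
// Hypothetical sketch of the priority order being considered in CLDR-16866:
//   fw > rg > ca > sd (if no conflict with the region subtag)
//      > region subtag > likely-subtags region
// The tables below are tiny assumed excerpts, not real CLDR data access.
const FW_TO_DAY = new Map([
  ["mon", 1], ["tue", 2], ["wed", 3], ["thu", 4], ["fri", 5], ["sat", 6], ["sun", 7],
]);
const FIRST_DAY_BY_CALENDAR = new Map([["iso8601", 1]]); // assumed: only iso8601
const FIRST_DAY_BY_SUBDIVISION = new Map();              // assumed: no data yet
const FIRST_DAY_BY_REGION = new Map([["US", 7], ["AF", 6], ["GB", 1]]);

// Collect "-u-" keywords (key -> type); simplified, attributes are skipped.
function unicodeKeywords(tag) {
  const keywords = new Map();
  let inU = false;
  let key = null;
  for (const subtag of tag.toLowerCase().split("-").slice(1)) {
    if (subtag.length === 1) {
      if (inU || subtag === "x") break; // -u- has ended, or private use begins
      inU = subtag === "u";
      continue;
    }
    if (!inU) continue;
    if (subtag.length === 2) {
      key = subtag;
      keywords.set(key, "");
    } else if (key !== null) {
      const prev = keywords.get(key);
      keywords.set(key, prev ? `${prev}-${subtag}` : subtag);
    }
  }
  return keywords;
}

function resolveFirstDay(tag) {
  const locale = new Intl.Locale(tag);
  const keywords = unicodeKeywords(tag);

  // 1. fw: an explicit first-day-of-week override.
  const fromFw = FW_TO_DAY.get(keywords.get("fw"));
  if (fromFw !== undefined) return fromFw;

  // 2. rg: region override, written as a region code plus "zzzz".
  const rg = keywords.get("rg");
  if (rg && /^([a-z]{2}|[0-9]{3})zzzz$/.test(rg)) {
    const fromRg = FIRST_DAY_BY_REGION.get(rg.slice(0, -4).toUpperCase());
    if (fromRg !== undefined) return fromRg;
  }

  // 3. ca: calendar-specific week data.
  const fromCa = FIRST_DAY_BY_CALENDAR.get(locale.calendar);
  if (fromCa !== undefined) return fromCa;

  // 4. sd: subdivision data, skipped if its region conflicts with the region subtag.
  const sd = keywords.get("sd");
  if (sd) {
    const sdRegion = sd.slice(0, /^[0-9]/.test(sd) ? 3 : 2).toUpperCase();
    if (locale.region === undefined || locale.region === sdRegion) {
      const fromSd = FIRST_DAY_BY_SUBDIVISION.get(sd);
      if (fromSd !== undefined) return fromSd;
    }
  }

  // 5./6. The region subtag, or else the likely-subtags region.
  const region = locale.region ?? locale.maximize().region;
  return FIRST_DAY_BY_REGION.get(region);
}

console.log(resolveFirstDay("en-US-u-ca-iso8601-fw-tue-rg-afzzzz")); // 2
```

Dropping fw from that tag would give 6 (rg-afzzzz), dropping rg as well would give 1 (ca-iso8601), and plain en-US gives 7, i.e. one row of the earlier table per step.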

We will not change this proposal until UTS35 completes that specification change.

sffc commented 1 year ago

@FrankYFTang This is on the TG2 agenda but it's not clear there is anything to discuss. We are just waiting on the CLDR change to land, right? If there is anything to discuss, please add it back to the agenda.