Open anba opened 3 years ago
we should make it clear in the specification that "ca" Unicode extensions shouldn't be taken into account.
I disagree with this. What you state may be true of the current ICU implementation. But in the future we may need to distinguish two different locales that are both Arabic locales for a particular region but follow two different calendars, and whose weekends may need to differ.
I think we should keep the spec text AS IS so one day we may be able to provide such information from the locale data.
@sffc @ryzokuken @zbraniecki - what do you think?
I think the current spec text does not rule out the impact of any Unicode extension, so it is clear that any Unicode extension may have an impact on the weekInfo. I think this is by design. For example, some subdivisions may observe a different weekend. The fact that CLDR and ICU currently do not have that data should not be a reason to restrict the spec from allowing that one day. As you point out, some states/provinces in some countries observe different weekend practices, and that information would need to come from the rg or sd Unicode extension, or from the ca extension one day.
If any Unicode extension (the relevant ones are: "ca", "rg", "sd", or "fw") should have an effect, we should explicitly mention that instead of leaving this unspecified, because that way we can more easily guarantee compatible behaviour across different implementations.
Is the "u-ca-iso8601" special case something specific to ICU, because I don't see it mentioned anywhere in https://unicode.org/reports/tr35/tr35-dates.html#Week_Data? Are there any other calendars where a user would expect different week-info data when a calendar is explicitly set through "u-ca"? For example if "en-US-u-ca-iso8601" changes `firstDay` from Sunday to Monday, why should "en-US-u-ca-persian" not also change `firstDay` to Saturday, given that Saturday is the first day of the week in the solar Hijri calendar?
Reading the relevant ICU source code to check for additional issues, I've found that ICU respects the `rg` Unicode extension to override the region. That can lead to user confusion, because ECMA-402 doesn't yet support `rg`.
For example:
> new Intl.DateTimeFormat("en-u-rg-afzzzz").resolvedOptions().calendar
"gregory"
> new Intl.Locale("en-US-u-rg-afzzzz").weekInfo
({firstDay:6, weekendStart:4, weekendEnd:5, minimalDays:1})
`Intl.DateTimeFormat` ignores `rg` and instead uses the default calendar for `en-US`. If the user now wants to find additional information about the calendar for that locale via `weekInfo`, the calendar data for the solar Hijri calendar (= default calendar for Afghanistan) are returned.
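To compare the two code paths in a given engine, a small sketch like the following may help. It assumes the engine exposes week info either as the `weekInfo` accessor used above or as a `getWeekInfo()` method; the helper name `readWeekInfo` is made up for illustration. The printed values depend on the engine and its ICU/CLDR version, so none are listed here.

```javascript
// Hypothetical helper: read week info whichever way the engine exposes it.
function readWeekInfo(tag) {
  const locale = new Intl.Locale(tag);
  if (typeof locale.getWeekInfo === "function") {
    return locale.getWeekInfo();
  }
  // Falls back to the accessor form; undefined if week info is unsupported.
  return locale.weekInfo;
}

const tag = "en-US-u-rg-afzzzz";

// Intl.DateTimeFormat only keeps its relevant extension keys (ca, hc, nu),
// so "rg" does not survive into the resolved locale.
const resolved = new Intl.DateTimeFormat(tag).resolvedOptions();
console.log(resolved.locale, resolved.calendar);

// Intl.Locale keeps the full tag, so an ICU-backed implementation may
// see "rg" when it looks up the week data.
console.log(readWeekInfo(tag));
```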
Week info data isn't really tied to a language, but instead is typically derived from a region. Also see Calendar::setWeekData and ulocimp_getRegionForSupplementalData in ICU. This makes the spec proposal somewhat inconsistent, because `TimeZonesOfLocale()` requires that a region subtag is present, whereas `WeekInfoOfLocale()` is spec'ed to work even when no region subtag is present. The decision when a region subtag must be present or not is tied to API requirements from ICU, so IOW the ICU API leaks through and defines how things got spec'ed. This doesn't seem right to me.
Should I create separate issues for these two problems?
Discussion from 2021-09-09 TC39-TG2: https://github.com/tc39/ecma402/blob/master/meetings/notes-2021-09-09.md#define-if-ca-unicode-extensions-have-an-effect-on-intllocaleprototypeweekinfo
We did not reach a hard conclusion, except to establish that we do feel it is important to take subtags such as `-u-ca` into account when determining the week info. Frank is to follow up with Anba on this issue to resolve the remaining open questions.
why should "en-US-u-ca-persian" not also change `firstDay` to Saturday, given that Saturday is the first day of the week in the solar Hijri calendar?
I do not believe the specification says it should NOT also change it, and that is the point: the specification should NOT say it won't ever change it.
If ICU does not do so, or an implementation of this spec does not do so, or CLDR includes the wrong information or lacks the information needed to implement the correct result, then those issues should be filed in the ICU, CLDR and/or browser bug trackers, instead of restricting this specification from doing the right thing.
Week info data isn't really tied to a language, but instead is typically derived from a region.
agree - but why is this related to "u-ca"?
It's not directly related to "u-ca", but instead about the ICU implementation, which in turn is related to "u-ca". And that's why I asked if the questions/issues around the region subtag and "u-rg" should be tracked in a different issue. :smile:
Should I create separate issues for these two problems?
We did not reach a hard conclusion, except to establish that we do feel it is important to take subtags such as -u-ca into account when determining the week info.
As mentioned above, any relevant Unicode extension keys should be explicitly documented.
Additionally, if Unicode extension keys have to be taken into account for any of `CalendarsOfLocale()`, `CollationsOfLocale()`, `HourCyclesOfLocale()`, or `NumberingSystemsOfLocale()`, it should also be explicitly mentioned.
I do not believe the specification says it should NOT also change it, and that is the point: the specification should NOT say it won't ever change it.
But neither does the specification say that it should change anything.
If ICU does not do so, or an implementation of this spec does not do so, or CLDR includes the wrong information or lacks the information needed to implement the correct result, [...]
But what exactly is the "correct result"? Can you point to a specification which describes this "correct result"?
IMHO, from the user's point of view, Intl.Locale.prototype.weekInfo has to take the "ca" Unicode extension into account. weekInfo should be different for "en-US-u-ca-gregory" (the default for "en-US") and "en-US-u-ca-iso8601", which should stick to the ISO 8601 standard. A similar behavior should occur for "en-US-u-ca-persian": since the Persian calendar does specify that the first day of the week is Saturday, this should be reflected through weekInfo.
But what exactly is the "correct result"? Can you point to a specification which describes this "correct result"?
https://tc39.es/proposal-intl-locale-info/#sec-week-info-of-locale
Let locale be loc.[[Locale]].
Assert: locale matches the unicode_locale_id production.
Return a record whose fields are defined by Table 1, with values based on locale.
In UTS35 (https://unicode.org/reports/tr35/#Unicode_locale_identifier), unicode_locale_id is defined as:
unicode_locale_id = unicode_language_id extensions* pu_extensions? ;
which includes the extensions.
"Return a record whose fields are defined by Table 1, with values based on locale."
The weekday value indicating which day of the week is considered the 'first' day, for calendar purposes.
As mentioned above, any relevant Unicode extension keys should be explicitly documented.
It is ALREADY documented for the Intl.Locale object, see https://tc39.es/ecma402/#sec-intl.locale-internal-slots: "The value of the [[RelevantExtensionKeys]] internal slot is « "ca", "co", "hc", "kf", "kn", "nu" ». If %Collator%.[[RelevantExtensionKeys]] does not contain "kf", then remove "kf" from %Locale%.[[RelevantExtensionKeys]]. If %Collator%.[[RelevantExtensionKeys]] does not contain "kn", then remove "kn" from %Locale%.[[RelevantExtensionKeys]]."
Since weekInfo is defined on Intl.Locale, the RelevantExtensionKeys are already covered in the construction of the Intl.Locale object.
For example, let's look at another pre-existing function, in Intl.DateTimeFormat. The formatToParts method will behave differently depending on the "ca" extension, right? But that is only mentioned in the Intl.DateTimeFormat constructor, NOT in each method.
The spec text in https://tc39.es/ecma402/#sec-Intl.DateTimeFormat.prototype.formatToParts does not mention that it behaves differently based on the "ca" extension; it only refers to the internal slots of the Intl.DateTimeFormat.
In the same way, Intl.Locale.prototype.weekInfo depends only on the internal slot of the Locale, which is loc.[[Locale]].
I think we're still looking for some clarification on this issue, see this comment on the Firefox bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1693576#c27
@FrankYFTang @anba Is this issue still relevant? This proposal is up for discussion in next week's TC39 plenary and it would be great to sort this out before the meeting rather than discuss it in committee again.
Note that the NumberFormat spec text explicitly states when items are "numbering system dependent" (`-u-nu`).
The situation here is slightly different since we're operating on a whole Intl.Locale object.
Note that the NumberFormat spec text explicitly states when items are "numbering system dependent" (`-u-ns`).
do you mean "-u-nu" (instead of "-u-ns"?)
Yes
As mentioned above, any relevant Unicode extension keys should be explicitly documented.
It is ALREADY documented for Intl.Locale object https://tc39.es/ecma402/#sec-intl.locale-internal-slots [...]
For example, let's look at other pre-existing function in Intl.DateTimeFormat The formatToParts method will behave differently depends on "ca" extension, right? But that is only mentioned in the Intl.DateTimeFormat constructor, but NOT in each method.
[...]
These are different `[[RelevantExtensionKeys]]` slots with different semantics. `Intl.DateTimeFormat` is an Intl service constructor, so the `[[RelevantExtensionKeys]]` definition from https://tc39.es/ecma402/#sec-internal-slots is used. `Intl.Locale` isn't an Intl service constructor, so only the definition from https://tc39.es/ecma402/#sec-intl.locale-internal-slots is used. That definition doesn't give `Intl.Locale`'s `[[RelevantExtensionKeys]]` any extra semantics, so it's not valid to compare it to how `[[RelevantExtensionKeys]]` works for `Intl.DateTimeFormat`.
I'd still like to see an explicit definition in the spec if any Unicode extensions are considered when computing `weekInfo`.
Two options:
- `weekInfo` is computed from the base-name of the locale. That means Unicode extensions are removed and `weekInfo` is only computed from the locale's region.
- Unicode extensions are taken into account, i.e. `ca` and `rg`, when computing the `weekInfo`. But `ca` only supports the special case of "iso8601".
  - There is also `fw`, see https://github.com/unicode-org/icu/pull/2293.
  - For `en-US-u-ca-iso8601-fw-tue-rg-afzzzz`, what is the expected value of `weekInfo.firstDay`? Possible return values are:

| Subtag | First day |
|---|---|
| US | 7 (Sunday) |
| ca-iso8601 | 1 (Monday) |
| fw-tue | 2 (Tuesday) |
| rg-afzzzz | 6 (Saturday) |
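The first option (base-name only) can be sketched with `Intl.Locale.prototype.baseName`, which drops all extensions. This is only an illustration: `FIRST_DAY_BY_REGION` is a hypothetical two-entry excerpt using the values from the table above, not real CLDR data, and `firstDayFromBaseName` is a made-up helper name.

```javascript
// Hypothetical excerpt of region week data; values taken from the table above.
const FIRST_DAY_BY_REGION = { US: 7, AF: 6 };

// Option 1: ignore every Unicode extension and look only at the region subtag.
function firstDayFromBaseName(tag) {
  const locale = new Intl.Locale(tag);
  const region = locale.region; // undefined when the tag has no region subtag
  return {
    baseName: locale.baseName, // language[-script][-region][-variants]
    firstDay: region === undefined ? undefined : FIRST_DAY_BY_REGION[region],
  };
}

console.log(firstDayFromBaseName("en-US-u-ca-iso8601-fw-tue-rg-afzzzz"));
```

Under this option the `ca`, `fw` and `rg` keywords in the example tag have no effect, so the answer is whatever the region `US` maps to.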
The current spec text says "based on locale." in https://tc39.es/proposal-intl-locale-info/#sec-week-info-of-locale, which means all information in the unicode_locale_id could affect the resolution. This issue asks us to state clearly how the result is based on the locale, with a specific algorithm.
Asking Mark Davis (Google) and Peter Edberg (Apple) from CLDR / UTS35 co-authors about this now.
Per anba's comments in https://github.com/tc39/proposal-intl-locale-info/issues/30#issuecomment-1406342637
We need to consider specifying the priority of the information that determines getWeekInfo, from:
a. unicode_region_subtag
b. type of the "ca" key in unicode_locale_extensions (see https://unicode.org/reports/tr35/tr35.html#UnicodeCalendarIdentifier )
c. type of the "fw" key in unicode_locale_extensions (see https://unicode.org/reports/tr35/tr35.html#UnicodeFirstDayIdentifier )
d. type of the "rg" key in unicode_locale_extensions (see https://unicode.org/reports/tr35/tr35.html#RegionOverride , https://unicode.org/reports/tr35/tr35-info.html#rgScope and also https://github.com/unicode-org/cldr/blob/ee979ab5ba007af22c544db26db699b92bad90af/common/supplemental/rgScope.xml#L10 )
and probably also
e. type of the "sd" key in unicode_locale_extensions (see https://unicode.org/reports/tr35/tr35.html#UnicodeSubdivisionIdentifier )
ref upstream issue in https://unicode-org.atlassian.net/browse/CLDR-16866
CLDR-16866 is now considering specifying the algorithm to determine the firstDay of the weekData from the following information (in priority order):
fw > rg > ca > sd (if no conflict with explicit region subtag) > region_subtag > likely_subtag region
We will not change this proposal until UTS35 completes that specification change.
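As a rough illustration of the priority order under consideration (fw > rg > ca > sd > region subtag), here is a minimal sketch. Everything in it is an assumption for illustration, not the CLDR algorithm: the helper names are made up, `REGION_FIRST_DAY` is a two-entry excerpt using values quoted earlier in this thread, "iso8601" is treated as the only calendar that changes the first day (as ICU was described to do), Monday is assumed as the fallback, and the final likely-subtags step is omitted.

```javascript
// Day numbering as used by weekInfo.firstDay: 1 = Monday ... 7 = Sunday.
const DAY_NUMBERS = { mon: 1, tue: 2, wed: 3, thu: 4, fri: 5, sat: 6, sun: 7 };
// Hypothetical excerpt of region week data (values quoted in this thread).
const REGION_FIRST_DAY = { US: 7, AF: 6 };
// Assumed fallback when nothing else applies.
const DEFAULT_FIRST_DAY = 1;

// Collect the keywords of the "-u-" extension, e.g. { ca: "iso8601", fw: "tue" }.
function parseUnicodeKeywords(tag) {
  const subtags = tag.toLowerCase().split("-");
  const keywords = {};
  const start = subtags.indexOf("u");
  if (start === -1) return keywords;
  let key = null;
  for (const subtag of subtags.slice(start + 1)) {
    if (subtag.length === 1) break; // next singleton ends the "u" extension
    if (subtag.length === 2) {
      key = subtag;
      keywords[key] = "";
    } else if (key !== null) {
      keywords[key] = keywords[key] ? `${keywords[key]}-${subtag}` : subtag;
    }
  }
  return keywords;
}

// "afzzzz" (rg) or "usca" (sd) both start with a two-letter region code.
const regionOf = (value) => value.slice(0, 2).toUpperCase();

function resolveFirstDay(tag) {
  const kw = parseUnicodeKeywords(tag);
  const region = new Intl.Locale(tag).region;

  // 1. fw: an explicit first day wins.
  if (kw.fw && DAY_NUMBERS[kw.fw] !== undefined) return DAY_NUMBERS[kw.fw];
  // 2. rg: region override.
  if (kw.rg && REGION_FIRST_DAY[regionOf(kw.rg)] !== undefined) {
    return REGION_FIRST_DAY[regionOf(kw.rg)];
  }
  // 3. ca: only the "iso8601" special case is modelled here.
  if (kw.ca === "iso8601") return DAY_NUMBERS.mon;
  // 4. sd: only if it does not conflict with an explicit region subtag.
  if (kw.sd && (region === undefined || regionOf(kw.sd) === region)) {
    const fromSubdivision = REGION_FIRST_DAY[regionOf(kw.sd)];
    if (fromSubdivision !== undefined) return fromSubdivision;
  }
  // 5. Explicit region subtag.
  if (region !== undefined && REGION_FIRST_DAY[region] !== undefined) {
    return REGION_FIRST_DAY[region];
  }
  // 6. The likely-subtags region lookup is left out of this sketch.
  return DEFAULT_FIRST_DAY;
}

console.log(resolveFirstDay("en-US-u-ca-iso8601-fw-tue-rg-afzzzz"));
```

With this ordering, the tag from the earlier table would resolve through `fw`; removing `fw-tue` would fall through to `rg`, and so on down the list.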
@FrankYFTang This is on the TG2 agenda but it's not clear there is anything to discuss. We are just waiting on the CLDR change to land, right? If there is anything to discuss, please add it back to the agenda.
When passing a language tag which includes the "ca" Unicode extension to ICU, the calendar is taken into account when retrieving the week-info attributes from ICU's `UCalendar`. Or more concretely, if and only if the calendar is "iso8601", it's taken into account.
Because this makes only a difference when the calendar is "iso8601" and always only modifies `firstDay` to Monday and `minimalDays` to 4 days, I don't think providing this feature provides any real value and we should make it clear in the specification that "ca" Unicode extensions shouldn't be taken into account.
Example when "iso8601" is passed to ICU: