unicode-org / icu4x

Solving i18n for client-side and resource-constrained environments.
https://icu4x.unicode.org
Other
1.39k stars 180 forks source link

[Date]TimeFormatter constructors don't take hour12 override #3881

Open hsivonen opened 1 year ago

hsivonen commented 1 year ago

ECMA-402 has hour12 and hourCycle overrides for the locale's hour cycle. AFAICT, ICU4X [Date]TimeFormatter constructors don't have API surface for this override. For ECMA-402 compat, ICU4X should have API surface for this.

hsivonen commented 1 year ago

Maybe hourCycle is available via -u-hc-? Still leaves hour12 (which happens to be what appears in various preference UIs).

hsivonen commented 1 year ago

hourCycle is in icu_datetime::options::components::Bag.

sffc commented 1 year ago

-u-hc- works.

I don't think we have hour12 except that you can mostly get that via -u-hc- but you need to choose between h11/h12 or h23/h24 instead of letting the locale choose.

hsivonen commented 1 year ago

Letting the locale affect hour12 expansion to hourCycle seems required for ECMA-402 compliance.

AFAICT, by the time the [Date]TimeFormatter constructor has done the provider-level resolution of the locale data, there doesn't appear to be a way for the application code to modify the instance to perform hour12 resolution in application code.

hsivonen commented 1 year ago

macOS Ventura and Gnome expose a boolean system pref for this topic, so hour12 might be relevant to support to be able to honor system preferences. I don't have test code that I'd know would behave according to the system boolean pref semantics. However, from reading the ECMA-402 spec, I'm a bit surprised at how the ECMA-402 semantics are supposed to work. AFAICT, hour12=false for en-US would resolve to h24. Do users actually want that result instead of h23? (As a user of the en-US locale for untranslated strings with 24-hour clock enabled, I don't want h24. I guess I'll need to make a TODO item of observing the system clock at midnight.)

hsivonen commented 1 year ago

Looking at https://github.com/unicode-org/cldr-json/blob/80a94b0f6c3a34d6e2dc0dca8639a54babc87f94/cldr-json/cldr-core/supplemental/timeData.json#L4 , I observe:

  1. The preferred cycle for each locale is either h or H, i.e. h12 or h23.
  2. k and K do not appear in allowed cycles. hb and hB do, but I don't find a spec explaining what they mean.

Given that the preferred cycle for each locale is either h12 or h23, it's unclear to me what problem h11 and h24 or hour12 expanding to h11 or h24 solve.

anba commented 1 year ago

However, from reading the ECMA-402 spec, I'm a bit surprised at how the ECMA-402 semantics are supposed to work.

The current spec is incorrect. There's a PR to fix this → https://github.com/tc39/ecma402/pull/758.

hsivonen commented 1 year ago

However, from reading the ECMA-402 spec, I'm a bit surprised at how the ECMA-402 semantics are supposed to work.

The current spec is incorrect. There's a PR to fix this → tc39/ecma402#758.

Thanks. Given that change and the non-existence of k-default and K-default locales, I guess one option would be to have ICU4X ECMA-402 wrapper code hard-code hour12=true to h12 and hour12=false to h23, and close this API request as WONTFIX.

anba commented 1 year ago

2. k and K do not appear in allowed cycles. hb and hB do, but I don't find a spec explaining what they mean.

Japan has Khttps://github.com/unicode-org/cldr/blob/343bde9e7e8d6cf6f2c57e257fa4f074df970311/common/supplemental/supplementalData.xml#L4888

b and B are day period markers: https://unicode.org/reports/tr35/tr35-dates.html#dfst-period

hsivonen commented 1 year ago
  1. k and K do not appear in allowed cycles. hb and hB do, but I don't find a spec explaining what they mean.

Japan has Khttps://github.com/unicode-org/cldr/blob/343bde9e7e8d6cf6f2c57e257fa4f074df970311/common/supplemental/supplementalData.xml#L4888

Oops. I missed that. So: K is allowed in one locale but isn't the default anywhere and k is specced for completeness and isn't in use anywhere?

b and B are day period markers: https://unicode.org/reports/tr35/tr35-dates.html#dfst-period

Thanks.

anba commented 1 year ago

Oops. I missed that. So: K is allowed in one locale but isn't the default anywhere and k is specced for completeness and isn't in use anywhere?

Yes. K isn't the default hour-cycle for Japan per <timeData>/<hours>, but when selecting {hour: "numeric", hour12: true}, the resolved pattern will contain K, see here. That also means it's not possible to replace hour12=true with hourCycle=h12.


For example new Intl.DateTimeFormat("en", {hour:"numeric"}) can be customised as follows:

Options Skeleton Resolved Pattern Final Pattern
{hour:"numeric"} j h a h a
{hour:"numeric", hour12: true} h h a h a
{hour:"numeric", hour12: false} H HH HH
{hour:"numeric", hourCycle: "h11"} h h a K a
{hour:"numeric", hourCycle: "h12"} h h a h a
{hour:"numeric", hourCycle: "h23"} H HH HH
{hour:"numeric", hourCycle: "h24"} H HH kk

And new Intl.DateTimeFormat("ja", {hour:"numeric"}) can be customised as follows:

Options Skeleton Resolved Pattern Final Pattern
{hour:"numeric"} j H時 H時
{hour:"numeric", hour12: true} h aK時 aK時
{hour:"numeric", hour12: false} H H時 H時
{hour:"numeric", hourCycle: "h11"} h aK時 aK時
{hour:"numeric", hourCycle: "h12"} h aK時 ah時
{hour:"numeric", hourCycle: "h23"} H H時 H時
{hour:"numeric", hourCycle: "h24"} H H時 k時

In an input skeleton, h is automatically matched to either h or K in the resolved pattern. Similarly, H is matched to either H or k.

Spec:


The allowed strings in <timeData>/<hours> are mostly relevant for the C skeleton, so it's not yet relevant ECMA-402 date-time formatting. (Spec: https://unicode.org/reports/tr35/tr35-dates.html#availableFormats_appendItems)

They're possibly relevant for the stage-3 "Intl Locale Info" proposal. There's a HourCyclesOfLocale operation, which is spec'ed to return the hour-cycle formats which are in "common use for date and time formatting". So this operation could return the allowed values from <timeData>/<hours>.

ICU4C doesn't have a public API to retrieve the allowed values, though. Instead it's necessary to manually read the resource data, cf. DateTimeFormat::GetAllowedHourCycles.

hsivonen commented 1 year ago

Yes. K isn't the default hour-cycle for Japan per <timeData>/<hours>, but when selecting {hour: "numeric", hour12: true}, the resolved pattern will contain K, see here. That also means it's not possible to replace hour12=true with hourCycle=h12.

Thanks. So ICU4X is currently missing a way to handle hour12 in a data-driven way.

Just so that I understand the feasibility of hard-coded special cases if this issue isn't addressed in ICU4X itself: It would be possible for ECMA-402 implementation glue code to get correct results (with the scope of what's known about what is in CLDR) by expanding the boolean hour12 and the boolean "region is JP" to hourCycle, right?

That is:

if hour12 {
  if region_of_locale_is_JP {
    h11
  } else {
    h12
  }
} else {
  h23
}

They're possibly relevant for the stage-3 "Intl Locale Info" proposal. There's a HourCyclesOfLocale operation, which is spec'ed to return the hour-cycle formats which are in "common use for date and time formatting". So this operation could return the allowed values from <timeData>/<hours>.

The rendered spec that you linked to has HourCyclesOfLocale, but the README claims "Hour Cycle DROPPED by Champion". @FrankYFTang , is the current intention to include or exclude HourCyclesOfLocale?

ICU4C doesn't have a public API to retrieve the allowed values, though. Instead it's necessary to manually read the resource data, cf. DateTimeFormat::GetAllowedHourCycles.

I don't see any non-test callers for that method. What am I missing?

anba commented 1 year ago

Just so that I understand the feasibility of hard-coded special cases if this issue isn't addressed in ICU4X itself: It would be possible for ECMA-402 implementation glue code to get correct results (with the scope of what's known about what is in CLDR) by expanding the boolean hour12 and the boolean "region is JP" to hourCycle, right?

It needs to be hard-coded on the language, not the region, because the date-time patterns are in https://github.com/unicode-org/cldr/blob/main/common/main/ja.xml.

I don't see any non-test callers for that method. What am I missing?

Only the parts relevant for the "Unified Intl API" work (bug 1686965) have been committed in bug 1693576. The rest will be put up for review when the open issues in the proposal have been resolved.

sffc commented 1 year ago

There is some interesting code to handle some of this resolution logic in components/datetime/src/pattern/hour_cycle.rs

I also observe that we already have the preferred hour cycle (h11h12 or h23h24) in ICU4X data: https://github.com/unicode-org/icu4x/blob/main/provider/datagen/tests/data/json/datetime/timelengths%401/en.json\

So I think everything is here to support hour12 if we were to add it to an options bag somewhere.

jufemaiz commented 1 year ago

Just to make this clearer for those playing along here. Japan is the only country that allows support for the use of K value for times.

<hours preferred="H" allowed="H K h" regions="JP"/>

Ref: https://github.com/unicode-org/cldr/blob/343bde9e7e8d6cf6f2c57e257fa4f074df970311/common/supplemental/supplementalData.xml#L4888

The options are h, H, K, k and are defined as such:

image

https://unicode.org/reports/tr35/tr35-dates.html#dfst-hour

Currently ECMA spec incorrectly assumes a coupling of h-k and H-K. That is the following is baked in as implicit assumption:

The ECMA standard definitely needs to change as the current implementation is a bug. The universal (as far as I've been able to determine) rejection of k, and the only occasional adoption of K as an option renders the above assumption absolutely incorrect, and realistically should have been identified prior to publication. https://github.com/tc39/ecma402/pull/758 has identified a solution that expands how 12-hour and 24-hour time is presented at a regional level. Work is ongoing to get this to a point of acceptance. This is slated for 2023-09 TC39 meeting.