unicode-org / icu4x

Solving i18n for client-side and resource-constrained environments.
https://icu4x.unicode.org

Investigate and potentially add back custom time zone fallbacks #5244

Open · sffc opened 1 month ago

sffc commented 1 month ago

In the 2.0 release, icu::datetime will support time zone formatting within the following constraints:

  1. Must conform to fields listed in UTS 35
  2. Fallback within one of those fields can be skipped using a custom TimeZoneNames

For example, the following are supported:

The last two are currently possible only when using a custom TimeZoneNames in which loading of the location data has been skipped. They could also be supported directly in NeoFormatter by creating a new marker type with different data-loading behavior.

It is not currently possible to construct fallback chains such as the following, as was possible in ICU4X 1.0:

Mechanically, here are a few ways to go about this:

(1) Specify a universal order for resolving the display names, and skip any name whose data wasn't loaded. For example, the universal order could be:

  1. SpecificShort
  2. SpecificLong
  3. GenericShort
  4. GenericLong
  5. Location
  6. Gmt

This ordering supports all current display fallbacks, as well as those suggested above as potential common use cases. An advantage of this approach is that we could consolidate all time zone formatting code (not data loading code) into a single code path. The only differentiator would be which time zone marker was used at construction time.
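To illustrate option (1), here is a minimal sketch, assuming each marker is described only by the set of name payloads it loads. The `NameKind` enum, `UNIVERSAL_ORDER`, and `effective_chain` are hypothetical names for illustration, not ICU4X APIs. The point is that a marker never has to spell out its own fallback chain: filtering the single universal order by "what was loaded" yields it.

```rust
/// The six display-name kinds, mirroring the universal order proposed above.
/// Hypothetical type for illustration; not an ICU4X API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NameKind {
    SpecificShort,
    SpecificLong,
    GenericShort,
    GenericLong,
    Location,
    Gmt,
}

/// The one universal resolution order shared by every marker.
const UNIVERSAL_ORDER: [NameKind; 6] = [
    NameKind::SpecificShort,
    NameKind::SpecificLong,
    NameKind::GenericShort,
    NameKind::GenericLong,
    NameKind::Location,
    NameKind::Gmt,
];

/// Given the payloads a (hypothetical) marker loads, return the effective
/// fallback chain: the universal order with every unloaded kind skipped.
/// The order in which `loaded` is listed does not matter.
fn effective_chain(loaded: &[NameKind]) -> Vec<NameKind> {
    UNIVERSAL_ORDER
        .iter()
        .copied()
        .filter(|kind| loaded.contains(kind))
        .collect()
}

fn main() {
    // A marker that loads short specific names, location names, and GMT.
    let chain = effective_chain(&[NameKind::Gmt, NameKind::SpecificShort, NameKind::Location]);
    println!("{:?}", chain); // [SpecificShort, Location, Gmt]
}
```

Because the chain is derived rather than stored, adding a new marker only means choosing which payloads to load; the formatting loop itself does not change.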

(2) Leverage private use field widths. Different private use field widths could correspond to different fallback sets.

sffc commented 1 month ago

@Manishearth Does the proposed option 1 sound reasonable to you? I know I've gone back and forth through multiple different versions of this scheme with you (https://github.com/unicode-org/icu4x/pull/5024), but the cleanest solution was never really mentioned: during formatting, check whether a payload is present; if it is, try using it, and if not, go to the next one. There are only 6 such payloads, so I think this should be at least as efficient as the current scheme, which involves looking up the time zone symbol in a table and then figuring out which styles to use.
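The formatting loop described above can be sketched roughly as follows, assuming each name payload is an `Option` that is `None` when the marker did not load it. `LoadedNames`, `Names`, and `format_gmt` are hypothetical stand-ins, not ICU4X types, and the zone IDs are illustrative strings. A loaded payload may still lack a name for a particular zone, so "try using it" can fail and fall through. The GMT step is a simplified placeholder for real localized GMT formatting and always succeeds.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for one loaded payload: zone ID -> display name.
type Names = HashMap<&'static str, &'static str>;

/// Each field is `None` when the marker did not load that payload.
#[derive(Default)]
struct LoadedNames {
    specific_short: Option<Names>,
    specific_long: Option<Names>,
    generic_short: Option<Names>,
    generic_long: Option<Names>,
    location: Option<Names>,
}

impl LoadedNames {
    /// Walk the payloads in the universal order: skip any that are absent,
    /// use a present one if it has a name for this zone, else keep going.
    fn format(&self, zone_id: &str, offset_minutes: i32) -> String {
        let payloads = [
            &self.specific_short,
            &self.specific_long,
            &self.generic_short,
            &self.generic_long,
            &self.location,
        ];
        for payload in payloads {
            if let Some(names) = payload {
                if let Some(name) = names.get(zone_id) {
                    return name.to_string();
                }
            }
        }
        // Final fallback that needs no per-zone data.
        format_gmt(offset_minutes)
    }
}

/// Simplified GMT fallback such as "GMT-5" or "GMT+5:30" (placeholder only).
fn format_gmt(offset_minutes: i32) -> String {
    if offset_minutes == 0 {
        return "GMT".to_string();
    }
    let sign = if offset_minutes < 0 { '-' } else { '+' };
    let abs = offset_minutes.abs();
    let (hours, minutes) = (abs / 60, abs % 60);
    if minutes == 0 {
        format!("GMT{}{}", sign, hours)
    } else {
        format!("GMT{}{}:{:02}", sign, hours, minutes)
    }
}

fn main() {
    let mut generic_long = Names::new();
    generic_long.insert("America/Chicago", "Central Time");
    // Only generic-long names are loaded; every other payload is `None`.
    let names = LoadedNames { generic_long: Some(generic_long), ..Default::default() };
    println!("{}", names.format("America/Chicago", -300)); // Central Time
    println!("{}", names.format("Asia/Kolkata", 330)); // GMT+5:30
}
```

Each step is a presence check plus at most one lookup, which is the basis for the efficiency argument above.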

Manishearth commented 1 month ago

I think that seems fine. I don't think I have a strong opinion on this after having seen the back and forth.

I'd rather not stuff more into private use widths unless we really need to.