unicode-org / icu4x

Solving i18n for client-side and resource-constrained environments.
https://icu4x.unicode.org

Ensure components::Bag will always generate a result #586

Open gregtatum opened 3 years ago

gregtatum commented 3 years ago

I'm modifying this bug to be a bit more subtle: the goal is to ensure that every components::Bag returns a result. Supporting appendItems is one way of doing that, but currently the components bag may not return a result. That wouldn't be compatible with ECMA-402, so we should have a solution for it. The solution may be appendItems, or something else.


Original discussion:

https://unicode.org/reports/tr35/tr35-dates.html#Matching_Skeletons

In case the best match does not include all the requested calendar fields, the appendItems element describes how to append needed fields to one of the existing formats. Each appendItem element covers a single calendar field. In the pattern, {0} represents the format string, {1} the data content of the field, and {2} the display name of the field (see Calendar Fields).
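
To make the placeholder roles concrete, here is a minimal sketch of the substitution the spec text describes. The function name `apply_append_item` and the sample values are hypothetical and not part of ICU4X or CLDR; only the meaning of {0}, {1}, and {2} comes from the quoted text.

```rust
/// Fills in a CLDR appendItem pattern in a single pass.
/// `{0}` is the existing format string, `{1}` the data content of the
/// field, and `{2}` the display name of the field.
/// (Hypothetical helper for illustration; not an ICU4X API.)
fn apply_append_item(append_pattern: &str, base: &str, field: &str, name: &str) -> String {
    let mut out = String::new();
    let mut rest = append_pattern;
    while let Some(start) = rest.find('{') {
        // Copy the literal text that precedes the brace.
        out.push_str(&rest[..start]);
        let tail = &rest[start..];
        // Pick the replacement for a known placeholder, or keep a lone brace.
        let (replacement, consumed) = if tail.starts_with("{0}") {
            (base, 3)
        } else if tail.starts_with("{1}") {
            (field, 3)
        } else if tail.starts_with("{2}") {
            (name, 3)
        } else {
            ("{", 1)
        };
        out.push_str(replacement);
        rest = &tail[consumed..];
    }
    out.push_str(rest);
    out
}
```

With the pattern `"{0} {1}"`, a base of `"h:mm a"`, and a field of `"z"`, this returns `"h:mm a z"`. Because the pattern is scanned once, braces inside the substituted values are never reinterpreted as placeholders.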

For the skeleton matching code, there is an appendItems list. It is used to add items to the matching patterns. This is the only way to add time zones to the patterns. However, there was some debate on the quality of results from using the appendItems list. In the 2020-02-05 ICU4X meeting, it was determined to not use this appendItems list. However, we clearly need it for time zones.

I'm labeling this issue as discussion, as we have some outstanding issues to figure out:

sffc commented 3 years ago

I like using append items for time zones, since they are their own orthogonal component.

As far as eras, and other things like weekdays and fractional seconds... I would rather see a world where all of those patterns are already present in data, so we don't need to use append items at runtime. Either we get those patterns from translators, or we apply the append items at build time only (in the CLDR transformer) in order to make sure we cover all the key sets of fields.

gregtatum commented 3 years ago

I'm trying to wrap my brain around the math of potential combinations. I'm documenting it here.

Combinations for "date, time, era, timezone, weekday"

date, era
date, era, time
date, era, time, timezone
date, era, time, timezone, weekday
date, era, time, weekday
date, era, timezone
date, era, timezone, weekday
date, era, weekday
date, time
date, time, timezone
date, time, timezone, weekday
date, time, weekday
date, timezone
date, timezone, weekday
date, weekday
era, time
era, time, timezone
era, time, timezone, weekday
era, time, weekday
era, timezone
era, timezone, weekday
era, weekday
time, timezone
time, timezone, weekday
time, weekday
timezone, weekday

gregtatum commented 3 years ago

Next, let's assume weekday will be included in either date or time during the initial field symbol matching.

date, era
date, era, time
date, era, time, timezone
date, era, timezone
date, time
date, time, timezone
date, timezone
era, time
era, time, timezone
era, timezone
time, timezone
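
For anyone who wants to check the math, both lists can be reproduced by enumerating every subset with at least two members. The sketch below is only a counting aid; the function name `combinations` is made up for this example.

```rust
/// Returns every subset of `fields` that has at least two members,
/// sorted so the output order matches the alphabetical lists above.
fn combinations<'a>(fields: &[&'a str]) -> Vec<Vec<&'a str>> {
    let mut out = Vec::new();
    // Each bit of `mask` decides whether the field at that index is included.
    for mask in 0u32..(1u32 << fields.len()) {
        if mask.count_ones() < 2 {
            continue; // skip the empty set and single-field sets
        }
        let subset: Vec<&'a str> = fields
            .iter()
            .enumerate()
            .filter(|&(i, _)| (mask >> i) & 1 == 1)
            .map(|(_, f)| *f)
            .collect();
        out.push(subset);
    }
    out.sort();
    out
}
```

Five inputs give 2^5 - 1 - 5 = 26 combinations, and folding weekday into date or time leaves four inputs and 2^4 - 1 - 4 = 11 combinations, which matches the two lists.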

sffc commented 3 years ago

Era should go inside date. It's strongly coupled with the year, like day period is strongly coupled with the hour.

I think we have either three or four orthogonal components:

  1. Date
  2. Time
  3. Time Zone
  4. Weekday (could be part of Date)

Then, within each component, we just need to exhaustively enumerate all valid sets of fields.

gregtatum commented 3 years ago

2020-04-02: Deep dive conclusion: Overall, appendItems for every field is something we do not want to support. Instead, the API should be a clear Cartesian product of available inputs. We will still support appendItems as necessary "glue patterns" for certain fields, although it may eventually be rolled up into the data provider to generate the full Cartesian product.

From my own perspective, this means that we can use them as needed for things like time zones, but we should keep an eye out for ways to factor them out.
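
As a rough illustration of what a "Cartesian product of available inputs" could look like, here is a sketch in which each orthogonal component is an optional enum and every combination resolves to a pattern. All of the types, variants, and pattern strings are hypothetical and do not reflect the actual components::Bag API. Joining with a space stands in for the real glue patterns (the `"{0} {1}"` Timezone append item, for example).

```rust
// Hypothetical per-component field sets; each enum lists the valid
// combinations inside one orthogonal component.
#[derive(Clone, Copy)]
enum DateFields { YearMonthDay, MonthDay }
#[derive(Clone, Copy)]
enum TimeFields { HourMinute, HourMinuteSecond }
#[derive(Clone, Copy)]
enum ZoneFields { Short }

/// Hypothetical bag: the product of the three optional components.
struct Bag {
    date: Option<DateFields>,
    time: Option<TimeFields>,
    zone: Option<ZoneFields>,
}

/// Resolves any non-empty bag to a pattern. Returns `None` only when
/// no component was requested at all.
fn pattern_for(bag: &Bag) -> Option<String> {
    // Illustrative English-like patterns; real data would come from CLDR.
    let date = bag.date.map(|d| match d {
        DateFields::YearMonthDay => "y MMM d",
        DateFields::MonthDay => "MMM d",
    });
    let time = bag.time.map(|t| match t {
        TimeFields::HourMinute => "h:mm a",
        TimeFields::HourMinuteSecond => "h:mm:ss a",
    });
    let zone = bag.zone.map(|z| match z {
        ZoneFields::Short => "z",
    });
    // Keep only the requested components, in a fixed order.
    let parts: Vec<&str> = [date, time, zone].iter().flatten().copied().collect();
    if parts.is_empty() {
        None
    } else {
        Some(parts.join(" "))
    }
}
```

The point of the shape is that the match arms are exhaustive, so the compiler rejects any component value that lacks a pattern, and the only way to get no result is to request nothing.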

sffc commented 3 years ago

@gregtatum: Please file smaller follow-up issues to track the appendItems that we actually want to add. I am putting this issue on the v1 backlog to revisit once we have figured out the larger story for datetime skeletons.

gregtatum commented 3 years ago

Here are the current CLDR append items.

{
 "appendItems": {
   "Day": "{0} ({2}: {1})",
   "Day-Of-Week": "{0} {1}",
   "Era": "{0} {1}",
   "Hour": "{0} ({2}: {1})",
   "Minute": "{0} ({2}: {1})",
   "Month": "{0} ({2}: {1})",
   "Quarter": "{0} ({2}: {1})",
   "Second": "{0} ({2}: {1})",
   "Timezone": "{0} {1}",
   "Week": "{0} ({2}: {1})",
   "Year": "{0} {1}"
 }
}
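
For reference, here is a minimal sketch of how a table like this could be consumed to append fields that the best-matching pattern lacks. The `MissingField` type, the `append_missing` function, and the sample symbols are assumptions made for this example, not existing ICU4X code. It assumes the symbol and display name contain no braces.

```rust
/// The CLDR appendItems shown above, as (field key, pattern) pairs.
const APPEND_ITEMS: &[(&str, &str)] = &[
    ("Day", "{0} ({2}: {1})"),
    ("Day-Of-Week", "{0} {1}"),
    ("Era", "{0} {1}"),
    ("Hour", "{0} ({2}: {1})"),
    ("Minute", "{0} ({2}: {1})"),
    ("Month", "{0} ({2}: {1})"),
    ("Quarter", "{0} ({2}: {1})"),
    ("Second", "{0} ({2}: {1})"),
    ("Timezone", "{0} {1}"),
    ("Week", "{0} ({2}: {1})"),
    ("Year", "{0} {1}"),
];

/// A field requested by the caller but absent from the matched pattern.
/// (Hypothetical type for illustration.)
struct MissingField<'a> {
    key: &'a str,          // key into APPEND_ITEMS, e.g. "Timezone"
    symbol: &'a str,       // text substituted for {1}, e.g. "z"
    display_name: &'a str, // text substituted for {2}
}

/// Appends each missing field to `base` in order.
/// Returns `None` if a field has no entry in the table.
fn append_missing(base: &str, missing: &[MissingField<'_>]) -> Option<String> {
    let mut pattern = base.to_string();
    for field in missing {
        let (_, item) = APPEND_ITEMS.iter().find(|(key, _)| *key == field.key)?;
        // Substitute {0} last so the accumulated pattern is never rescanned.
        pattern = item
            .replace("{1}", field.symbol)
            .replace("{2}", field.display_name)
            .replace("{0}", &pattern);
    }
    Some(pattern)
}
```

Appending a Timezone field with symbol `z` to `h:mm a` yields `h:mm a z`. Appending a Quarter field afterwards yields `h:mm a z (quarter: QQQ)`, which shows the kind of parenthesized output produced by the entries that use {2}.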

I'm not going to file the bugs quite yet, as I'd like to go through the next deep dive. But I believe the valid list is:

gregtatum commented 2 years ago

I didn't file follow-up issues, as the new requirement for this bug is to ensure that components::Bag always returns a result, rather than specifically to implement appendItems (which may turn out to be the actual work, depending on consensus and design). The Ideal Components Bag #1317 may work around this issue by having higher quality results.

dminor commented 2 years ago

We've agreed to postpone implementing this until after ICU4X 1.0.

sffc commented 5 months ago

With neo skeleta (#1317), the place where it seems we may still need AppendItems is when constructing patterns that are omitted from certain calendars, such as a week calendar in Chinese, for which there simply isn't data. We could also consider just falling back to Generic/Gregorian or erroring for these cases.
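
To sketch the fallback-or-error option: the lookup below tries the requested calendar first, then Generic, then Gregorian, and returns an error if none has the pattern. The `Calendar` enum, the `resolve` function, the fallback order, and the sample data are all assumptions for illustration, not the neo skeleton design.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Calendar {
    Chinese,
    Generic,
    Gregorian,
}

/// Returned when no calendar in the fallback chain has the pattern.
#[derive(Debug, PartialEq)]
struct MissingPattern;

/// Looks up `skeleton` for `calendar`, falling back to Generic and then
/// Gregorian. `data` holds (calendar, skeleton, pattern) rows.
/// (Hypothetical shape; real data would come from a data provider.)
fn resolve<'a>(
    data: &[(Calendar, &str, &'a str)],
    calendar: Calendar,
    skeleton: &str,
) -> Result<&'a str, MissingPattern> {
    for candidate in [calendar, Calendar::Generic, Calendar::Gregorian] {
        let hit = data
            .iter()
            .find(|(cal, skel, _)| *cal == candidate && *skel == skeleton);
        if let Some((_, _, pattern)) = hit {
            return Ok(*pattern);
        }
    }
    Err(MissingPattern)
}
```

If the data only has a Gregorian row for a week skeleton, a Chinese-calendar request resolves to that Gregorian pattern instead of failing, and a skeleton with no row anywhere surfaces as `Err(MissingPattern)`.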