unicode-org / icu4x

Solving i18n for client-side and resource-constrained environments.
https://icu4x.unicode.org

Thoughts on valid inputs for date pattern selection #605

Closed sffc closed 3 years ago

sffc commented 3 years ago

CC @gregtatum

This is not an actionable issue; it's a way for me to write down some thoughts to answer the question of how we determine a valid input to date time selection.

NB: I am not dealing with Weeks in this post.

Lengths

There are four lengths I would like to propose:

  1. Long = full name, no abbreviations.
  2. Medium = full name, abbreviations allowed.
  3. Short = short name, possibly numeric.
  4. Short-Lossy = as short as possible, even if ambiguities are possible. Known as "Narrow" in CLDR in components where it is supported.

Here are examples in en-US in each of the four components:

Field Filters

The valid field filters for each component might be:

  1. Date
    1. Era, Year, Month, Day
    2. Year, Month, Day (era may be optionally added by ICU depending on calendar system)
    3. Era, Year, Month
    4. Year, Month (optional era)
    5. Era, Year
    6. Year (optional era)
    7. Era (standalone)
    8. Month, Day
    9. Month (standalone)
    10. Day (standalone -- unclear if this is needed)
  2. Time
    1. Hour, Minute, Second, Fractional Second (day period may be optionally added by ICU depending on user preference for hour cycle)
    2. Hour, Minute, Second (optional day period)
    3. Hour, Minute (optional day period)
    4. Hour (optional day period)
      • Is there a valid use case for displaying a time of day without hours? Maybe these are all the valid inputs.
  3. Weekday
    1. Weekday displayed
  4. Time Zone
    1. Time zone displayed
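
As a rough illustration of the date list above, the ten date field filters can be expressed as a validity check over four boolean fields. This is only a sketch: the `DateFields` struct and `is_valid_date_filter` function are hypothetical names, not part of any existing ICU4X API.

```rust
/// Which date fields were requested (hypothetical type for illustration).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct DateFields {
    era: bool,
    year: bool,
    month: bool,
    day: bool,
}

/// Returns true if the combination is one of the ten date field filters
/// listed in this post.
fn is_valid_date_filter(f: DateFields) -> bool {
    match (f.era, f.year, f.month, f.day) {
        // Era, Year, Month, Day / Year, Month, Day
        (_, true, true, true) => true,
        // Era, Year, Month / Year, Month
        (_, true, true, false) => true,
        // Era, Year / Year
        (_, true, false, false) => true,
        // Era (standalone)
        (true, false, false, false) => true,
        // Month, Day
        (false, false, true, true) => true,
        // Month (standalone)
        (false, false, true, false) => true,
        // Day (standalone)
        (false, false, false, true) => true,
        // Everything else (e.g. Year + Day without Month) is rejected.
        _ => false,
    }
}
```

Of the 16 possible combinations of the four fields, exactly 10 pass this check, matching the list.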

Cartesian Product and Compression

Naively, the cartesian product is:

((4 date lengths) * (10 date field filters) + (1 if date disabled)) * ((1 time length) * (4 time field filters) + (1 if time disabled)) * ((4 weekday widths) * (1 weekday field filters) + (1 if weekday disabled)) * ((2 time zone widths) * (1 time zone field filters) + (1 if time zone disabled)) = 41 * 5 * 5 * 3 = 3075 valid inputs.
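
The arithmetic can be checked with a few lines of Rust. This is just a sketch of the count above; the function names are made up for illustration.

```rust
/// Number of choices for one component: every (length, field filter) pair,
/// plus one extra choice for "component disabled".
fn choices(lengths: u32, field_filters: u32) -> u32 {
    lengths * field_filters + 1
}

/// Naive count of inputs across the four components.
fn naive_input_count() -> u32 {
    let date = choices(4, 10);     // 41
    let time = choices(1, 4);      // 5
    let weekday = choices(4, 1);   // 5
    let time_zone = choices(2, 1); // 3
    date * time * weekday * time_zone
}
```

Note that this count includes the degenerate input where all four components are disabled.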

Now, we obviously don't want to ship 3075 patterns per locale. We could employ techniques along the lines of what we described yesterday as CLDR's "compression algorithm", like glue patterns (#585), or we could come up with our own compression technique in ICU4X.

gregtatum commented 3 years ago

Would the Rust API for the components be an enumeration of the combinations for each component? This seems like it would simplify the process of building a valid components bag.

e.g. Something like...

struct ComponentsBag {
  date: Option<DateComponents>,
  time: Option<TimeComponents>,
  weekday: Option<Length>,
  time_zone: Option<Length>
}

enum DateLength {...}

enum DateComponents {
  EraYearMonthDay(DateLength),
  YearMonthDay(DateLength),
  EraYearMonth(DateLength),
  YearMonth(DateLength),
  EraYear(DateLength),
  Year(DateLength),
  Era(DateLength),
  MonthDay(DateLength),
  Month(DateLength),
  Day(DateLength),
}

Time fields

I think time is missing "Minute, Second", which is in the CLDR data. I would also think that "Second, Fractional Second", and "Minute, Second, Fractional Second" would be valid as well.

sffc commented 3 years ago

> Would the Rust API for the components be an enumeration of the combinations for each component?

That's one way to do it, yes. I like that. Or I guess conceptually, I was thinking of DateComponents and DateLength as two separate options in the bag, rather than nesting them.
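
To make the difference concrete, here is a minimal sketch of the non-nested variant, where the field filter and the length are separate entries in the bag. All names here (`FlatComponentsBag`, `DateFields`, `date_selection`) are hypothetical, and only the date part is shown.

```rust
/// The four lengths proposed at the top of this issue.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Length {
    Long,
    Medium,
    Short,
    ShortLossy,
}

/// The ten date field filters, without a nested length.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum DateFields {
    EraYearMonthDay,
    YearMonthDay,
    EraYearMonth,
    YearMonth,
    EraYear,
    Year,
    Era,
    MonthDay,
    Month,
    Day,
}

/// Flat bag: fields and length are independent options.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct FlatComponentsBag {
    /// `None` means the date component is disabled.
    date_fields: Option<DateFields>,
    /// Ignored when `date_fields` is `None`.
    date_length: Length,
}

impl FlatComponentsBag {
    /// Pairs the fields with the length, mirroring what the nested
    /// `DateComponents::YearMonth(DateLength)` form would carry.
    fn date_selection(&self) -> Option<(DateFields, Length)> {
        self.date_fields.map(|fields| (fields, self.date_length))
    }
}
```

One trade-off: the flat form allows a length to be set while the date is disabled, whereas the nested enum makes that state unrepresentable.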

> I think time is missing "Minute, Second", which is in the CLDR data. I would also think that "Second, Fractional Second", and "Minute, Second, Fractional Second" would be valid as well.

We need those for durations, but do we need them for clock times?

gregtatum commented 3 years ago

Initial stab at the JSON backer for this proposal.

{
  "preferred_hour_cycle": "H11H12",
  "glue": {
    "weekday-time_zone": "{0} {1}", // It's technically possible to generate this Bag :-/
    "date-time-long": "{1} 'at' {0}",
    "date-time-medium": "{1} 'at' {0}",
    "date-time-short": "{1}, {0}",
    "date-time-shortLossy": "{1}, {0}"
  },
  "time": {
    "glue": null,
    "h11_h12": {
      "glue": null,
      "components": {
        "hourMinuteSecondFractionalSecond": null, // Fractional seconds isn't in CLDR?
        "hourMinuteSecond": "h:mm:ss a",          // "2:05:00 PM"
        "hourMinute": "h:mm a",                   // "2:05 PM"
        "hour": "h a",                            // "2 PM"

        // Optional specializations. These only get matched against if there is no Date
        // component.
        "weekdayHourMinuteSecondFractionalSecond": null,
        "weekdayHourMinuteSecond": null,
        "weekdayHourMinute": null,
        "weekdayHour": null
      }
    },
    "h23_h24": {
      "glue": null,
      "components": {
        "hourMinuteSecondFractionalSecond": null, // Fractional seconds isn't in CLDR?
        "hourMinuteSecond": "HH:mm:ss",           // "14:05:00"
        "hourMinute": "HH:mm",                    // "14:05"
        "hour": "HH",                             // "14"

        // Optional specializations. These only get matched against if there is no Date
        // component.
        "weekdayHourMinuteSecondFractionalSecond": null,
        "weekdayHourMinuteSecond": null,
        "weekdayHourMinute": null,
        "weekdayHour": null
      }
    }
  },
  "long": {
    "date": {
      "glue": {
        // Use the abbreviated version for glued patterns.
        "era": "{1} G"
      },
      "components": {
        // Required fields:
        "yearMonthDay": "MMMM d, y", // "January 20, 2020" (note that CLDR only contains "yMMMd",
                                     // but that the field expansion yields this)
        "yearMonth": "MMMM y",       // "January 2020"
        "year": "y",                 // "2020" (lowercase "y"; uppercase "Y" is the week-based year)
        "era": "GGGG",               // "Anno Domini"; used when the era is appended.
        "monthDay": "MMMM d",        // "January 20"
        "month": "MMMM",             // "January"
        "day": "d",                  // "20"

        // Era is relying on appends, but could be customized.
        "eraYearMonthDay": null,
        "eraYearMonth": null,
        "eraYear": null,

        // These are additional date customizations available.
        "weekdayEraYearMonthDay": null,
        "weekdayYearMonthDay": null,
        "weekdayEraYearMonth": null,
        "weekdayYearMonth": null,
        "weekdayEraYear": null,
        "weekdayYear": null,
        "weekdayEra": null,
        "weekdayMonthDay": null,
        "weekdayMonth": null,
        "weekdayDay": null
      }
    },
    // Stand-alone weekday.
    "weekday": "EEEE", // "Tuesday"
    // What customization should we tie in here?
    "time_zone": {
      // TODO
    }
  },

  // These are just copies of the same patterns above, but show a complete example
  "medium": {
    "date": {
      "glue": {},
      "components": {
        "yearMonthDay": "MMMM d, y",
        "yearMonth": "MMMM y",
        "year": "y",
        "era": "GGGG",
        "monthDay": "MMMM d",
        "month": "MMMM",
        "day": "d"
      }
    },
    "weekday": "EEEE",
    "time_zone": {}
  },
  "short": {
    "date": {
      "glue": {},
      "components": {
        "yearMonthDay": "MMMM d, y",
        "yearMonth": "MMMM y",
        "year": "y",
        "era": "GGGG",
        "monthDay": "MMMM d",
        "month": "MMMM",
        "day": "d"
      }
    },
    "weekday": "EEEE",
    "time_zone": {}
  },
  "shortLossy": {
    "date": {
      "glue": {},
      "components": {
        "yearMonthDay": "MMMM d, y",
        "yearMonth": "MMMM y",
        "year": "y",
        "era": "GGGG",
        "monthDay": "MMMM d",
        "month": "MMMM",
        "day": "d"
      }
    },
    "weekday": "EEEE",
    "time_zone": {}
  }
}
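
For reference, a glue pattern such as "{1} 'at' {0}" could be applied roughly as follows. This is a minimal sketch, not ICU4X code: `apply_glue` is a hypothetical helper, and it assumes the CLDR convention that {0} is the formatted time, {1} is the formatted date, and text in single quotes is literal.

```rust
/// Applies a glue pattern like "{1} 'at' {0}".
/// {0} is replaced by the time, {1} by the date. Single-quoted text is
/// copied literally, and '' stands for one literal single quote.
fn apply_glue(glue: &str, time: &str, date: &str) -> String {
    let mut out = String::new();
    let mut chars = glue.chars().peekable();
    let mut in_quote = false;
    while let Some(c) = chars.next() {
        match c {
            '\'' => {
                if chars.peek() == Some(&'\'') {
                    // Two quotes in a row: emit one literal quote.
                    chars.next();
                    out.push('\'');
                } else {
                    // Enter or leave a quoted literal.
                    in_quote = !in_quote;
                }
            }
            '{' if !in_quote => {
                // Read the placeholder index up to (and consuming) '}'.
                let index: String = chars.by_ref().take_while(|&ch| ch != '}').collect();
                match index.as_str() {
                    "0" => out.push_str(time),
                    "1" => out.push_str(date),
                    // Unknown placeholders are kept as-is in this sketch.
                    other => {
                        out.push('{');
                        out.push_str(other);
                        out.push('}');
                    }
                }
            }
            _ => out.push(c),
        }
    }
    out
}
```

With the "date-time-long" glue above, `apply_glue("{1} 'at' {0}", "2:05 PM", "January 20, 2020")` yields "January 20, 2020 at 2:05 PM".
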

gregtatum commented 3 years ago

The design doc for this discussion is here: https://docs.google.com/document/d/18v9fQcDvHDkG_7Hx6rDt1r3Mq6_JMecgORgOH4yXAWU/edit

I will follow up by filing new actionable issues.