unicode-org / icu4x

Solving i18n for client-side and resource-constrained environments.
https://icu4x.unicode.org

Add week-of-year to semantic skeletons #5643

Open sffc opened 1 month ago

sffc commented 1 month ago

Week-of-year should be added to semantic skeletons, both in CLDR and in ICU4X.

Previous issue: https://github.com/unicode-org/icu4x/issues/488

Week-of-year requires plurals support. Plurals are mostly supported in the data model already, so this should not require a breaking change.

The following fixture tests were going to be deleted in #5640, and they should be covered by this issue:

    {
        "description": "Date time example that includes a week-of for a locale with plural variants as well as one with non default WeekData",
        "input": {
            "locale": "en",
            "value": "2016-04-17T08:25:07.000+05:00",
            "options": {
                "components": {
                    "year": "numeric",
                    "week": "numeric-week-of-year",
                    "hour": "two-digit",
                    "minute": "two-digit",
                    "second": "two-digit",
                    "time_zone_name": "offset"
                }
            }
        },
        "output": {
            "values": {
                "en": "week 17 of 2016, 08:25:07 GMT+05:00",
                "en-AU": "week 16 of 2016, 08:25:07 GMT+05:00",
                "en-GB": "week 15 of 2016, 08:25:07 GMT+05:00",
                "fil": "linggo 17 ng 2016, 08:25:07 GMT+05:00",
                "en-ZA": "week 17 of 2016, 08:25:07 GMT+05:00"
            }
        }
    }
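
For reference, the three distinct week numbers in this fixture (17, 16, 15) can be reproduced from two pieces of week data: the first day of the week and the minimum number of days a week needs in the new year to count as week 1. The sketch below is an illustration, not ICU4X code. `WeekRule`, `days_from_civil`, and `week_of_year` are hypothetical names, and the rule values used in the note after the block are assumptions chosen to agree with the expected outputs above.

```rust
/// Weekdays numbered Monday = 0 through Sunday = 6.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Weekday {
    Monday = 0,
    Tuesday,
    Wednesday,
    Thursday,
    Friday,
    Saturday,
    Sunday,
}

/// Hypothetical stand-in for locale week data; not the ICU4X type.
#[derive(Clone, Copy, Debug)]
pub struct WeekRule {
    /// First day of the week.
    pub first_weekday: Weekday,
    /// Minimum number of days (1..=7) a week must have in a year to be its week 1.
    pub min_week_days: u8,
}

/// Days from 1970-01-01 to the given proleptic Gregorian date.
pub fn days_from_civil(year: i64, month: i64, day: i64) -> i64 {
    // Shift to a March-based year so the leap day falls at the end.
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400);
    let mp = (month + 9) % 12; // March = 0, ..., February = 11
    let doy = (153 * mp + 2) / 5 + day - 1;
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    era * 146_097 + doe - 719_468
}

/// Returns `(year_of_week, week_of_year)` for a Gregorian date.
pub fn week_of_year(rule: WeekRule, year: i64, month: i64, day: i64) -> (i64, u8) {
    let days = days_from_civil(year, month, day);
    // 1970-01-01 was a Thursday, which is index 3 when Monday = 0.
    let weekday = (days + 3).rem_euclid(7);
    let offset_in_week = (weekday - rule.first_weekday as i64).rem_euclid(7);
    // The "anchor" day of the week decides which year the week belongs to.
    // With min_week_days = 4 and a Monday start this is Thursday (ISO-8601).
    let anchor = days - offset_in_week + (7 - rule.min_week_days as i64);
    let week_year = if anchor < days_from_civil(year, 1, 1) {
        year - 1
    } else if anchor >= days_from_civil(year + 1, 1, 1) {
        year + 1
    } else {
        year
    };
    let week = (anchor - days_from_civil(week_year, 1, 1)) / 7 + 1;
    (week_year, week as u8)
}
```

Under these assumptions, 2016-04-17 (a Sunday) falls in week 17 with a Sunday start and a minimum of 1 day, week 16 with a Monday start and a minimum of 1 day, and week 15 with a Monday start and a minimum of 4 days (the ISO-8601 rule). Those match the `en`, `en-AU`, and `en-GB` values in the fixture.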

sffc commented 1 month ago

Previously, week-of-year was handled internally in the datetime formatter. However, in the semantic skeleton world and with flexible input types, I would rather move the week number calculations out into the input type. For example, something such as:

pub struct WeekDate<C: WeekCalendar> {
    inner: C::WeekDateInner
}

struct ArithmeticWeek {
    pub(crate) year: i32,
    pub(crate) week: u8,
    pub(crate) weekday: Weekday, // 1-7
}

pub struct GregorianWeekDateInner(pub(crate) ArithmeticWeek);

impl WeekCalendar for Gregorian {
    type WeekDateInner = GregorianWeekDateInner;
}

The functions for converting between Date and WeekDate involve data and locale preferences (i.e. WeekCalculator).

This has a number of advantages, including removing the dependence on the week calculator data from datetime (#4340).

(this is not a 2.0-issue, but CC @Manishearth @robertbastian @zbraniecki)

robertbastian commented 3 weeks ago

I would like to at least come up with a design for this before 2.0. Combining the date and the locale from the formatter is going to complicate the API, and semver needs to be thought through now.

sffc commented 3 weeks ago

Yeah fair point.

There are two designs already: the old one (where WeekCalculator is loaded in the formatter) and the one suggested in https://github.com/unicode-org/icu4x/issues/5643#issuecomment-2408973458 (which is a major shift, encoding it in the calendar type).

A middle ground would be to treat this the same way we treat weekday. We don't calculate the weekday inside the formatter, even though it should in theory be derivable from the year, month, and day. Likewise, we shouldn't necessarily calculate the week number inside the formatter. The more I think about this, the more I realize that calculating the week number in the formatter is putting logic in the wrong place: it is at its core a calendrical calculation, but the formatter should only be passing through data unchanged, just looking up display names.

I would love it if week numbering were by calendar, but unfortunately it has to include the locale. This means that Date<A> can't calculate the week number.

One option is that we make LocalizedDate<A>, which implements GetField<WeekOfYearInfo>. A downside of this direction is that the locale could be out of sync between the LocalizedDate and the DateTimeFormatter.

sffc commented 3 weeks ago

As far as the semantic skeleton API goes:

My thinking for a while has been that it should just be a new field set. Year + Week, a new Calendar Period Field Set, and Year + Week + Weekday, a new Date Field Set.

Question to people who use week calendars: does anyone ever say "Wednesday, Week 30 of 2024" as a stand-in for "July 24, 2024", and should semantic skeletons support locales that prefer year-week-weekday over year-month-day? Or is it sufficient to make the developer opt in to week calendars? @eggrobin

robertbastian commented 3 weeks ago

Day-of-week is currently modeled by an IsoWeekday, but that's not sufficient for patterns e..ee and c..cc. These require a locale-sensitive days-since-start-of-week number, so we probably need to introduce a FormattableDayOfWeek, which is returned only by LocalizedDate.
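
To make the gap concrete: the numeric forms of `e` and `c` need the position of the day within the locale's week, which depends on where the week starts. A minimal sketch follows, using a local copy of an `IsoWeekday`-style enum and a hypothetical helper `local_day_of_week`; neither is the ICU4X API, and the shape of `FormattableDayOfWeek` is not decided in this thread.

```rust
/// Local stand-in for an ISO weekday, Monday = 1 through Sunday = 7.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum IsoWeekday {
    Monday = 1,
    Tuesday,
    Wednesday,
    Thursday,
    Friday,
    Saturday,
    Sunday,
}

/// Hypothetical helper: 1-based position of `day` within a week that
/// starts on `first_day_of_week`.
pub fn local_day_of_week(day: IsoWeekday, first_day_of_week: IsoWeekday) -> u8 {
    // Days elapsed since the start of the week, wrapped into 0..=6.
    let offset = (day as i8 - first_day_of_week as i8).rem_euclid(7);
    offset as u8 + 1
}
```

With this helper, Sunday is day 1 in a week that starts on Sunday and day 7 in a week that starts on Monday, so the same `IsoWeekday` formats to different numbers depending on the locale's week data.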

That said, week data is tiny, it's less than 500 bytes (lookup table + data), and it's not data that changes on the regular. It would be a shame to overcomplicate our API for data size or custom-data reasons.

sffc commented 2 weeks ago

LGTM: ~@sffc~ @robertbastian

EDIT: @sffc revokes his LGTM, explained below.

sffc commented 2 weeks ago

2.0 work is to load the data correctly.

sffc commented 2 weeks ago

OK so regarding the week numbering things.

Wikipedia has a good article on this subject: https://en.wikipedia.org/wiki/Week#Other_week_numbering_systems

There are multiple possible schemes. ISO-8601 is the most common, but there are others. The Financial, Media, and Public Health sectors all have standards that they follow, and they might not be the same.

I think it is very safe to say that in cases when the developer wants to display week numbering, they will also want to opt into a specific week numbering scheme, not just allow the locale to pick. This is how I came up with my "week calendar" suggestion.

This still needs more research, so I am retracting my earlier LGTM on the discussion.

sffc commented 2 weeks ago

Yeah, I'm not sure why I was briefly convinced this morning that week numbering was a display concern. It absolutely is not! Exclamation point there. It is a data model concern. I think DayOfYearInfo fudges around the problem such that it is not as horrible as it could be, but we really should seek out an actually correct solution.

sffc commented 2 weeks ago

Here's how I envision the use case.

You are a public health agency and wish to use a week calendar for your web site.

You create a Date<PublicHealthGregorian>. That object supports a number of GetField impls, most importantly GetField<WeekInfo>. You are then able to format with a field set requiring that field.

pub struct WeekInfo {
    year_of_week: YearInfo,
    week_of_year: i8,
    day_of_week_number: u8,
}

This struct carries enough data to format "3rd day of the 5th week of 2025", using the fields c, w, and Y, none of which are currently supported. This could also be extended to support W, pending a question I posted to CLDR.

Do we put PublicHealthGregorian in AnyCalendar? Maybe. One catch is that GetField<WeekInfo> would be supported for only a subset of Date<AnyCalendar>, which I don't think we currently support. However, it may be okay for us to say, we simply don't support locale-dependent selection of week numbering. If you want week numbering, be explicit about it.

Manishearth commented 2 weeks ago

That still feels like a locale to me, en-US-publichealth or something. Or a setting thing that you can override with Options.

I see where you're coming from though.

> However, it may be okay for us to say, we simply don't support locale-dependent selection of week numbering. If you want week numbering, be explicit about it.

I kind of like this.

robertbastian commented 2 weeks ago

> I think it is very safe to say that in cases when the developer wants to display week numbering, they will also want to opt into a specific week numbering scheme, not just allow the locale to pick. This is how I came up with my "week calendar" suggestion.

This does not define a new calendar. It is a display preference for the Gregorian calendar. It can be modeled with preferences or options.

sffc commented 2 weeks ago

Without defining a new calendar, we could just model it as a pub struct Week living alongside pub struct Date. It could be a type defining a specific week in time, not a date.

Not sure what the inner representation would be. It needs the year, but not the month and day. We need to figure this out not only for Week but also for YearMonth, which we should also probably add at some point.

sffc commented 2 weeks ago

Potential data model:

pub struct AbsoluteWeek {
    week: i32, // an absolute week number, defining week 0 to be the first week of 1970 or something
}

pub struct WeekNumberingRule {
    pub weekday_in_first_week: IsoWeekday,
    pub first_day_of_week: IsoWeekday,
    // TODO: Investigate what else we need to fully define the week numbering
}

pub struct Week<A: AsCalendar> {
    week: AbsoluteWeek,
    rule: WeekNumberingRule,
    calendar: A,
}

pub struct WeekInfo {
    pub year_of_week: YearInfo,
    pub week_of_year: i8,
}

impl<A: AsCalendar> GetField<WeekInfo> for Week<A> { ... }
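
A rough sketch of how this data model could resolve an absolute week into a year and week number follows. It rests on several assumptions that the thread does not settle: week 0 is taken to be the week containing 1970-01-01 under the given rule, `weekday_in_first_week` is read as "the weekday whose first occurrence in the year falls in week 1" (Thursday for ISO-8601), the calendar is Gregorian, and `year_of_week` is a plain integer instead of `YearInfo`. The helpers `days_from_civil`, `year_of`, `containing`, and `info` are hypothetical names.

```rust
/// Local stand-in for an ISO weekday, Monday = 1 through Sunday = 7.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum IsoWeekday {
    Monday = 1,
    Tuesday,
    Wednesday,
    Thursday,
    Friday,
    Saturday,
    Sunday,
}

#[derive(Clone, Copy, Debug)]
pub struct WeekNumberingRule {
    pub weekday_in_first_week: IsoWeekday,
    pub first_day_of_week: IsoWeekday,
}

impl WeekNumberingRule {
    /// 0-based position of `day` within a week under this rule.
    fn offset_in_week(&self, day: IsoWeekday) -> i64 {
        (day as i64 - self.first_day_of_week as i64).rem_euclid(7)
    }
}

/// Assumption: week 0 is the week containing 1970-01-01 (a Thursday).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct AbsoluteWeek {
    week: i32,
}

/// Simplified: the proposal above uses `YearInfo` and `i8`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct WeekInfo {
    pub year_of_week: i64,
    pub week_of_year: u8,
}

/// Days from 1970-01-01 to the given proleptic Gregorian date.
pub fn days_from_civil(year: i64, month: i64, day: i64) -> i64 {
    let y = if month <= 2 { year - 1 } else { year };
    let (era, yoe) = (y.div_euclid(400), y.rem_euclid(400));
    let doy = (153 * ((month + 9) % 12) + 2) / 5 + day - 1;
    era * 146_097 + yoe * 365 + yoe / 4 - yoe / 100 + doy - 719_468
}

/// Proleptic Gregorian year containing the given day number.
fn year_of(days: i64) -> i64 {
    let z = days + 719_468;
    let (era, doe) = (z.div_euclid(146_097), z.rem_euclid(146_097));
    let yoe = (doe - doe / 1460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    // Years here start in March; January and February belong to the next one.
    if doy >= 306 { era * 400 + yoe + 1 } else { era * 400 + yoe }
}

impl AbsoluteWeek {
    /// The week that contains the given day (days since 1970-01-01).
    pub fn containing(days: i64, rule: &WeekNumberingRule) -> Self {
        let epoch_offset = rule.offset_in_week(IsoWeekday::Thursday);
        AbsoluteWeek { week: (days + epoch_offset).div_euclid(7) as i32 }
    }

    /// Resolves the year and week number of this week under `rule`.
    pub fn info(&self, rule: &WeekNumberingRule) -> WeekInfo {
        let epoch_offset = rule.offset_in_week(IsoWeekday::Thursday);
        let first_day = self.week as i64 * 7 - epoch_offset;
        // The anchor weekday decides which year the week belongs to.
        let anchor = first_day + rule.offset_in_week(rule.weekday_in_first_week);
        let year = year_of(anchor);
        let week = (anchor - days_from_civil(year, 1, 1)) / 7 + 1;
        WeekInfo { year_of_week: year, week_of_year: week as u8 }
    }
}
```

One consequence of this representation is that the `AbsoluteWeek` value itself depends on the rule, because the week boundaries move with `first_day_of_week`. That is one reason `Week` in the proposal above stores the rule next to the week number.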

Manishearth commented 2 weeks ago

> Without defining a new calendar, we could just model it as a pub struct Week living alongside pub struct Date. It could be a type defining a specific week in time, not a date.

That makes some more sense to me. I'd really love to decouple Week stuff from Date.

Manishearth commented 2 weeks ago

More on that:

To me, a calendar defines:

These are three concepts that stack neatly on top of each other, and can stack on top of the general concept of hour/minute/second time.

Weeks, on the other hand, are not calendar-specific. The concept of a 7-day week has a single origin and has proven incredibly sticky historically, with attempts to unseat it (French Revolutionary Calendar, Soviet Calendar) failing. There are some calendars with other non-month ways of grouping days cyclically (Javanese Calendar), but that's a more general concept, not a "week".

Weeks themselves layer on top of calendars, independently of the calendar: a given day is Monday regardless of the calendar it's in. Our Islamic calendar code literally differentiates epoch dates as "Thursday" and "Friday"!
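
As an illustration of that independence, the sketch below derives the weekday from a calendar-neutral day count only. The function names (`fixed_from_gregorian`, `fixed_from_julian`, `weekday_from_fixed`) are hypothetical and this is not the ICU4X implementation. Two calendars that label the same day differently still map to the same day count, and therefore to the same weekday.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Weekday {
    Monday,
    Tuesday,
    Wednesday,
    Thursday,
    Friday,
    Saturday,
    Sunday,
}

const WEEKDAYS: [Weekday; 7] = [
    Weekday::Monday,
    Weekday::Tuesday,
    Weekday::Wednesday,
    Weekday::Thursday,
    Weekday::Friday,
    Weekday::Saturday,
    Weekday::Sunday,
];

/// Day count (days since Gregorian 1970-01-01) of a proleptic Gregorian date.
pub fn fixed_from_gregorian(year: i64, month: i64, day: i64) -> i64 {
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400);
    let doy = (153 * ((month + 9) % 12) + 2) / 5 + day - 1;
    era * 146_097 + yoe * 365 + yoe / 4 - yoe / 100 + doy - 719_468
}

/// Same day count, computed from a proleptic Julian calendar date.
pub fn fixed_from_julian(year: i64, month: i64, day: i64) -> i64 {
    let y = if month <= 2 { year - 1 } else { year };
    let doy = (153 * ((month + 9) % 12) + 2) / 5 + day - 1;
    // Every fourth year is a leap year; there is no century correction.
    365 * y + y.div_euclid(4) + doy - 719_470
}

/// The weekday depends only on the day count, not on any calendar.
pub fn weekday_from_fixed(fixed: i64) -> Weekday {
    // Day 0 (Gregorian 1970-01-01) was a Thursday, index 3.
    WEEKDAYS[(fixed + 3).rem_euclid(7) as usize]
}
```

For example, Gregorian 1582-10-15 and Julian 1582-10-05 produce the same day count, and `weekday_from_fixed` reports Friday for both.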

So weeks are more like related_iso for cyclic calendars, an entirely separate timekeeping system whose overlap with the main calendar is useful for disambiguation or redundancy.

Week numbering is a curious concept that arises from merging year boundaries (from a specific calendar) with weeks (pan-calendar). It is locale- and calendar-specific, but I don't think it's just a property of the calendar; it's something else. The public health thing to me reads strongly as a preference.

Decoupling week stuff from date stuff would be pretty neat.

sffc commented 2 weeks ago

I agree with @Manishearth's mental model overall.

Regarding week numbering: there are various ways to anchor week numbers. They can be anchored to a year, but this is not the only choice: they are often anchored to a month, and they could in principle be anchored to something else entirely such as religious holidays (ever heard the church saying "Third Sunday of Easter"?) or an astronomical event such as the winter solstice.

So, in that sense, week numbering really is just another calendar system. The periods are the same length, but the way they are anchored and numbered is different.

Manishearth commented 2 weeks ago

> ever heard the church saying "Third Sunday of Easter"?

note: that's because the Christian liturgical year is basically its own somewhat strange calendar system.

But yes, week numbering can be anchored to anything. So can day numbering, but day numbering has some more rigid patterns.