unicode-org / icu4x

Solving i18n for client-side and resource-constrained environments.
https://icu4x.unicode.org
Other
1.38k stars 178 forks source link

Now might be a good time to figure out off-the-shelf calendar sets in DateTimeFormatter #4889

Open sffc opened 6 months ago

sffc commented 6 months ago

We currently have only two modes: one calendar or all calendars. The problem with all-calendars is that it includes JapaneseExtended, which we made separate specifically because we want it to be excluded by default, and it would also include any potential non-CLDR calendars that we get from third party contributors.

While I'm in the middle of the NeoDateFormatter restructuring, it would not be hard for me to add multiple constructors:

The good news is that with fewer formatter types, we don't need to generate lots of contructors on lots of types. We would need to add these on only one type, NeoFormatter.

Thoughts?

@Manishearth @zbraniecki

Manishearth commented 6 months ago

I think this is a good idea and is a great example of why the neo model works much better overall.

We may need to eventually figure out what counts as extended but for now "Japanext + community calendars" seems fine.

sffc commented 6 months ago

Conclusion: In 2.0, remove JapaneseExtended from AnyCalendar and DateTimeFormatter bounds

LGTM: @sffc @robertbastian

@Manishearth thoughts?

Manishearth commented 6 months ago

I'm fine with this. Should we also consider removing any of the Islamic ones? I'm not sure which ones are "major" or in active civil use.

sffc commented 6 months ago

The CLDR data has each calendar except Julian (and Japanese Extended) listed in at least one locale:

    <calendarPreferenceData>
        <calendarPreference territories="001" ordering="gregorian"/>
        <calendarPreference territories="BD DJ DZ EH ER ID IQ JO KM LB LY MA MR MY NE OM PK PS SD SY TD TN YE" ordering="gregorian islamic islamic-civil islamic-tbla"/>
        <calendarPreference territories="AL AZ MV TJ TM TR UZ XK" ordering="gregorian islamic-civil islamic-tbla"/>
        <calendarPreference territories="AE BH KW QA" ordering="gregorian islamic-umalqura islamic islamic-civil islamic-tbla"/>
        <calendarPreference territories="AF IR" ordering="persian gregorian islamic islamic-civil islamic-tbla"/>
        <calendarPreference territories="CN CX HK MO SG" ordering="gregorian chinese"/>
        <calendarPreference territories="EG" ordering="gregorian coptic islamic islamic-civil islamic-tbla"/>
        <calendarPreference territories="ET" ordering="gregorian ethiopic"/>
        <calendarPreference territories="IL" ordering="gregorian hebrew islamic islamic-civil islamic-tbla"/>
        <calendarPreference territories="IN" ordering="gregorian indian"/>
        <calendarPreference territories="JP" ordering="gregorian japanese"/>
        <calendarPreference territories="KR" ordering="gregorian dangi"/>
        <calendarPreference territories="SA" ordering="islamic-umalqura gregorian islamic islamic-rgsa"/>
        <calendarPreference territories="TH" ordering="buddhist gregorian"/>
        <calendarPreference territories="TW" ordering="gregorian roc chinese"/>
    </calendarPreferenceData>

That said, I don't have a very great sense of which Hijri calendars are needed in which contexts. For example, would it be acceptable if we included islamic-umalqura in the default set but not islamic (observational)? I was having a lot of trouble getting islamic to behave correctly in the stress testing in https://github.com/unicode-org/icu4x/pull/4904.

sffc commented 6 months ago

CC @roozbehp @younies for potential feedback on calendar selection.

Manishearth commented 6 months ago

Yeah that's what I'm trying to figure out; it's unclear which ones are actually in use.

roozbehp commented 6 months ago

I would keep all "Islamic"s. I know for a fact that Saudi Arabia uses islamic-umalqura, that Iran uses islamic observational, and that some Arab countries use the tabular and civil calendars. All four are needed.

sffc commented 1 month ago

Just to note, I would like to remove Japanese Extended from AnyCalendar and also remove it from the default baked data, but of course still allow it to be easily generated via datagen.

Manishearth commented 1 month ago

Oh I wanted to say I'm actually not fully in favor of removing it from AnyCalendar, I'm in favor of removing it from the base AnyCalendar constructor however.

I think AnyCalendar should support all calendars that we have, ideally. If that gets complicated I guess we can remove Japanext for now.

sffc commented 1 month ago

How would that work? The goal is to remove its data from the default constructors. If it is still in AnyCalendar, then what happens if someone tries to load that calendar but the data isn't available?

Manishearth commented 1 month ago

@sffc it's not loaded by the default ctor and constructing with AnyCalendarKind::Japanext is an error or a fallback.

It's there in the enum and you use a special ctor to include it, or just construct manually.

The main question is what this does to the datetime format APIs but I think they're already fallible here.

sffc commented 1 month ago

In this case I don't think a fallback is the right behavior, because if someone asks for japanext they actually want japanext. But, to_calendar is an infallible function. But maybe based on your comments elsewhere you think it should be fallible.

Manishearth commented 1 month ago

Why would this make to_calendar need to be fallible?

The idea is that AnyCalendar retains the variant, but it needs to be explicitly constructed, AnyCalendar::new(variant) doesn't load it.

sffc commented 1 month ago

From discussion with @Manishearth, tentative direction, if it works (I'm not sure if it does):