unicode-org / icu4x

Solving i18n for client-side and resource-constrained environments.
https://icu4x.unicode.org

Add proleptic approximation to default Chinese/Islamic (dataful) calendars #5778

Open Manishearth opened 1 week ago

Manishearth commented 1 week ago

See https://github.com/tc39/proposal-temporal/issues/2869

As documented in https://github.com/tc39/proposal-temporal/issues/2869, non-Gregorian calendars with sufficiently complex astronomy/math often do not have "one right answer" for questions about e.g. dates a millennium in the future. The answer can depend on the precise math used and the precise behavior of the stars, and it can be affected by floating point error.

When faced with this dilemma, Temporal would prefer to retain general^1 and calendrical^2 invariants as much as possible, with a strict requirement on general invariants and a best-effort requirement on calendrical ones.

We should update our code to support this. A side benefit is that we can use this opportunity to remove shipped calendrical calculations from our build: we ship data, and dates outside of the data are approximated.

Ideally we still include ChineseAlwaysCalculating and IslamicAlwaysCalculating as available Calendar impls, but that might be a bunch of unnecessary boilerplate so we'll see.

The general implementation plan would involve taking e.g. ChineseBasedCacheV1::get_for_extended_year() and having a version that always returns something by falling back to some proleptic approximation.
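A minimal sketch of that fallback shape, under stated assumptions: `YearInfo`, `YearCache`, and `get_or_approximate` are hypothetical names invented for illustration and are not the actual ICU4X types. Only the idea of an `Option`-returning lookup wrapped by one that always returns something is taken from the plan above.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-year record (not the real ICU4X data struct).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct YearInfo {
    /// Days from some fixed epoch to the first day of this year.
    pub new_year_offset: i64,
    /// Number of days in this year.
    pub days_in_year: u16,
    /// True when the value came from the approximation, not shipped data.
    pub approximated: bool,
}

/// Hypothetical stand-in for a dataful cache keyed by extended year.
pub struct YearCache {
    data: BTreeMap<i32, YearInfo>,
}

impl YearCache {
    pub fn new(data: BTreeMap<i32, YearInfo>) -> Self {
        Self { data }
    }

    /// Dataful lookup: returns `None` for years outside the shipped range.
    pub fn get_for_extended_year(&self, year: i32) -> Option<YearInfo> {
        self.data.get(&year).copied()
    }

    /// Total lookup: prefers shipped data and falls back to the supplied
    /// proleptic approximation, so callers always get an answer.
    pub fn get_or_approximate(
        &self,
        year: i32,
        approximate: impl Fn(i32) -> YearInfo,
    ) -> YearInfo {
        self.get_for_extended_year(year)
            .unwrap_or_else(|| approximate(year))
    }
}
```

Passing the approximation in as a closure keeps the cache type independent of any particular calendar; each calendar could plug in its own rule.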

The approximations we are thinking of are as follows:

Islamic

Every year consists of alternating 29- and 30-day months. This means each year has a fixed number of days, so calculating the ISO correspondence is easy.
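A rough sketch of the arithmetic, assuming 12 months per year with odd-numbered months having 30 days and even-numbered months 29 (which month length comes first is an assumption here), giving a fixed 354-day year. Offsets are counted from day 1 of month 1 of year 1 of the approximation; mapping that to ISO would additionally need an anchor date, which is left out. All function names are hypothetical.

```rust
/// Fixed year length: six 30-day months plus six 29-day months.
pub const DAYS_IN_YEAR: i64 = 354;

/// Assumption: odd months have 30 days, even months have 29.
pub fn days_in_month(month: u8) -> u8 {
    if month % 2 == 1 { 30 } else { 29 }
}

/// Days elapsed in the year before the first day of `month` (1-based).
pub fn days_before_month(month: u8) -> i64 {
    let m = i64::from(month);
    // 29 days per preceding month, plus one extra day per preceding odd month.
    (m - 1) * 29 + m / 2
}

/// Day offset of (year, month, day) from day 1 of month 1 of year 1.
pub fn day_offset(year: i64, month: u8, day: u8) -> i64 {
    (year - 1) * DAYS_IN_YEAR + days_before_month(month) + i64::from(day) - 1
}

/// Inverse of `day_offset`; also works for negative offsets.
pub fn from_offset(offset: i64) -> (i64, u8, u8) {
    let year = offset.div_euclid(DAYS_IN_YEAR) + 1;
    let mut rem = offset.rem_euclid(DAYS_IN_YEAR); // day of year, 0-based
    let mut month = 1u8;
    while rem >= i64::from(days_in_month(month)) {
        rem -= i64::from(days_in_month(month));
        month += 1;
    }
    (year, month, rem as u8 + 1)
}
```

Because every year has the same length, both directions are plain integer arithmetic with no data lookups.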

Chinese

This is trickier. We can get Chinese to follow a fixed Metonic cycle of 7 leap years every 19 years (similar to Hebrew), and do something similar with each year having alternating 29- and 30-day months. We'll know the day offset of each new year within the cycle (and can hardcode it), so knowing the first new year gets us all the other new years.
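One way this could look, as a sketch only: it assumes the standard 19-year Metonic cycle with 7 leap years, borrows the Hebrew placement of leap years within the cycle as a placeholder (the positions that would actually be used for Chinese are not specified here), and assumes alternating months starting with a 30-day month, so a 12-month year has 354 days and a 13-month year has 384. Years are counted from a hypothetical anchor year whose new year is at offset 0.

```rust
pub const CYCLE_YEARS: i64 = 19;
pub const COMMON_YEAR_DAYS: i64 = 354; // 12 alternating months: 6*30 + 6*29
pub const LEAP_YEAR_DAYS: i64 = 384; // 13 alternating months: 7*30 + 6*29

/// Placeholder assumption: 0-based positions of the 7 leap years in the
/// cycle, borrowed from the Hebrew calendar (years 3, 6, 8, 11, 14, 17, 19).
pub const LEAP_POSITIONS: [i64; 7] = [2, 5, 7, 10, 13, 16, 18];

pub fn is_leap(position: i64) -> bool {
    LEAP_POSITIONS.contains(&position)
}

pub fn days_in_year(position: i64) -> i64 {
    if is_leap(position) { LEAP_YEAR_DAYS } else { COMMON_YEAR_DAYS }
}

/// offsets[i] is the number of days from the start of the cycle to the new
/// year at position i; offsets[19] is the length of the whole cycle.
/// In practice this table could simply be hardcoded.
pub fn cycle_offsets() -> [i64; 20] {
    let mut offsets = [0i64; 20];
    for pos in 0..19 {
        offsets[pos + 1] = offsets[pos] + days_in_year(pos as i64);
    }
    offsets
}

/// Day offset of the new year `years_since_anchor` years after (or before,
/// if negative) the anchor year.
pub fn new_year_offset(years_since_anchor: i64) -> i64 {
    let offsets = cycle_offsets();
    let cycles = years_since_anchor.div_euclid(CYCLE_YEARS);
    let pos = years_since_anchor.rem_euclid(CYCLE_YEARS) as usize;
    cycles * offsets[19] + offsets[pos]
}
```

With a fixed table of in-cycle offsets, any new year is one multiplication and one table lookup away from the anchor, which is what makes the first new year the only value that needs careful placement.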

The tricky part is the boundary year: the year after our data ends may need special treatment to ensure that the Metonic cycle's first new year falls on the day we want it to[^3]. We can fudge the 29/30-day month lengths for that year, or make it a leap year, as needed. Another option is to hardcode the boundary years (currently they can be changed by datagen if desired) and manually calculate a fudged bridge year.

cc @sffc

[^3]: Assuming an invariant of the new year falling between Jan 20 and Feb 20, the first new year of the Metonic cycle is severely constrained so that the rest of the cycle still falls in the right range.

Manishearth commented 1 week ago

This would fix https://github.com/unicode-org/icu4x/issues/4917 cc @dminor