tc39 / proposal-intl-era-monthcode

To specify necessary details about era, eraYear and monthCode usage with Temporal in an internationalization setting (for calendars other than "iso8601").
https://tc39.github.io/proposal-intl-era-monthcode
MIT License

Shane's December 2022 Proposal #9

Open sffc opened 1 year ago

sffc commented 1 year ago

My proposal is scattered between different threads, so I thought I'd try to summarize it here.

There are 2 types of calendars with eras: those with a variable number of eras, and those with a fixed number. Japanese is the one with a variable number of eras, and Manish has already written a nice spec for that. Therefore, in the rest of this post, I am focused on calendars with a fixed number of eras.

Eras should define two things:

  1. An epoch day
  2. A formula for converting from the (eraYear, monthCode, day) triple to a number of days relative to that epoch

eraYear is permitted to be negative.
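
As a rough illustration of the two pieces an era defines, the sketch below models them for the proleptic Gregorian calendar only. The ERAS table, the daysFromEraEpoch helper, and the choice to anchor both eras at 0001-01-01 are assumptions made for this example; the proposal does not specify them.

```javascript
// Days since 1970-01-01 for a proleptic Gregorian date
// (a widely used civil-date algorithm based on 400-year cycles).
function daysFromCivil(year, month, day) {
  const y = month <= 2 ? year - 1 : year;
  const cycle = Math.floor(y / 400);
  const yearOfCycle = y - cycle * 400;
  const monthFromMarch = month > 2 ? month - 3 : month + 9;
  const dayOfYear = Math.floor((153 * monthFromMarch + 2) / 5) + day - 1;
  const dayOfCycle =
    yearOfCycle * 365 + Math.floor(yearOfCycle / 4) - Math.floor(yearOfCycle / 100) + dayOfYear;
  return cycle * 146097 + dayOfCycle - 719468;
}

// Assumption: both eras share one epoch day, 0001-01-01.
const EPOCH = daysFromCivil(1, 1, 1);

// Each era carries (1) an epoch day and (2) a rule mapping eraYear to a year.
const ERAS = {
  gregory: { epochDay: EPOCH, toYear: (eraYear) => eraYear },
  'pre-gregory': { epochDay: EPOCH, toYear: (eraYear) => 1 - eraYear },
};

// Converts (eraYear, monthCode, day) to a day count relative to the era's epoch.
function daysFromEraEpoch(era, eraYear, monthCode, day) {
  const def = ERAS[era];
  if (!def) throw new RangeError(`unknown era: ${era}`);
  const month = Number(monthCode.slice(1)); // "M03" -> 3; this calendar has no leap months
  return daysFromCivil(def.toYear(eraYear), month, day) - def.epochDay;
}
```

Because the year mapping is plain arithmetic, a negative eraYear needs no special handling in this model.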

Eras should be named as follows:

  1. For eras counting forwards: the BCP-47 ID of the canonical calendar for that era
  2. For eras counting backwards: pre- followed by the BCP-47 ID

With these rules, we fully define the era codes for all CLDR calendars.

Calendar ID        Eras
buddhist           buddhist
chinese            *1
coptic             pre-coptic, coptic
dangi              *1
ethioaa            ethioaa
ethiopic           ethioaa, ethiopic *2
gregory            pre-gregory, gregory
julian-gregory     pre-julian, julian, gregory *3
hebrew             hebrew
indian             indian
islamic            islamic
islamic-civil      islamic-civil
islamic-rgsa       islamic-rgsa
islamic-tbla       islamic-tbla
islamic-umalqura   islamic-umalqura
iso8601            iso8601 *4
japanese           *5
persian            persian
roc                roc

*1 In these cyclic calendars, there is no clear epoch. We need to either choose an epoch or define the era/eraYear in terms of ISO-8601 / Gregorian.

*2 Dates in the incarnation era are expressed with ethiopic; dates prior to that are expressed in ethioaa. Dates prior to the creation of the earth are expressed as negative numbers in ethioaa.

*3 Note that input dates can be labeled as either gregory or julian and they will be interpreted correctly, even if the change date is unknown by the caller.

*4 This is the same era as gregory; unclear if it needs to be re-defined.

*5 See the doc linked at the top of this post for how to handle Japanese.
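
The naming rule can be sketched as a small helper. The function name eraCodesFor and its countsBackwards option are hypothetical, introduced only to show how the table rows follow from the rule.

```javascript
// Hypothetical helper: derive era codes for one canonical calendar ID.
// Forward-counting era: the calendar ID itself.
// Backward-counting era: "pre-" followed by the calendar ID.
function eraCodesFor(canonicalCalendarId, { countsBackwards = false } = {}) {
  const forward = canonicalCalendarId;
  return countsBackwards ? [`pre-${forward}`, forward] : [forward];
}

// A calendar such as ethiopic borrows an era named after another calendar,
// so its list is the concatenation of two forward-counting eras.
const ethiopicEras = [...eraCodesFor('ethioaa'), ...eraCodesFor('ethiopic')];
```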

macchiati commented 1 year ago

Eras should define two things:

  1. An epoch day
  2. A formula for converting from the (eraYear, monthCode, day) triple to a number of days relative to that epoch

I'd rather have a definition that is less tied to the particular variant, perhaps on the lines of the following. It is motivated by the sharing of era starts among islamic calendars. That is, as I recall, islamic, islamic-civil, islamic-rgsa, islamic-tbla, islamic-umalqura all have the same eras, so the identifier should be the same.

sffc commented 1 year ago

Discussion with @louis-aime @manishearth

FrankYFTang commented 1 year ago

For the 'roc' calendar, you need two different era codes: one for the "roc" era and one for the pre-roc era. The mapping is:

Gregorian year   Era in "roc" calendar   eraYear in "roc" calendar
1909             "pre-roc"               3
1910             "pre-roc"               2
1911             "pre-roc"               1
1912             "roc"                   1
1913             "roc"                   2
...
2022             "roc"                   111
2023             "roc"                   112
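
The mapping in this table reduces to a few lines of arithmetic. This is a minimal sketch; the function name toRocEra and the constant are illustrative and not part of any API.

```javascript
// Gregorian year that corresponds to "roc" year 1, per the table above.
const ROC_EPOCH_GREGORIAN_YEAR = 1912;

// Maps a Gregorian year to the era and eraYear used in the "roc" calendar.
function toRocEra(gregorianYear) {
  if (gregorianYear >= ROC_EPOCH_GREGORIAN_YEAR) {
    return { era: 'roc', eraYear: gregorianYear - ROC_EPOCH_GREGORIAN_YEAR + 1 };
  }
  // Years before the epoch count backwards: 1911 -> 1, 1910 -> 2, ...
  return { era: 'pre-roc', eraYear: ROC_EPOCH_GREGORIAN_YEAR - gregorianYear };
}
```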
sffc commented 1 year ago

There is existing data in:

https://github.com/unicode-org/cldr/blob/0b4bc2fad258ba35c84d5acb39ee58248caabb87/common/supplemental/supplementalData.xml

We could encode this in a way such as

        <calendar type="ethiopic">
          <calendarSystem type="other"/>
          <eras>
              <era type="0" end="8-08-28" name="ethioaa" aliases="mundi"/>
              <era type="1" start="8-08-29" name="ethiopic" aliases="incar incarnation"/>
          </eras>
        </calendar>
        <calendar type="ethiopic-amete-alem">
          <eras>
              <era type="0" end="-5492-08-29" name="ethioaa" aliases="mundi"/>
          </eras>
          <!-- Not sure if we want this: -->
          <eraInputs>
            <eraInput name="ethiopic"/>
          </eraInputs>
        </calendar>
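
To show how a consumer might use name and aliases attributes like these, here is a minimal sketch of alias resolution. The data is hand-transcribed from the ethiopic entry above, and both the object shape and the resolveEraCode function are assumptions for illustration.

```javascript
// Hand-transcribed from the "ethiopic" entry; the object shape is an assumption.
const ETHIOPIC_ERAS = [
  { name: 'ethioaa', aliases: ['mundi'] },
  { name: 'ethiopic', aliases: ['incar', 'incarnation'] },
];

// Returns the canonical era name for a canonical code or one of its aliases.
function resolveEraCode(eras, input) {
  for (const era of eras) {
    if (era.name === input || era.aliases.includes(input)) return era.name;
  }
  throw new RangeError(`era "${input}" is not valid for this calendar`);
}
```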
sffc commented 1 year ago

For Japanese:

        <calendar type="japanese">
          <calendarSystem type="solar" />
          <eras>
              <era type="237" end="0-12-31" />
              <era type="238" start="1-01-01" />
              <era type="0" start="645-6-19"/>
              <era type="1" start="650-2-15"/>
              <era type="2" start="672-1-1"/>
              <!-- ... -->
              <era type="235" start="1989-1-8"/>
              <era type="236" start="2019-5-1"/>
          </eras>
        </calendar>

Need to check if this is well-formed. Constraints we're aware of:

  1. The numbers shouldn't change, because they are returned by the ICU API
  2. The numbers should be sequential because ICU stores the era names in an array
Manishearth commented 1 year ago

Also we tentatively settled on all calendars having the same set of eras for input and output, except that input admits aliases (so, input accepts more codes, but the same underlying eras).

For example, the "ethiopic" era is not permitted as an input to the "ethioaa" calendar (even though the "ethioaa" era will be one of the allowed eras as input to "ethiopic").

Furthermore, input will accept out-of-range values and normalize them (e.g. gregory 2020 in the Japanese calendar is accepted and normalized to reiwa 2)
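
A minimal sketch of that normalization, limited to the two most recent eras and using the start dates from the Japanese XML sketch above. The era name "heisei" is filled in here for illustration, and the function name is hypothetical.

```javascript
// Only the two most recent eras, ordered oldest to newest.
const JAPANESE_ERAS = [
  { name: 'heisei', start: [1989, 1, 8] },
  { name: 'reiwa', start: [2019, 5, 1] },
];

// Lexicographic comparison of [year, month, day] triples.
function compareDates(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Accepts a date labeled with the gregory era and re-expresses it
// in the latest Japanese era that has started on or before that date.
function normalizeToJapaneseEra(gregoryYear, month, day) {
  const date = [gregoryYear, month, day];
  for (let i = JAPANESE_ERAS.length - 1; i >= 0; i--) {
    const era = JAPANESE_ERAS[i];
    if (compareDates(date, era.start) >= 0) {
      return { era: era.name, eraYear: gregoryYear - era.start[0] + 1, month, day };
    }
  }
  throw new RangeError('date precedes the eras listed in this sketch');
}
```

The month and day matter as well as the year: a date in early 2019 still falls in the earlier era.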

The cases where a calendar will use eras named by another calendar are:

sffc commented 1 year ago

There was some disagreement regarding BCP-47 in today's CLDR call. The committee generally feels that "if we can make the canonical IDs BCP-47, we should." @FrankYFTang pointed out that if the Japanese emperor creates an era with a single Kanji, it might be only 2 letters long and not BCP-47. In my opinion, in the fairly unlikely case this happens, the problem is easy enough to solve by fiddling with the identifier, like padding it with 0s, and the unpadded version can be added as an alias, which doesn't need to be BCP-47. Note that the japanese era code doc proposed era codes like "showa-1312", which is BCP-47-friendly. An interesting question is whether to keep the year always at 4 digits or let it slide down to 3 digits (which is technically BCP-47 but maybe undesirable from that standpoint).
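
For reference, the constraint in question can be checked mechanically. The sketch below assumes the relevant rule is the Unicode extension "type" syntax (one or more subtags of 3 to 8 alphanumerics); padEraCode is a hypothetical version of the zero-padding idea, not an agreed design.

```javascript
// Unicode extension "type" values: subtags of 3 to 8 alphanumerics, joined by "-".
const BCP47_TYPE = /^[a-z0-9]{3,8}(-[a-z0-9]{3,8})*$/;

function isBcp47Friendly(eraCode) {
  return BCP47_TYPE.test(eraCode);
}

// Hypothetical fix-up: right-pad any too-short subtag with "0".
function padEraCode(eraCode) {
  return eraCode
    .split('-')
    .map((subtag) => subtag.padEnd(3, '0'))
    .join('-');
}
```

Under this rule a 3-digit year subtag still passes, which matches the "technically BCP-47" remark.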

ljharb commented 1 year ago

(not such that it blocks any changes here) is it entirely outside the realm of possibility for Ecma to send a formal request to the Japanese emperor to ensure that future eras conform to BCP-47?

justingrant commented 1 year ago

I just discovered this repo. Glad we're specifying this stuff! A few notes:

Eras should define two things:

  1. An epoch day
  2. A formula for converting from the (eraYear, monthCode, day) triple to a number of days relative to that epoch

In the current Temporal polyfill, there is additional metadata for each era which may (or may not) be relevant to the work here in this repo, so sharing here in case it's useful:

eraYear is permitted to be negative.

Do you mean "permitted for input"? "Exposed as the eraYear property"? Something else?

Also, what does it mean to provide a negative eraYear for eras that count time backwards? Is {era: 'bce', eraYear: -10} a valid input? Regardless of the answer, this should be specified.

Eras should be named as follows:

  1. For eras counting forwards: the BCP-47 ID of the canonical calendar for that era
  2. For eras counting backwards: pre- followed by the BCP-47 ID

I assume that bce is an exception to this rule?

*2 Dates in the incarnation era are expressed with ethiopic; dates prior to that are expressed in ethioaa. Dates prior to the creation of the earth are expressed as negative numbers in ethioaa.

Probably needs a bit more explanation for this calendar.

Related: https://github.com/tc39/ecma402/issues/534 (which should be migrated to this repo?)

*3 Note that input dates can be labeled as either gregory or julian and they will be interpreted correctly, even if the change date is unknown by the caller.

What will be the anchor era for this calendar? In other words, for {year: 1, month: 1, day: 1}, which era will be { eraYear: 1, month: 1, day: 1 } on the same day?

Also, when formatting a localized date in this calendar, how will it know which era to use in the output? Will you be adding an option to the DateTimeFormat constructor to specify the switchover date? Will the switchover date be inferred by locale?

And is there a plan to also have a plain julian calendar too? Or just julian-gregory?

If there's a separate issue or proposal about Julian calendars, what is it?

iso8601 iso8601 *4

Interesting. So you're proposing to support eras for all calendars, not just those that actually use eras? (Meaning they have more than one.) I can see pros and cons of this approach, but I'm curious to hear from you about what you think.

Also, it sounds like you're proposing that calendars that don't use eras will have a single era whose name matches the calendar's name. Is that correct? If so, then you may want to simplify your table to just list the calendars that use eras, and note that the other calendars just have a single era with the calendar name.

BTW, if every calendar uses eras then presumably era/eraYear processing will be identical in 262 vs. 402 which might simplify the spec a bit.

Calendar ID Eras

Where are era aliases specified? Should these be included in this table?

There are 2 types of calendars with eras: those with a variable number of eras, and those with a fixed number. Japanese is the one with a variable number of eras, and Manish has already written a nice spec for that. Therefore, in the rest of this post, I am focused on calendars with a fixed number of eras.

This is a good way to differentiate. Another important difference: Japanese is the only CLDR calendar with more than 2 eras. So "fixed" here doesn't only mean "doesn't change", it also means 1 or 2.

justingrant commented 1 year ago

A few more things:

Will there be an enumeration API to find the eras for each calendar? If yes:

*3 Note that input dates can be labeled as either gregory or julian and they will be interpreted correctly, even if the change date is unknown by the caller.

This calendar is interesting because eras are overlapping. In all other calendars, the first day of an era is one day later than the last day of the previous era. But not so for julian vs. gregory eras. Ditto for the epoch date of eras: in all other calendars, the epoch date is fixed, but in this calendar it's necessarily moveable. What implications for userland code (and for implementations?) arise from breaking those two invariants?

It also made me wonder if the gregory calendar should accept a julian era for input as well.

sffc commented 1 year ago

{ era: "bce", eraYear: -10 } should be equivalent to { era: "ce", eraYear: 11 }

The era code bce is being proposed as an alias to the canonical name pre-gregory

The julian-gregory calendar is not specified yet, and won't be specified in the initial release, but my initial reaction is that:

  1. The anchor era should be gregory
  2. The switchover date is a field specified in the calendar constructor, and I wouldn't be opposed to putting it in the calendar ID, like julian-gregory-16500101 for a hypothetical 1650-01-01 switchover date
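
If the switchover date were embedded in the calendar ID as in that example, reading it back could look like the sketch below. The "julian-gregory-YYYYMMDD" layout is only the hypothetical format from the example above, and parseSwitchover is an illustrative name.

```javascript
// Assumes a hypothetical "julian-gregory-YYYYMMDD" calendar ID layout.
// Returns null for any ID that does not match that layout.
function parseSwitchover(calendarId) {
  const match = /^julian-gregory-(\d{4})(\d{2})(\d{2})$/.exec(calendarId);
  if (!match) return null;
  // Skip the full match at index 0 and convert the three groups to numbers.
  const [, year, month, day] = match.map(Number);
  return { year, month, day };
}
```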

I think there won't be julian because it is not in CLDR; there is coptic instead.

I gave every CLDR calendar an identity era. I included iso8601 because it was in the CLDR table, but we could say that iso8601 is the exception. Actually we could say that cyclic calendars are an exception, too.

Haven't thought about era code enumerations.

Did I leave any loose ends?

justingrant commented 1 year ago

The era code bce is being proposed as an alias to the canonical name pre-gregory

I empathize with the goal of consistency, but for a feature like eras that most developers will almost never use, it seems better to go with more familiar names for canonical eras. This will make code more self-describing so that developers won't have to open up MDN to figure out what's going on in code they're reading.

For example, imagine an educational web app that displays BCE dates in a different color so that they're not easily confused by its high-school-aged users. The following code is probably easy to understand for most programmers:

const date = originalDate.withCalendar('gregory');
const dateColor = date.era === 'bce' ? 'red' : 'black';

However the following code will probably require most developers to read the docs to figure out what's going on.

const date = originalDate.withCalendar('gregory');
const dateColor = date.era === 'before-gregory' ? 'red' : 'transparent';

Does 'before-gregory' mean "not Julian"? Something else? Is the year 400 AD "before-gregory" because it's before the Gregorian transition?

Here's a suggestion for another way to think about naming canonical era identifiers:

Canonical identifiers for eras should follow the following guidelines:

I gave every CLDR calendar an identity era. I included iso8601 because it was in the CLDR table, but we could say that iso8601 is the exception. Actually we could say that cyclic calendars are an exception, too.

This seems like a reasonable approach to me. It certainly makes things more consistent across calendars which seems like a good thing. If we are going to do this, then a normative PR in March seems like a good idea to leverage this consistency in the 262 spec. FYI @ptomato.

2. The switchover date is a field specified in the calendar constructor, and I wouldn't be opposed to putting it in the calendar ID, like julian-gregory-16500101 for a hypothetical 1650-01-01 switchover date

Would there be a fixed list of dates supported? If not, then how would enumeration of calendars work?

For a julian-gregory calendar, I'd strongly suggest it be fully spec-ed out before this proposal is finalized, because it seems to behave differently from all other CLDR calendars and runs the risk of breaking invariants we might rely on elsewhere.

I think there won't be julian because it is not in CLDR; there is coptic instead.

Given that we fairly frequently hear requests for a julian calendar, I wonder if julian should be an alias for coptic?

Manishearth commented 1 year ago

Would there be a fixed list of dates supported? If not, then how would enumeration of calendars work?

For a julian-gregory calendar, I'd strongly suggest it be fully spec-ed out before this proposal is finalized, because it seems to behave differently from all other CLDR calendars and runs the risk of breaking invariants we might rely on elsewhere.

I'm pretty sure the scheme here covers all potential designs of such a calendar. You can have a fully flexible julian-gregory calendar where the switchover is set at runtime or from locale data, and still work in this scheme where you basically choose to return a date in the julian or gregory era based on context. The only thing that changes is the canonical output era for a given date.

Given that we fairly frequently hear requests for a julian calendar, I wonder if julian should be an alias for coptic?

Absolutely not: the Julian calendar is completely different, and the Julian epoch is not the Coptic epoch. The calendars have the same period for the year, but they do not share a notion of months or era epochs, nor do the years start on the same day; they are not fully synchronized or aligned.

justingrant commented 1 year ago

Absolutely not: the Julian calendar is completely different, and the Julian epoch is not the Coptic epoch.

Makes sense; I'm unfamiliar with Julian so didn't know. In that case then @sffc could you explain your comment above: "I think there won't be julian because it is not in CLDR; there is coptic instead." ? I assumed you meant that Julian was equivalent to Coptic, but now I'm not sure what you meant. :-)

I'm pretty sure the scheme here covers all potential designs of such a calendar.

I was thinking a bit more broadly: given that all other calendars have static identifiers and static eras, it'd be understandable that we'd assume those invariants. But this calendar (depending on its design) would break those assumptions. Seems like it'd make sense to do more work on the design of such a calendar in order to determine answers to questions like:

sffc commented 1 year ago

I meant that Julian is not widely used these days but Coptic is, and I think some people who say they want Julian may actually want Coptic (no citation for that claim).

justingrant commented 1 year ago

Makes sense, thanks for clarifying. FWIW, in the issues and comments filed in the Temporal repo, the only currently-unsupported calendar that's come up a lot has been Julian. Obviously a non-random sample of GitHub-commenting calendar enthusiasts isn't enough to drive the roadmap, but it's a data point that suggests there may be interest in that calendar.

Another interesting thing: Java's GregorianCalendar class is actually a Julian/Gregorian hybrid:

GregorianCalendar is a hybrid calendar that supports both the Julian and Gregorian calendar systems with the support of a single discontinuity, which corresponds by default to the Gregorian date when the Gregorian calendar was instituted (October 15, 1582 in some countries, later in others). The cutover date may be changed by the caller by calling setGregorianChange().

Historically, in those countries which adopted the Gregorian calendar first, October 4, 1582 (Julian) was thus followed by October 15, 1582 (Gregorian). This calendar models this correctly. Before the Gregorian cutover, GregorianCalendar implements the Julian calendar. The only difference between the Gregorian and the Julian calendar is the leap year rule. The Julian calendar specifies leap years every four years, whereas the Gregorian calendar omits century years which are not divisible by 400.

GregorianCalendar implements proleptic Gregorian and Julian calendars. That is, dates are computed by extrapolating the current rules indefinitely far backward and forward in time. As a result, GregorianCalendar may be used for all years to generate meaningful and consistent results. However, dates obtained using GregorianCalendar are historically accurate only from March 1, 4 AD onward, when modern Julian calendar rules were adopted. Before this date, leap year rules were applied irregularly, and before 45 BC the Julian calendar did not even exist.

Prior to the institution of the Gregorian calendar, New Year's Day was March 25. To avoid confusion, this calendar always uses January 1. A manual adjustment may be made if desired for dates that are prior to the Gregorian changeover and which fall between January 1 and March 24.
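
The leap year difference described in that quote is small enough to state directly. A minimal sketch with illustrative function names:

```javascript
// Julian rule: every fourth year is a leap year.
const isJulianLeapYear = (year) => year % 4 === 0;

// Gregorian rule: as Julian, except century years not divisible by 400.
const isGregorianLeapYear = (year) =>
  year % 4 === 0 && (year % 100 !== 0 || year % 400 === 0);
```

The two rules disagree only on century years such as 1700, 1800 and 1900, which is why the gap between the calendars grows over time.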

sffc commented 1 year ago

@justingrant @Manishearth and I discussed the issues in this thread. We reached alignment on most issues, with the following changes to what was stated above:

justingrant commented 1 year ago

Good meeting. Thanks for taking the time.

  • ECMAScript could return different canonical era codes than those in CLDR, so long as they are aliases

I'll open a separate issue to add this as a special case for Gregorian.

sffc commented 1 year ago

Bikeshedding for the Proleptic Gregorian BCE era name:

I kind-of like ante-gregory. It's a more academic-sounding term (originating from Latin) that may be less likely to be misinterpreted to mean Julian than pre-gregory.

Manishearth commented 1 year ago

I'll express a strong preference against names like "backward-gregory" that focus on its direction: I think it's useful that the direction correlates with the name, but I do not think that is the most important thing about the era, and it will be confusing.

I support pre-gregory or ante-gregory, preference for pre-. Also fine with prev-gregory or gregory-prev

gregory-bce/roc-bce actually kind of makes sense since each calendar has a "common era", the problem is that "common era" is both a specific era ("Common Era", proper noun) and a generic one ("common era", noun-adjective phrase)

Manishearth commented 1 year ago

In a meeting between Shane, Justin, and me, we discussed this a bit:

For a julian-gregory calendar, I'd strongly suggest it be fully spec-ed out before this proposal is finalized, because it seems to behave differently from all other CLDR calendars and runs the risk of breaking invariants we might rely on elsewhere.

Basically, we do have a menu of potential designs for julian-gregory, that vary along a couple axes. So far none of the era designs are particularly incompatible with this.

Axis 1: One calendar or multiple?

We can essentially either have a single julian-gregory calendar that takes options on construction (explicit switchover date, perhaps guessed switchover date from locale, perhaps with some default), or a list of julian-gregory-norway etc calendars (alternatively named things like julian-gregory-02091752).

A tricky thing is that "switchover" is not necessarily a single concept needing just a pivot date: Britain did something a bit fancy, since they considered years to start in March and had a multi-phase switchover that moved the year numbering over first. However, it is typical to cite British Julian dates as backdated to January, and historical documents from this era tend to deal with "old style" and "new style" dates.

The julian-gregory scheme loses a useful property: previously, calendars only needed the code for identity. It is possible for a data-driven Islamic calendar to be different from one with the same code but loaded from different data, but that's not possible in Temporal or ICU (only in ICU4X's model, and we don't support the islamic calendars yet; we plan to treat this type of mismatch as API misuse with garbage-in-garbage-out behavior).

julian-gregory-foobar is more verbose but matches the way we do Islamic calendars.

Neither changes this era scheme too much, though in the julian-gregory-foobar scheme we may wish to add the julian-gregory and pre-julian-gregory eras to all calendars, if we decide to have one (see below).

Axis 2: Produce a split era or combined era?

One design for a switchover calendar is to have it return dates in era julian for pre-switchover and era gregory for post-switchover (and pre-julian for bce dates). This works fine and clearly communicates information about the switchover.

Another design is to have a single combined julian-gregory era (also aliased to ce/ad), which is just one era with a run of missing days at the switchover. So dates will always be returned in julian-gregory or pre-julian-gregory (or ce/bce, or whatever, depending on decisions made here). We can still accept dates in julian or gregory, even if out-of-range (for out-of-range-of-era cases we've already decided that we should take a reasonable fallback where possible, i.e. -10 BC in the Gregorian calendar will do the sensible thing, depending on error settings).

Since all output eras must be inputable, this brings up the question of what happens on input of julian-gregory. Listed as a separate axis since we may still decide to allow such input even if we produce a split era.

Axis 3: Allow input of magical combined era

If we decide to have a julian-gregory (or ce) "combined" era, we need to determine if we allow it on input, and what its behavior will be.

We're able to do this because the switchover is from Julian to Gregorian, not vice versa, which means there are no overlapping dates, only a gap (except for British "old style" vs "new style" year reckoning, which we shouldn't handle in the base calendar anyway). So we can just look at dates, check if they're before or after the switchover, and set the internal era appropriately. If they're during the switchover, we can either return an error or perhaps clamp, though I'd like the default to be erroring here, since there's not a singular obvious choice for how the clamping should work. But we may want to clamp in overflow: constrain mode.
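
A minimal sketch of that check, assuming a 1582 switchover in which 1582-10-04 (Julian) is followed by 1582-10-15 (Gregorian). The function name, the overflow parameter, and the choice to clamp forward are all assumptions; as noted, there is no single obvious clamping behavior.

```javascript
// Hypothetical switchover used only for this example.
const LAST_JULIAN = [1582, 10, 4];
const FIRST_GREGORIAN = [1582, 10, 15];

// Lexicographic comparison of [year, month, day] triples.
function compareDates(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Resolves a date given in the combined era to an internal julian or gregory era.
function resolveCombinedEra(year, month, day, overflow = 'reject') {
  if (year < 1) throw new RangeError('this sketch only covers years >= 1');
  const date = [year, month, day];
  if (compareDates(date, LAST_JULIAN) <= 0) return { era: 'julian', year, month, day };
  if (compareDates(date, FIRST_GREGORIAN) >= 0) return { era: 'gregory', year, month, day };
  if (overflow === 'constrain') {
    // One possible clamp: snap forward to the first Gregorian day.
    const [y, m, d] = FIRST_GREGORIAN;
    return { era: 'gregory', year: y, month: m, day: d };
  }
  throw new RangeError('date falls in the switchover gap');
}
```

Because the switchover leaves a gap and no overlap, comparing the date fields is enough; no day-count arithmetic is needed to pick the era.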

We probably will have to have the magical combined era as one of the listed eras so that julian-gregory is an era that makes sense in this scheme. I don't think that's a huge deal. There's sensible behavior for it.

Though @sffc if we want to return julian and gregory as canonical returned eras, we may lose the property that the calendar/pre-calendar eras are always the canonical ones.

sffc commented 1 year ago

Though @sffc if we want to return julian and gregory as canonical returned eras, we may lose the property that the calendar/pre-calendar eras are always the canonical ones.

There are now 2 places where it is harmful to name the anchor era the same as the calendar: japanese and julian-gregory. We could add an anchor="true" attribute instead.

Manishearth commented 1 year ago

In this case what do you mean by "anchor"?

I think we have a couple concepts floating around:

sffc commented 1 year ago

I see. The property that the canonical era is calendar/pre-calendar is only that calendar should be any calendar, not necessarily the current calendar. For example, ethioaa is a canonical era in the ethiopic calendar.

The "anchor era" is an additional property needed by Temporal for how to resolve inputs such as { calendar: "foo", year: 1234 }. It's the era to use when the implicit "arithmetic era" is needed.

Manishearth commented 1 year ago

Gotcha, what I thought.

sffc commented 1 year ago

@michaelficarra suggested adding Kanji aliases for Japanese era names in TC39-TG1. I opened an upstream CLDR issue to discuss further:

https://unicode-org.atlassian.net/browse/CLDR-16353

michaelficarra commented 1 year ago

@sffc There are also the precomposed characters for era names, like ㋿. I don't know which is a more appropriate alias, or if they are both equally valid.

justingrant commented 1 year ago

Given the complexity of the moveable switchover date, I wonder if julian-gregory would be better handled as a custom calendar rather than a built-in one? For example, I could imagine an npm package that includes all the logic, and users of that package would simply call a constructor with the switchover date as a parameter, and they'd get a custom calendar class that they could use. A package for custom Islamic calendars could work similarly.

This isn't ideal because string parsing like Temporal.PlainDate.from('2022-01-30[u-ca=julian-gregory-02091752]') wouldn't work. But that's a general problem for all custom calendars, which I expect will be solved by a "Custom Calendar Helper" polyfill that will patch Temporal to allow Temporal.*.from and other methods so they'll work with custom calendars.
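
Whatever form such a helper takes, it would first need to read the calendar annotation out of the string itself. A minimal sketch follows; extractCalendarId is a hypothetical name and is not part of Temporal.

```javascript
// Pulls the calendar ID out of a date string with a [u-ca=...] annotation.
// The optional "!" is the critical flag allowed on annotations.
// Falls back to "iso8601" when no calendar annotation is present.
function extractCalendarId(isoString) {
  const match = /\[!?u-ca=([^\]]+)\]/.exec(isoString);
  return match ? match[1] : 'iso8601';
}
```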

BTW, the julian-gregory calendar is complicated enough, maybe it should be moved into its own issue for discussion?

The property that the canonical era is calendar/pre-calendar is only that calendar should be any calendar, not necessarily the current calendar. For example, ethioaa is a canonical era in the ethiopic calendar.

Good to re-use era names where they are shared between calendars (like the Ethiopian case) but I'm not sure I understand the benefit of "any calendar". Why would you want to use an ethioaa era in an islamic calendar?

There are now 2 places where it is harmful to name the anchor era the same as the calendar: japanese and julian-gregory.

By "anchor era" did you mean "canonical name of the anchor era"?

We could add an anchor="true" attribute instead.

Attribute in the CLDR data? Or somewhere else?

ljharb commented 1 year ago

I would hope that Temporal is never patched by anything except a 262 and/or 402 compliant polyfill - custom calendars shouldn’t be injected into globals.

justingrant commented 1 year ago

I would hope that Temporal is never patched by anything except a 262 and/or 402 compliant polyfill - custom calendars shouldn’t be injected into globals.

Given the limitations of the current API, if a custom calendar or timezone author wants to allow string parsing to work, e.g. Temporal.PlainDate.from('2020-01-01[u-ca=mycalendar]'), then there's no other way to support that without patching Temporal. This is fallout from the change made a while ago to stop observably calling from() when parsing string input.

I'm not saying that patching is a great idea, only that if you're a custom cal/tz author then I'm not sure how you can make your code act like a built-in timezone or calendar without patching.

Manishearth commented 1 year ago

BTW, the julian-gregory calendar is complicated enough, maybe it should be moved into its own issue for discussion?

I mean, the only reason we're talking about it here is that we wanted to make sure we're not closing the door to future designs (which we've established); we don't need to design it yet.