Open sffc opened 1 year ago
The proposal in #9 addresses all of these points, except it is weak on P5. The era codes are intuitive but only if you know how era codes work in other places. Based on what I've heard, @justingrant feels P5 should be given more weight, with perhaps less weight on R2.
Thanks @sffc for sharing, and @manishearth for building the original doc. Very helpful.
Before sharing feedback on these requirements, I think it'd be helpful to share assumptions I'm making about era use that guide my feedback. Are these assumptions correct?
- Outside computing, calendars vary widely in use of eras. Most users of some calendars like Japanese and Gregorian know what era names mean. Users of other calendars like Chinese, Buddhist, Islamic, etc. don't use eras much, if at all.
Not sure if this is completely correct. For example, although Buddhist and Islamic calendars don't use eras internally, it's still common to use them in date formatting, especially when comparing them with dates in other calendars:
(both strings from Wikipedia)
- Reflecting this varying usage, most CLDR calendars have only a single era, which implies that eras are pretty much irrelevant in those calendars. We don't need to optimize our naming scheme for those calendars.
As noted above, irrelevant for calculations, but very much relevant for formatting and conversion.
- The only CLDR calendars that have multiple eras are: `japanese`, `gregorian`, `coptic`, `ethiopic`, and `roc`. `julian-gregory` is another possible one in the near future. We should optimize our era scheme for these calendars.
See above.
- Use of era codes in computing is very likely to be calendar-specific. Developers writing calendar-neutral code already have a robust set of non-era fields they can use, so they are unlikely to care about the specific codes of eras.
Mostly, I suppose, although I see use cases of era codes in calendar-neutral conversion between calendars.
- Users writing calendar-specific code are very likely to be familiar with the names of eras in their language.
I think I don't completely agree here:
- Both eras and calendars are confusing and/or unknown to many developers. So part of the responsibility in naming eras is to help developers figure out "what's an era, and how does it differ from a calendar?"
The more I think about this, the more I think that the line between an "era" and a "calendar" is a bit blurry. This is evidenced by the fact that I've seen people talk about dates "in the Buddhist era" in contrast with "in the Buddhist calendar". Even Gregorian really has only one era; it's the one that started the year Jesus Christ was born. Dates prior to that are "before" that era; we talk about it in CLDR as being a separate era, but conceptually it isn't really an "era".
The more I think about this, the more I think that the line between an "era" and a "calendar" is a bit blurry. This is evidenced by the fact that I've seen people talk about dates "in the Buddhist era" in contrast with "in the Buddhist calendar".
Yep, I think yours is an astute observation. For some calendars like `buddhist` or `indian` or any of the Islamic calendars, the era and the calendar are essentially interchangeable. It also means that `year` and `eraYear` are the same in these calendars, so `era` is always unnecessary to put in code that's specific to them. Right?
The main risk I see in blurring the era vs. calendar distinction is that it creates an uncanny valley where it's hard to reason about why eras are sometimes treated like calendars and when they're not. For example, are the following property bags equivalent?
{ calendar: 'buddhist', year: 1, month: 1, day: 1 }
{ era: 'buddhist', eraYear: 1, month: 1, day: 1 } // are `month: 1, day: 1` ISO or Buddhist month/day?
Related: if canonical era names share the same ID as the calendar, then how do we expect the last line of code below to behave?
date = Temporal.PlainDate.from({ calendar: 'buddhist', year: 1, month: 1, day: 1 });
date.with({ calendar: 'gregory', era: 'ce', eraYear: 100 }); // Throws; can't change calendar in `with`
date.with({ era: 'gregory', eraYear: 100 }); // Throws? If not, then what calendar is the result?
Even Gregorian really has only one era; it's the one that started the year Jesus Christ was born. Dates prior to that are "before" that era; we talk about it in CLDR as being a separate era, but conceptually it isn't really an "era".
This makes conceptual sense, but I'm not sure how much that conceptual merger matters in computing where different code may sometimes be written for one era vs. another?
Not sure if this is completely correct. For example, although Buddhist and Islamic calendars don't use eras internally, it's still common to use them in date formatting, especially when comparing them with dates in other calendars:
Makes sense. Maybe a better way to reframe (1) is that the importance and usage of era names varies depending on which of three kinds of calendars are involved?
- Some calendars don't really use eras outside computing. Chinese is an example. For these calendars, era naming is irrelevant because eras aren't used.
- Some calendars with one era use that era in formatted dates and in common usage as a synonym for the calendar itself, e.g. "Buddhist Era". For these calendars, era names might help with discoverability. For example, seeing `era: 'saka'` could clarify what an object with `calendar: 'indian'` is doing. Other than discoverability, naming of eras doesn't matter for these calendars because all dates have the same era, and also because `eraYear` and `year` always match. Although eras may be used in formatting, era *names* in code are always unnecessary.
- Some calendars use multiple eras, both in computing and non-computing usage, to differentiate periods of time and to count years during those periods. Japanese is the best example, but also Gregorian, Julian, ROC, and Coptic/Ethiopian. Era names for these calendars matter more than others, because (in addition to discoverability) programmers may need to use eras to create dates or to write era-specific code.
Is this three-segment split a good way to look at it?
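To make the difference between the second and third groups concrete, here is a minimal sketch of how `year` could relate to `era`/`eraYear`. The `toYear` helper and the `ERA_DIRECTIONS` table are hypothetical and not part of Temporal; the era codes are ones floated in this thread, and the `bce` arithmetic assumes the common convention that 1 BCE is year 0.

```javascript
// Hypothetical table: for each calendar, whether an era counts years
// forward from the epoch or backward from it.
const ERA_DIRECTIONS = new Map([
  ['indian', new Map([['saka', 'forward']])],
  ['gregory', new Map([['ce', 'forward'], ['bce', 'backward']])],
]);

// Converts an (era, eraYear) pair into the calendar's signed year.
function toYear(calendar, era, eraYear) {
  const eras = ERA_DIRECTIONS.get(calendar);
  const direction = eras ? eras.get(era) : undefined;
  if (direction === undefined) {
    throw new RangeError(`era '${era}' is not defined for calendar '${calendar}'`);
  }
  // Forward eras: year === eraYear. Backward eras: eraYear 1 maps to year 0.
  return direction === 'forward' ? eraYear : 1 - eraYear;
}
```

With a single forward era (`indian`), `toYear` is the identity, so the era adds nothing in code. With `gregory`, dropping the era changes the result: `toYear('gregory', 'bce', 100)` is `-99`, not `100`.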
use cases of era codes in calendar-neutral conversion between calendars.
What are examples of these cases? Are these cases the same as the case where users would use `withCalendar`? If yes, why should someone prefer using eras instead of `withCalendar`?
- Many users in, say, the Japanese calendar may know the eras in their local language, but not in English, which is the language that TC39 and CLDR have chosen to be the basis of string identifiers.
Makes sense, but for Japanese eras is there any other choice besides `meiji`, `heisei`, etc?
- English-speaking users from Gregorian countries may know "ad"/"bc" or "ce"/"bce" but maybe not both conventions.
I agree about ce/bce. I'd guess that this convention (which AFAIK is mostly used in academia and science) is probably ~50% recognized among Gregorian-using programmers. Which is of course a lot more than `gregory-inverse` which will be unrecognized by everyone. :-)
The more I think about this, the more I think that the line between an "era" and a "calendar" is a bit blurry. This is evidenced by the fact that I've seen people talk about dates "in the Buddhist era" in contrast with "in the Buddhist calendar". Even Gregorian really has only one era; it's the one that started the year Jesus Christ was born. Dates prior to that are "before" that era; we talk about it in CLDR as being a separate era, but conceptually it isn't really an "era".
The mental model I have seen most commonly is that an era is primarily the point in time, not the system of reckoning.
People say "in the Buddhist era" because Buddhist has the same reckoning as the "default" Gregorian calendar. (But people don't say that for e.g. lunisolar Buddhist calendars)
The main risk I see in blurring the era vs. calendar distinction is that it creates an uncanny valley where it's hard to reason about why eras are sometimes treated like calendars and when they're not. For example, are the following property bags equivalent?
{ calendar: 'buddhist', year: 1, month: 1, day: 1 }
{ era: 'buddhist', eraYear: 1, month: 1, day: 1 } // are `month: 1, day: 1` ISO or Buddhist month/day?
Both of those refer to the same Epoch Day.
The second one doesn't specify what the output calendar should be.
Related: if canonical era names share the same ID as the calendar, then how do we expect the last line of code below to behave?
date = Temporal.PlainDate.from({ calendar: 'buddhist', year: 1, month: 1, day: 1 });
date.with({ calendar: 'gregory', era: 'ce', eraYear: 100 }); // Throws; can't change calendar in `with`
date.with({ era: 'gregory', eraYear: 100 }); // Throws? If not, then what calendar is the result?
Throwing on the last line is valid and safe behavior for now. To avoid confusion, it seems reasonable to allow `with({era})` to only work with eras in the current calendar.
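The rule being discussed can be sketched as a small validation step. This is only an illustration under stated assumptions: `validateWithFields` and `ERAS_BY_CALENDAR` are hypothetical names, the era lists are placeholders rather than agreed codes, and the error types are a guess at what an implementation might throw.

```javascript
// Hypothetical per-calendar era lists; the codes are placeholders.
const ERAS_BY_CALENDAR = new Map([
  ['buddhist', ['be']],
  ['gregory', ['ce', 'bce']],
  ['indian', ['saka']],
]);

// Checks the fields passed to a with()-style call against the date's
// current calendar. Returns the fields unchanged when they are acceptable.
function validateWithFields(currentCalendar, fields) {
  // Changing the calendar is a separate operation (withCalendar).
  if (Object.prototype.hasOwnProperty.call(fields, 'calendar')) {
    throw new TypeError('cannot change calendar here; use withCalendar instead');
  }
  if (fields.era !== undefined) {
    const eras = ERAS_BY_CALENDAR.get(currentCalendar) || [];
    // Only eras that belong to the current calendar are accepted.
    if (!eras.includes(fields.era)) {
      throw new RangeError(
        `era '${fields.era}' does not belong to calendar '${currentCalendar}'`
      );
    }
  }
  return fields;
}
```

Under this sketch, an era from another calendar is rejected outright, so the question of which calendar the result should be in never comes up.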
Is this three-segment split a good way to look at it?
Maybe, but I would put Gregorian/Coptic/etc in the middle category. Chinese doesn't use eras at all, Japanese uses them all the time, and the rest use them in various situations when necessary.
use cases of era codes in calendar-neutral conversion between calendars.
What are examples of these cases? Are these cases the same as the case where users would use `withCalendar`? If yes, why should someone prefer using eras instead of `withCalendar`?
I think it's useful when talking about dates in similar calendars. The example I gave in the CLDR issue was `{ calendar: "islamic-tbla", era: "islamic-umalqura", eraYear: ..., monthCode: ..., day: ... }`. This may be a useful way to express a date if you have it in the algorithmic Islamic calendar but want to express it in the tabular Islamic calendar.
- Many users in, say, the Japanese calendar may know the eras in their local language, but not in English, which is the language that TC39 and CLDR have chosen to be the basis of string identifiers.
Makes sense, but for Japanese eras is there any other choice besides `meiji`, `heisei`, etc?
Unfortunately not because of identifier restrictions, but I very much want to add aliases for those eras in Kanji, etc.
- English-speaking users from Gregorian countries may know "ad"/"bc" or "ce"/"bce" but maybe not both conventions.
I agree about ce/bce. I'd guess that this convention (which AFAIK is mostly used in academia and science) is probably ~50% recognized among Gregorian-using programmers. Which is of course a lot more than `gregory-inverse` which will be unrecognized by everyone. :-)
It's not particularly intuitive, but it's intended to not be misleading. :smiley:
{ calendar: 'buddhist', year: 1, month: 1, day: 1 }
{ era: 'buddhist', eraYear: 1, month: 1, day: 1 } // are `month: 1, day: 1` ISO or Buddhist month/day?
Both of those refer to the same Epoch Day. The second one doesn't specify what the output calendar should be.
Whoops, I forgot that `buddhist` shared month/day reckoning with ISO for modern dates. What about this pair? Would you expect them to be equivalent?
Temporal.PlainDate.from({ calendar: 'indian', year: 1, month: 1, day: 1 });
Temporal.PlainDate.from({ era: 'saka', eraYear: 1, month: 1, day: 1 });
My assumption (before this conversation) was that the second line should throw because `saka` isn't an era in the ISO calendar... and for that matter *any* era should throw for the ISO calendar. Are you recommending that it should not throw?
Maybe, but I would put Gregorian/Coptic/etc in the middle category. Chinese doesn't use eras at all, Japanese uses them all the time, and the rest use them in various situations when necessary.
Hmm, then maybe there are four categories?
I think it's useful when talking about dates in similar calendars. The example I gave in the CLDR issue was `{ calendar: "islamic-tbla", era: "islamic-umalqura", eraYear: ..., monthCode: ..., day: ... }`. This may be a useful way to express a date if you have it in the algorithmic Islamic calendar but want to express it in the tabular Islamic calendar.
We already have a way to do this conversion:
Temporal.PlainDate.from({ calendar: "islamic-umalqura", year, monthCode, day }).withCalendar("islamic-tbla");
Is there something about the cross-calendar era case that would be easier for a programmer to understand vs. the status quo line of code above?
What about this pair? Would you expect them to be equivalent?
Temporal.PlainDate.from({ calendar: 'indian', year: 1, month: 1, day: 1 });
Temporal.PlainDate.from({ era: 'saka', eraYear: 1, month: 1, day: 1 });
No, the first one specifies the calendar, and the second one does not. I am not suggesting that we "infer" the output calendar from the era code, only that we allow the era code as input when interpreting the eraYear, month, and day. For example, I could see a future where you would write
Temporal.PlainDate.from({ calendar: 'iso8601', era: 'indian', eraYear: 1, month: 1, day: 1 });
but note that "saka" wouldn't be accepted as it is not the canonical era name. You can only use aliases for calendars when they are within the same system.
Hmm, then maybe there are four categories?
- Calendars that don't use eras at all, e.g. Chinese
- Calendars that may use eras in formatting but there's no need to use them in code because there's only one era, e.g. Buddhist/Indian/Islamic
- Calendars that have multiple eras that are used occasionally, both in everyday usage and in code, e.g. Gregorian/Julian, Coptic/Ethiopian, ROC
- Calendars that use eras intensively (only Japanese)
I think we're getting closer. Let me try to rephrase:
Is there something about the cross-calendar era case that would be easier for a programmer to understand vs. the status quo line of code above?
Yes, I'm thinking beyond just Temporal here when I say that global era codes seem useful for conversion. For the purposes of Temporal, sure, this property of universal era codes isn't really necessary since the same goal can be achieved with other means.
For example, I could see a future where you would write
Temporal.PlainDate.from({ calendar: 'iso8601', era: 'indian', eraYear: 1, month: 1, day: 1 });
but note that "saka" wouldn't be accepted as it is not the canonical era name. You can only use aliases for calendars when they are within the same system.
Is there something about that code that's better than the currently-supported pattern below?
Temporal.PlainDate.from({ calendar: 'indian', year: 1, month: 1, day: 1 }).withCalendar('iso8601');
Temporal.PlainDate.from({ calendar: 'indian', era: 'saka', eraYear: 1, month: 1, day: 1 }).withCalendar('iso8601');
A few reasons that I think the latter pattern will be better for software reliability & teaching programmers about how to use calendars properly:
- … `month` and `day` will be evaluated in `indian`
- … can be used in all property bags while other eras e.g. `saka` cannot (this is another way of saying that there's less need to learn about the concept of "canonical era")
Yes, I'm thinking beyond just Temporal here when I say that global era codes seem useful for conversion. For the purposes of Temporal, sure, this property of universal era codes isn't really necessary since the same goal can be achieved with other means.
Oh! What are those "other means" conversion APIs beyond Temporal? Would those APIs not use calendars, only eras?
- Calendars that don't use eras at all (Chinese)
- Calendars that have a single era and therefore often share an identity with the era (Buddhist, Indian, Islamic)
- Calendars that have a small, fixed number of eras, where the era is necessary to distinguish dates even if they are in the same calendar system, but the era can be elided when context is clear (Gregorian/Julian, Coptic/Ethiopian, ROC)
- Calendars that require eras to distinguish even modern dates from one another (Japanese)
This is a very clear explanation. Nice!
Is it a safe assumption that era ID naming should be optimized for cases (3) and (4), because (1) and (2) never need to use eras in code?
I don't disagree with you on the "indian" example. Your code is much better than mine. My code was provided as an example of something that could potentially be considered in some other (non-ESTemporal) calendar code if the constraints aligned around it, and I don't want to unnecessarily rule it out.
Oh! What are those "other means" conversion APIs beyond Temporal? Would those APIs not use calendars, only eras?
For example, in Rust ICU4X, we allow the calendar to be a static parameter. We also support arbitrary runtime calendars, but only via (slower, bigger) dynamic dispatch. For calendars that are sufficiently similar, we may be able to share static type parameters, like `Date<Islamic>`, with the era being subsequently used to perform runtime distinction.
Is it a safe assumption that era ID naming should be optimized for cases (3) and (4), because (1) and (2) never need to use eras in code?
Does the rest of my post address the case I'm trying to lay out that eras for case (2) are also useful in code?
Does the rest of my post address the case I'm trying to lay out that eras for case (2) are also useful in code?
For ECMAScript code, I think we should optimize the era solution for (3) and (4) where eras are actually needed in code. I especially would avoid solutions that make (3) and (4) worse—harder to understand/learn or more vulnerable to bugs—in order to make (1) or (2) better.
For non-ECMAScript code, I have no opinions about any of the above, as long as it doesn't make ECMAScript DX worse. I'm happy to support whatever your team wants to accomplish in Rust or elsewhere.
I also have no opinion about what are canonical era codes in CLDR, as long as those codes don't also have to be used as the values of `era` properties in ECMAScript objects. Your idea of having a list of aliases and using the first one as the value of the `era` property sounds fine to me. Or the canonical aliases could be listed in the spec. Or any other solution.
Related observation: all the behavior that I'm concerned about in your era plan is related to the CLDR canonical codes. Behavior of aliases in the plan seems much easier for ECMAScript developers to understand.
Aliases: `ce`, `ah`, `be`, ... (… `ah` in islamic variants, or `bc` in `gregory` vs Julian) can be used in multiple calendars, but the calendar is free to interpret the eras differently. Avoids DX complexity of different names for the same era in every related calendar.
Canonical CLDR codes: … `withCalendar`.
Would it be a reasonable path forward to use your proposal's alias behavior (only) in ECMAScript, and then add a way to choose canonical aliases in ECMAScript?
It seems fine to use CLDR canonical codes as (single-calendar only) aliases in ECMAScript, as long as we avoid making them canonical for ES when there's a more recognizable alternative available.
Summarizing the feedback above: based on my current understanding of the proposal, here are two changes that I think would improve its usability for ECMAScript developers. Neither of these changes necessarily implies a change to CLDR keys nor to how Rust or other platforms would use era codes.
To make calendar features easier to understand and to reduce opportunities for bugs, only eras from the same calendar should be accepted as input. Code like the following, where eras are used with unrelated calendars, should throw. Instead, `withCalendar` should be used.
Temporal.PlainDate.from('2020-01-01').with({ era: 'saka', eraYear: 100 });
Temporal.PlainDate.from('2020-01-01[u-ca=chinese]').with({ era: 'bc', eraYear: 100 });
To improve discoverability and to make calendar features easier to understand, the `era` property should return recognizable names like `ce`, `bce` or `saka` instead of names that duplicate the calendar name like `gregory`, `gregory-inverse`, or `indian`, respectively.
Where the same era exists in multiple calendars, the `era` property should return the same name. Examples include `ah` for Islamic calendar variants and the codes for AD/BC eras that could be used in Gregorian, Julian, and Japanese. Note that the same-named era may behave differently in different calendars.
Based on my previous calendar research while building the Temporal polyfill, here are some suggestions for what I'd expect to see returned from `era` in various calendars. Input-only aliases are noted. I think it'd be fine to use different codes as long as those names are reasonably widely recognized by users of those calendars.
- `buddhist` - `be`
- `chinese` / `dangi` - either `chinese`/`dangi` or don't support eras
- `coptic` - `am`, `before-am`
- `ethioaa` - `aa` (alias: `ahmete-alem`, `mundi`)
- `ethiopic` - `incar`, `aa` (alias: `ahmete-alem`, `mundi`)
- `gregory` - `ce` (alias: `ad`), `bce` (alias: `bc`)
- `hebrew` - `am`
- `indian` - `saka`
- `islamic*` - `ah`
- `japanese` - `ce` (alias: `ad`), `bce` (alias: `bc`), `heisei`, `meiji`, `reiwa`, `showa`, `taisho`, with older eras accepted for input (per Manish's plan) but not for `era` output.
- `persian` - `sh` (alias: `ap`, `solar-hijri`) Not 100% sure about this one; research needed. Wikipedia uses dates like "1401 SH".
- `roc` - `minguo`, `before-roc`
Also, it seems fine for each calendar to use the name of the calendar as an input-only alias for the anchor era (the era that determines `year`). So `indian` could be an alias for `saka`, as long as the latter one is what's returned from the `era` getter.
Your proposal takes a reasonable position. However, I'll reiterate my concern, which is that it's never going to be clear what the correct "default aliases" should be.
For Gregorian, we'll never settle the ad/ce debate, and for most other calendars, we're mostly pulling these identifiers out of thin air; there's no clear right answer for almost any of them. ("saka" and "minguo" are the two that seem least controversial.) I'm comfortable making them aliases, because we can always add more aliases, but locking them in as default aliases means we can't change them without breaking the Web.
When we invented identifiers for measurement units, we referred to an independent governmental agency, NIST, to get the names and spellings of the units. But, there's no independent agency for the era names. CLDR is the independent agency. It could certainly choose to give these things names on its own, but that's a profound step to take.
I'll respond below to your note around era names, but first: are you OK with the cross-calendar limitation in (1) above? It'd be nice to get consensus on that part, if possible.
there's no independent agency for the era names.
If we want to rely on an existing standard, then Java has standardized era codes for Japanese, Gregorian, Islamic, Buddhist, and Minguo. Java has been using these names since Java 1.8 (aka Java 8) in 2014, so any Java or Android developer working in those calendars is likely to be familiar. Can we rely on Java's prior art?
For Gregorian, we'll never settle the ad/ce debate
IMO I don't think we need to settle any debates... there just needs to be some reasonable basis to pick something that many/most people will recognize. Especially when we can point to some external authority's prior art like Java.
for most other calendars, we're mostly pulling these identifiers out of thin air; there's no clear right answer for almost any of them ... locking them in as default aliases means we can't change them without breaking the Web.
If that's a big concern, then should we just avoid the problem completely for many calendars by removing eras from calendars that don't use multiple eras? Then we'd limit any potential controversy or backwards-compat issues to only 5 calendars: `japanese`, `gregory`, `roc`, `coptic`, and `ethiopic`.
Of those, the first three are already covered by Java's precedent, leaving only `coptic` and `ethiopic`.
For `coptic`, `A.M.` seems very widely used and uncontroversial in all the English sources that I could find. There's a backwards era too, for which we could use either the "-inverse" suffix you've been discussing or a "before-" prefix to follow Java's lead with the ROC calendar.
For `ethiopic`, `aa` is already used inside the name of the `ethioaa` calendar, so that name seems like a safe choice, because we can't change the name of the calendar without breaking the web.
That leaves only one era remaining: `ethiopic`'s anchor era, which seems to be called the "Incarnation Era" universally in all English sources. Seems like the only decision is how to abbreviate it (if at all), not what to call it.
I'm not suggesting that any of these solutions are perfect, only that using not-universally-recognizable names (many of which are already used in Java) seems better for developers than inventing some new scheme ourselves.
I'll respond below to your note around era names, but first: are you OK with the cross-calendar limitation in (1) above? It'd be nice to get consensus on that part, if possible.
I will rephrase this and then agree with my rephrased version. I am okay if ECMA-402 chooses to only accept calendar-scoped era codes and aliases upon input and not eras from other calendars. That would be a choice by ECMA-402 on how to interpret the data it gets from CLDR.
May I ask how you would extend your model to support Julian and Juliogregory?
May I ask how you would extend your model to support Julian and Juliogregory?
Julian seems straightforward: use the same `ce`/`bce` eras as `gregory`.
For a Julian/Gregory calendar, the answer kinda depends on the design of the calendar, specifically how (whether?) the calendar lets users discover whether a particular `Temporal.PlainDate` instance using that calendar is a Julian or Gregorian date. The most discoverable way would probably be to use the `era` to distinguish between Julian or Gregorian. A custom property could be used too. Or an additional prototype method on this calendar only. There's probably others.
If we don't offer this capability at all, or if we want to use a separate method or getter like `afterGregorianTransition` to distinguish Julian vs. Gregorian, then the era could be `ce` to align with `gregory` and `julian`.
If we want to use `era` to distinguish Julian vs. Gregorian, then the `era` getter could return `bce`, `julian` or `gregory`. IMO this would be a more discoverable option.
@Manishearth wrote a document focusing on pre-modern eras. I wanted to pull out the list of requirements that have driven the era code design proposal in #9: