Open Manishearth opened 2 years ago
If we're not restricting ourselves to ICU4C behavior, then I believe Chinese and Dangi calendars should have a multi-era setup similar to the Japanese calendar (or at least Chinese, I can understand some arguments against Dangi).
Does the Chinese calendar have a good definition of eras for that? I feel like an important difference is that an instance of the Chinese calendar will use a single era, but what that era is changes (similar to ethiopic vs ethioaa), whereas the Japanese calendar uses many eras at once.
CLDR doesn't have data for either, presumably because the most common usage is cyclic + related ISO.
(Either way, from ICU4X's upcoming 1.0, chinese/hebrew/dangi/islamic/roc don't matter)
Oh, another thing I'd considered was keeping bc and ad for gregory and bce and ce for iso8601 (proleptic gregorian), but I don't have the strongest opinion there. What do you think?
For Ethiopic, would switching to pre- instead of before- to make it something like incarn, pre-incarn be better?
N.b. in ICU4X gregory is proleptic gregorian, though we are considering having a calendar with a configurable switchover date (cc @sffc).
I had kept it bc/ad until @sffc mentioned that Temporal seemed to be moving in the direction of bce/ce. For now, we have bce/ce as codes, but we format to the "default" CLDR variants (BC/AD), and hope to introduce that configurability in the future. In general since Gregorian uses both I'd prefer to use the more "neutral"/"modern" one in the era code, which can't be changed in the future.
(I did toy with the idea of having both sets of era codes in Gregorian, and you can tweak which one gets used on the date; this would be similar to how our Ethiopic dates can produce either kind of era based on a flag)
I like incarn, pre-incarn.
The Gregorian era codes were discussed in https://github.com/unicode-org/icu4x/issues/470
In Temporal, iso8601 has no eras (or equivalently a singleton era); dates before year 0 are just rendered with a minus sign.
For now, all we need is to decide on the era codes for the calendars we're shipping in ICU4X, which @Manishearth listed in the OP. We don't need to decide on chinese yet.
In general since Gregorian uses both I'd prefer to use the more "neutral"/"modern" one in the era code, which can't be changed in the future.
I'd err on that side too. Great, everything in the current list looks perfect to me.
I'll update the list with pre-incar then, and let's treat it as canonical for now?
@Manishearth SGTM
I think it would be nice if we could verify an appropriate Latin translation or transliteration of the Ethiopic era codes. CLDR has a few translations here:
I may also do some additional research.
I propose we always return undefined for era and eraYear for that but only use year for these calendars. I see no point in passing "be" (with nothing else acceptable) for the Buddhist calendar, nor in passing "saka" (with nothing else acceptable) for the Indian calendar. If there is only one acceptable value for a parameter, then there is no point in reading that value (the same model as in "iso8601").
Can we first agree about this? I don't want to spend time creating era codes for calendars that have no need for them. I think spending spec text on defining era codes for these calendars is not productive, so the only productive way to address this is to first decide not to define any era codes for them.
If our choice for Q2 is A, then we need to define the era codes in a way that makes sense for all three of these calendars, and we cannot use "bc" or "bce" (since those would not be applicable to the "roc" calendar).
If our choice for Q2 is B, then I would suggest the following:
"gregory": "bce", "ce"
"coptic": "bd", "ad"
"roc": "bmg", "mg"
Can we first agree about this? I don't want to spend time creating era codes for calendars that have no need for them. I think spending spec text on defining era codes for these calendars is not productive, so the only productive way to address this is to first decide not to define any era codes for them.
I agree. We should avoid the bikeshed when possible.
Each has two eras, and the years of these two eras grow in opposite directions, both without a year 0. The next question is should we
I suggest different sets of 2 era codes for each.
I propose we always return undefined for era and eraYear for that but only use year for these calendars.
I think this is fine, though we may wish to return an empty string (but accept undefined as well). At least from ICU4X's point of view the model is simpler when it's always stringy.
One note is that at least in ICU4X, the ethiopian calendar supports both era schemes simultaneously; so it will be a calendar that can accept "incar" and "before-incar"[^1] and also accept "" (which we are currently calling "mundi"), though the calendar object will only return one or the other based on how it's initialized.
B. For each of these three calendars, define their own set of (2) era codes?
I would define a pair of eras per-calendar (option B). The BCE/CE era codes can be reused for Japanese negative dates, though.
[^1]: In ICU4X we limit era codes to 16 characters. This does not need to extend to JS since we can always check against the extended code internally, but it is definitely a property that's convenient to have.
I think this is fine, though we may wish to return an empty string (but accept undefined as well). At least from ICU4X's point of view the model is simpler when it's always stringy.
Please read the following spec text in Temporal
https://tc39.es/proposal-temporal/#sec-temporal-calendardateera 15.6.1.6 CalendarDateEra ( calendar, date ) The abstract operation CalendarDateEra takes arguments calendar (a String) and date (a Temporal.PlainDateTime, Temporal.PlainDate, or Temporal.PlainYearMonth). It performs implementation-defined processing to find the era for the date corresponding to date in the context of the calendar represented by calendar and returns a lowercase String value representing that era, or undefined for calendars that do not have eras.
https://tc39.es/proposal-temporal/#sec-temporal-calendardateerayear 15.6.1.7 CalendarDateEraYear ( calendar, date ) The abstract operation CalendarDateEraYear takes arguments calendar (a String) and date (a Temporal.PlainDateTime, Temporal.PlainDate, or Temporal.PlainYearMonth). It performs implementation-defined processing to find the era for the date corresponding to date in the context of the calendar represented by calendar and returns an integer representing the ordinal position of the year of date in that era, or undefined for calendars that do not have eras.
https://tc39.es/proposal-temporal/#sec-temporal-calendardatefields 15.6.1.20 CalendarDateFields ( calendar, fields ) The abstract operation CalendarDateFields takes arguments calendar (a String) and fields (a List of Strings). It takes a list of standard fields in fields that are necessary for a given operation and returns a new list by adding relevant calendar-specific fields for the calendar represented by calendar. This is relevant for calendars which accept fields other than the standard set of built-in calendar fields.
https://tc39.es/proposal-temporal/#sec-temporal-calendardatemergefields 15.6.1.21 CalendarDateMergeFields ( calendar, fields, additionalFields ) The abstract operation CalendarDateMergeFields takes arguments calendar (a String), fields (a List of Strings), and additionalFields (a List of Strings). It takes two lists of calendar-specific fields for the calendar represented by calendar in fields and additionalFields and returns a new list that includes both sets of fields. The values in additionalFields should supersede the values in fields. Also, the returned field list must be free of ambiguity or conflicts. This is relevant for calendars which accept fields other than the standard set of built-in calendar fields.
These are part of the Temporal <-> Calendar communication protocol. If a calendar supports era (and therefore eraYear), CalendarDateFields will add "era" and "eraYear" to its return list. For example, if the calendar is "gregory" and it needs to support "bc" and "bce", then if the following AO is called
CalendarDateFields("gregory", « "day", "month", "monthCode", "year" ») it should return « "day", "month", "monthCode", "year", "era", "eraYear" »
but if a calendar does not have era and eraYear, then era and eraYear should return undefined, NOT an empty string, which is how the "iso8601" calendar behaves in
Temporal.Calendar.prototype.era (see https://tc39.es/proposal-temporal/#sec-temporal.calendar.prototype.era ) and Temporal.Calendar.prototype.eraYear (see https://tc39.es/proposal-temporal/#sec-temporal.calendar.prototype.erayear)
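A minimal TypeScript sketch of the behavior described in this comment. The names `CALENDARS_WITH_ERAS`, `calendarDateFields`, and `calendarDateEra` are hypothetical, and the set of era-bearing calendars is an assumption for illustration only, since that list is what this thread is trying to settle.

```typescript
// Assumption: only "gregory" is modeled as an era-bearing calendar here.
const CALENDARS_WITH_ERAS: ReadonlySet<string> = new Set(["gregory"]);

// Mirrors the shape of the CalendarDateFields abstract operation:
// era-bearing calendars extend the field list, others return it unchanged.
function calendarDateFields(calendar: string, fields: readonly string[]): string[] {
  if (!CALENDARS_WITH_ERAS.has(calendar)) {
    return [...fields];
  }
  return [...fields, "era", "eraYear"];
}

// Hypothetical era lookup keyed on the arithmetic year.
// A calendar without eras yields undefined, never an empty string.
function calendarDateEra(calendar: string, year: number): string | undefined {
  if (!CALENDARS_WITH_ERAS.has(calendar)) {
    return undefined; // e.g. "iso8601"
  }
  return year >= 1 ? "ce" : "bce";
}
```

With this sketch, `calendarDateFields("iso8601", ["year"])` leaves the list untouched, while the same call for "gregory" appends "era" and "eraYear".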
Ah, I see. In ICU4X we may then use an empty string (or perhaps change things to use Option)
Ah, I see. In ICU4X we may then use an empty string (or perhaps change things to use Option)
I believe Option<String> should be the best option here, if possible.
For "chinese" and "dangi" I propose we do NOT support any era but simply use year (therefore, always return default for era and eraYear). This is because currently in ICU and CLDR there is no real era support for the "chinese" and "dangi" calendars. The implementation just uses the era and year fields to hold "year in cycle" and "cycle sequence", which is really not the same as an era but just a hacky way to support cyclic years.
Regarding this, I'm not entirely sure. I see the prudence in your suggestion, but in my opinion the ICU4C implementations don't reflect the reality of these calendars. However, the process of finalizing the eras and era codes is a complicated research project so I'm personally split here as well.
I believe Option<String> should be the best option here, if possible.
Yeah, in general I would agree, though in this case I'm not convinced: it's a simpler model for the user to use empty eras since right now we only have an era+eraYear constructor. Once we have more constructors it makes sense for them to use an Option, I think.
To write down a proposal that came out of a session with myself @Manishearth and @FrankYFTang yesterday:
This proposal has the following advantages:
For multi-era calendars with well established era names, use them. This is Gregorian bce/ce, Ethiopic mundi/incar, ROC, and possibly others.
One addendum: well established era names for both eras, i.e. while coptic does have a couple names for the modern era, it does not have one for the pre-modern one. And we prefer shortforms if shortforms exist for both (so bc/bce works fine, but minguo stays minguo)
See a rough draft at https://frankyftang.github.io/proposal-intl-temporal/
I still have a problem with the "ethiopic" calendar.
I think Ethiopic can be modeled with two eras: "mundi" and "incar", or "ethioaa" and "ethiopic".
What about "mundi" and "ethiopic"?
BTW, where did you get the word "mundi" from? Source?
I saw "Anno Mundi" mentioned in https://en.wikipedia.org/wiki/Ethiopian_calendar#Anno_Mundi_according_to_Panodoros so should it be Anno Mundi ("anno-mundi") as the era code?
Yeah, we could do that!
Anno Mundi is redundant because "anno" basically (hand-wave) means "era" (like Anno Domini). I prefer "mundi".
I do not feel like constructing very complicated things for only a few calendars. Because, in fact, very few calendars have more than one era: Only japanese and ethiopic.
I'll write a special issue for the case of gregory and of the Gregorian calendar as normally used in History in the European culture. In short, I can see no case in the real world where the proleptic gregorian calendar is used with backward year counting (i.e. 1-y style, without 0 year).
I'll make also a special issue for ethiopic, as there are sources for the names of eras.
IMHO, for all other calendars, no era code is necessary, since it would be always the same.
The value "" for eraCode should be accepted. Any other value should throw.
However, most of (if not all) those calendars with one single era have an era name associated to each of them, e.g. "Anno Mundi" for hebrew, "Saka" for indian, etc. This era name should be displayed by Intl.DateTimeFormat.format() if (and only if) the proper era and (later) eraDisplay option is set accordingly.
iso8601 is a special case, there is no era and no era name should ever be displayed, if I understand the standard properly.
I think it's somewhat useful for developers to see dates having distinctly named era codes. Bear in mind that the codes are both for input and output.
Even if we ignore the 2-3 calendars with more than one era, I think it's still a useful thing to have for the other calendars; so I think it's worth putting thought into designing it well.
I don't consider the design sketched out in @sffc's comment to be too complicated. The end result will be one small table in the spec (with an additional description for the Japanese calendar, which we needed anyway). That design is somewhat complicated as a decision-making process but the end result will be quite straightforward.
As noted in the Ethiopic and Gregorian threads (#4 and #5), I'm warming up to the idea of dropping the "established era names" exception and just using the calendar ID everywhere.
For the Gregorian calendar with change date, it removes ambiguity about whether "ad"/"ce" and "bc"/"bce" refer to the Julian or Gregorian version of those dates.
It also means that { era: "xxx", eraYear: ###, monthCode: "xxx", day: ### } uniquely identifies the date. The calendar system itself can be derived by looking at the era name (except for japanese).
A slight issue there, at least for ICU4X, will be that this can lead to longer era codes since calendar names can be longer than 16 characters. Not Temporal's problem, but worth noting.
I don't actually find the property of era codes uniquely identifying the calendar to be that useful. It seems nice, but I don't see much of a benefit, beyond potentially being able to omit calendar inputs in some cases if you want to be smart about things.
Also to maintain this property we need to make sure the next Japanese emperor isn't named Greg or Julian, which, while unlikely, is unfortunately not within the otherwise formidable powers of either Ecma International or the Unicode Consortium. :wink:
Or, well, I guess we can say that this property exists "except for Japanese" as you said but I find that less useful.
(And we have this problem with Japanese anyway)
Actually, I think the thing that I really dislike about that model is that it locks us into ISO-style negative years for all calendars, even ones which are used to 1-y. @Louis-Aime has brought up some valid points about BCE years not actually mattering much in the proleptic Gregorian calendar, but if we get a switchover calendar (as we plan to), then it will be rather surprising to not allow BCE year input.
The 1 - y thing that BCE dates do is rather confusing and easy to get wrong; it seems simple but it's easy to make fencepost errors with. Not giving users days in the format they expect seems like an easy footgun here, even when users know about this and try to handle it.
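To make the fencepost concrete, here is a small TypeScript sketch of the 1 - y mapping between an arithmetic (ISO-style) year and a bce/ce pair. The `EraYear` type and both function names are made up for illustration; they are not Temporal or ICU4X APIs.

```typescript
interface EraYear {
  era: "ce" | "bce";
  eraYear: number;
}

// Arithmetic year -> era + eraYear.
// Year 0 is 1 BCE, year -1 is 2 BCE, and so on: eraYear = 1 - year.
function toEraYear(year: number): EraYear {
  return year >= 1
    ? { era: "ce", eraYear: year }
    : { era: "bce", eraYear: 1 - year };
}

// era + eraYear -> arithmetic year.
// Neither era has a year 0, so eraYear must be at least 1.
function fromEraYear({ era, eraYear }: EraYear): number {
  if (!Number.isInteger(eraYear) || eraYear < 1) {
    throw new RangeError(`eraYear must be a positive integer, got ${eraYear}`);
  }
  return era === "ce" ? eraYear : 1 - eraYear;
}
```

For example, `toEraYear(0)` gives `{ era: "bce", eraYear: 1 }` and `fromEraYear({ era: "bce", eraYear: 44 })` gives `-43`. The common mistake is to negate the year instead of computing 1 - y, which is off by one.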
Not sure I follow? I'm only proposing that we use "gregory" (and "julian" and "buddhist" and ...) for positive years, and continue with "pre-gregory" (and "pre-julian" and ...) if we need the 1 - y thing.
Also to maintain this property we need to make sure the next Japanese emperor isn't named Greg or Julian, which, while unlikely, is unfortunately not within the otherwise formidable powers of either Ecma International or the Unicode Consortium.
The eras are Kanji transliterations, not emperor names. I don't think "gregory" or "julian" or any of the other non-Japanese calendar IDs are valid Kanji transliterations... well, maybe "dangi" is?
Let me try again to express why I think the "globally unique era code" is a nice property.
// A Julian date:
{ era: "julian", eraYear: 1234, monthCode: "M01", day: 1 }
{ calendar: "julian", era: "julian", eraYear: 1234, monthCode: "M01", day: 1 }
// A Gregorian date corresponding to the given Julian date
// (or a RangeError if not in AnyCalendar):
{ calendar: "gregory", era: "julian", eraYear: 1234, monthCode: "M01", day: 1 }
// A Japanese date corresponding to the given Buddhist date
// (or a RangeError if not in AnyCalendar):
{ calendar: "japanext", era: "buddhist", eraYear: 1234, monthCode: "M01", day: 1 }
I think this is less likely to fail than if the era codes were not globally unique, like
// A Julian date:
{ calendar: "julian", era: "ad", eraYear: 1234, monthCode: "M01", day: 1 }
// A Gregorian date, but different than the one above:
{ calendar: "gregory", era: "ad", eraYear: 1234, monthCode: "M01", day: 1 }
It seems like a nice property to have.
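As a rough TypeScript sketch of what the globally unique property buys: if every era code belongs to exactly one calendar, the owning calendar can be looked up from the era alone. The table contents and the names `ERA_TO_CALENDAR`, `calendarOfEra`, and `needsConversion` are assumptions for illustration, following the "gregory"/"pre-gregory" naming floated in this thread.

```typescript
// Hypothetical table under the "era code = calendar ID" model.
const ERA_TO_CALENDAR: ReadonlyMap<string, string> = new Map([
  ["julian", "julian"],
  ["pre-julian", "julian"],
  ["gregory", "gregory"],
  ["pre-gregory", "gregory"],
  ["buddhist", "buddhist"],
]);

interface EraDateFields {
  calendar?: string;
  era: string;
  eraYear: number;
  monthCode: string;
  day: number;
}

// Returns the calendar that owns the era code; unknown codes throw.
function calendarOfEra(fields: EraDateFields): string {
  const owner = ERA_TO_CALENDAR.get(fields.era);
  if (owner === undefined) {
    throw new RangeError(`unknown era code: ${fields.era}`);
  }
  return owner;
}

// True when the requested calendar differs from the era's own calendar,
// i.e. the date would have to be converted (or rejected).
function needsConversion(fields: EraDateFields): boolean {
  return fields.calendar !== undefined && fields.calendar !== calendarOfEra(fields);
}
```

A shared code such as "ad" cannot appear in a table like this without picking one owner, which is the ambiguity the non-unique example shows.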
The eras are Kanji transliterations, not emperor names. I don't think "gregory" or "julian" or any of the other non-Japanese calendar IDs are valid Kanji transliterations... well, maybe "dangi" is?
Oh I was just making a joke 😄
Not sure I follow? I'm only proposing that we use "gregory" (and "julian" and "buddhist" and ...) for positive years, and continue with "pre-gregory" (and "pre-julian" and ...) if we need the 1 - y thing.
Ah! I hadn't picked up on that. That takes away my main worry.
// (or a RangeError if not in AnyCalendar):
Hmm, so since an AnyCalendar instance is a specific calendar, this is still an error, no? Unless you're suggesting we convert on construction but conversion might require instantiation of a second calendar. As far as ICU4X is concerned this doesn't really seem useful beyond us being able to show slightly better errors.
It seems like a nice property to have.
I agree, I just don't see it as being that useful, and I'm weighing it against the learnability benefits of just being able to use "ce" and "bce".
Also note that due to TinyStr16, no matter what we cannot have this property in ICU4X, so from ICU4X's point of view we have to do something else, which is going to be confusing since we won't be able to just link to ECMA for it.
Also note that due to TinyStr16, no matter what we cannot have this property in ICU4X, so from ICU4X's point of view we have to do something else, which is going to be confusing since we won't be able to just link to ECMA for it.
Hm? I guess the longest possible with this model would be pre-islamic-umalqura, but is that a real thing?
Also I'm not convinced for ICU4X's purposes that we should/need to use TinyStr16 in this context. Can discuss more later.
About the RangeError: If you are making a strongly typed Date<Buddhist>, for example, but you pass era: "julian", we can permit a RangeError in that case. But, if you are using AnyCalendar, you have all the data anyway, so we can permit the mixed eras and calendars.
Hm? I guess the longest possible with this model would be pre-islamic-umalqura, but is that a real thing?
Oh, I guess islamic-umalqura is short enough.
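The length question can be checked mechanically. This TypeScript sketch assumes the 16-character limit mentioned earlier in the thread; `MAX_ERA_CODE_LENGTH`, `fitsEraCodeLimit`, and `preEraCode` are hypothetical helpers, not ICU4X APIs.

```typescript
// Assumed limit, taken from the 16-character restriction discussed above.
const MAX_ERA_CODE_LENGTH = 16;

// Era codes in this thread are ASCII, so string length equals byte length.
function fitsEraCodeLimit(code: string): boolean {
  return code.length <= MAX_ERA_CODE_LENGTH;
}

// Builds the "pre-" era code for a calendar ID under the
// calendar-ID-as-era-code model.
function preEraCode(calendarId: string): string {
  return `pre-${calendarId}`;
}
```

Under that assumption, "islamic-umalqura" is exactly 16 characters and fits, while `preEraCode("islamic-umalqura")` is 20 characters and does not.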
But, if you are using AnyCalendar, you have all the data anyway, so we can permit the mixed eras and calendars.
This isn't correct! An instance of AnyCalendar is a single calendar. I've considered building an AllCalendar, but we don't have that right now. I think implementations of ECMAScript would need AllCalendar anyway, though they would have the tools to build it themselves and I wasn't sure which side it makes the most sense to put that on.
This isn't correct! An instance of AnyCalendar is a single calendar.
A short-lived secondary calendar instance may need to be created in the construction phase, but the data is present.
The construction phase isn't when dates get constructed, though
Let me add some thoughts about the "globally unique era code", the nice and maybe useful property @sffc wishes.
In "real life", when an author provides an era indication (i.e. the era field that Intl.DateTimeFormat.format() generates), this also specifies the calendar context. If you read "Diocletian era" you are sure the author is using the coptic calendar. If you read a month name that is not of the coptic calendar in the same date expression, you will most probably presume an error from the author. A "globally unique era code" would make Temporal detect such an error.
One can also say that an era refers to its epoch date: ad to 0000-12-30 (1 Jan. 1 in Julian), ce to 0001-01-01 (if we admit the idea of counting years backward with the proleptic Gregorian calendar), am of the hebrew to -003760-09-07, aa of ethiopic to -005492-07-17 etc., so a "globally unique code" is desirable.
On the other hand, I understand we want to address the case of the calendars that count years backwards.
Let me first recall that, despite its global success, this method is only valid for the julian calendar, whose present version was defined not only by Sosigenes and Julius Caesar in 45 B.C., but also by Dionysius Exiguus for the A.D. epoch and then by Bede the Venerable for the method of counting years backwards.
In ICUx, gregory, iso8601, coptic and roc also use this method.
Historically, coptic is only the definition of a new era on the ancient Egyptian calendar that already existed, see Era of martyrs on Wikipedia. Dates before Diocletian era were numbered with respect to Roman consuls, to AUC or later to Anno Mundi, the same as ethioaa. Unless someone can give us examples of expert users counting years backwards from Diocletian era, the coptic calendar should work with only one era, and no code is necessary.
The same holds for roc, as you can read on its Wikipedia description. Years before 1912 were named after the corresponding Chinese emperor.
More discussion on gregory and iso8601 may be read on Issue #5.
That said, it seems reasonable to specify a unique code for the first era of any calendar if and only if years are counted backwards from the calendar's first origin ("epoch"). In this case, I would recommend backwards or bk, which clearly expresses how years are counted in such an era. For any other era, I stick in favour of a globally unique code.
Possible variant: julian-bk, gregory-bk and so on. Then you could apply julian or ad to the japanese calendar for the era from 0000-12-30 until 0645-01-03 (last date before taika), and julian-bk before that era. Let's see with Japanese historians how they work.
A "globally unique era code" would make Temporal detect such an error
I understand this; I'm not convinced that that property is super valuable. It almost feels redundant 99% of the time; users will see an API that says something like new Date("julian", "julian", ....) and find the redundancy weird. Redundancy is great for error checking but 99% of the time this is redundancy where the strings are completely identical, and that just feels overly redundant. We could also add a third "calendar2" field where you repeat the calendar name and it would have a similar effect.
Let me first recall that, despite its global success, this method is only valid for the julian calendar
We've discussed this a bit in the other issue and I've pointed out that it's valid for the proleptic Gregorian calendar as well, as long as we provide a way of doing switchover calendars, which is indeed on the cards.
Historians are not the only users of these APIs. In fact I'm less inclined to optimize for historians over other users since historians know what they're doing and the design needs to try and guard them from bad choices far less, as compared to other users.
In ICUx, gregory, iso8601, coptic and roc also use this method.
I'm not sure what you mean, ICU4X does not support roc yet. Also, everything in ICU4X can be considered a bit of a draft until the Temporal stuff is pinned down; don't make too many conclusions off of the implementation in ICU4X.
Also japanese needs bce eras as well (from discussion with the user community).
someone can give us examples of expert users counting years backwards from Diocletian era, the coptic calendar should work with only one era, and no code is necessary
Yeah here I agree that we should just be using negative years, though I don't necessarily agree with that meaning "no code".
In this case, I would recommend backwards or bk, which clearly expresses how years are counted in such an era. For any other era, I stick in favour of a globally unique code
My worry is that this has far less intuitive grounding, as opposed to ce/bce as era codes. I do not want our users to have to be calendar experts to understand how to use this.
Especially users who simply want to do proleptic Gregorian dates, the standard in computing.
My worry is that this has far less intuitive grounding, as opposed to ce/bce as era codes. I do not want our users to have to be calendar experts to understand how to use this.
Is there any other upside to "bce" (as opposed to "pre-julian" or "julian-bk") other than being more intuitive?
I don't think so, but I think it's an important upside given that this is not just JS's "special calendars library", this is JS's next-gen general-purpose datetime library, and people using it from that angle will likely not expect to need to know much about eras.
People who don't know about eras can use the arithmetic year instead of era + eraYear.
Both Temporal and ICU4X have the concept of "era codes" for era names.
We have a spec for Japanese era codes, but we don't have anything for calendars with a fixed set of era codes.
It's not super important what they are, they should just make sense and be documented somewhere for people who wish to manually construct dates.
It would be nice to pin these down, and ideally it would be nice if they could be handled uniformly with ICU4X (which is gearing up for a 1.0 release soon).
Here's what ICU4X is doing:
{eraname}-{startyear} for pre-meiji
Strawman proposals for calendars we do not support yet:
The general set of rules I've followed here are that (applied in order):
We also truncate to 16 characters for our own reasons, but Temporal needn't do so.
I don't think it's important for there to be a consistent set of rules for making these era codes; the design of era codes is that they get picked once and documented somewhere and users should consult the documentation. It's nice to have such a set of rules to minimize bikeshedding though.
Thoughts? It would be nice to have consistency between ICU4X and Temporal here, and we can just pick some now (and add to the list as we go along).