tc39 / proposal-intl-era-monthcode

To specify necessary details about era, eraYear and monthCode usage with Temporal in internationalization setting (for calendars other than "iso8601").
https://tc39.github.io/proposal-intl-era-monthcode
MIT License

List of requirements for era code design #13

Open sffc opened 1 year ago

sffc commented 1 year ago

@Manishearth wrote a document focusing on pre-modern eras. I wanted to pull out the list of requirements that have driven the era code design proposal in #9:

sffc commented 1 year ago

The proposal in #9 addresses all of these points, except it is weak on P5. The era codes are intuitive but only if you know how era codes work in other places. Based on what I've heard, @justingrant feels P5 should be given more weight, with perhaps less weight on R2.

justingrant commented 1 year ago

Thanks @sffc for sharing, and @manishearth for building the original doc. Very helpful.

Before sharing feedback on these requirements, I think it'd be helpful to share assumptions I'm making about era use that guide my feedback. Are these assumptions correct?

  1. Outside computing, calendars vary widely in use of eras. Most users of some calendars like Japanese and Gregorian know what era names mean. Users of other calendars like Chinese, Buddhist, Islamic, etc. don't use eras much, if at all.
  2. Reflecting this varying usage, most CLDR calendars have only a single era, which implies that eras are pretty much irrelevant in those calendars. We don't need to optimize our naming scheme for those calendars.
  3. The only CLDR calendars that have multiple eras are: japanese, gregorian, coptic, ethiopic, and roc. julian-gregory is another possible one in the near future. We should optimize our era scheme for these calendars.
  4. Use of era codes in computing is very likely to be calendar-specific. Developers writing calendar-neutral code already have a robust set of non-era fields they can use, so they are unlikely to care about the specific codes of eras.
  5. Users writing calendar-specific code are very likely to be familiar with the names of eras in their language.
  6. Both eras and calendars are confusing and/or unknown to many developers. So part of the responsibility in naming eras is to help developers figure out "what's an era, and how does it differ from a calendar?"

sffc commented 1 year ago

  1. Outside computing, calendars vary widely in use of eras. Most users of some calendars like Japanese and Gregorian know what era names mean. Users of other calendars like Chinese, Buddhist, Islamic, etc. don't use eras much, if at all.

Not sure if this is completely correct. For example, although Buddhist and Islamic calendars don't use eras internally, it's still common to use them in date formatting, especially when comparing them with dates in other calendars:

(both strings from Wikipedia)

  2. Reflecting this varying usage, most CLDR calendars have only a single era, which implies that eras are pretty much irrelevant in those calendars. We don't need to optimize our naming scheme for those calendars.

As noted above, irrelevant for calculations, but very much relevant for formatting and conversion.

  3. The only CLDR calendars that have multiple eras are: japanese, gregorian, coptic, ethiopic, and roc. julian-gregory is another possible one in the near future. We should optimize our era scheme for these calendars.

See above.

  4. Use of era codes in computing is very likely to be calendar-specific. Developers writing calendar-neutral code already have a robust set of non-era fields they can use, so they are unlikely to care about the specific codes of eras.

Mostly, I suppose, although I see use cases of era codes in calendar-neutral conversion between calendars.

  5. Users writing calendar-specific code are very likely to be familiar with the names of eras in their language.

I think I don't completely agree here:

  6. Both eras and calendars are confusing and/or unknown to many developers. So part of the responsibility in naming eras is to help developers figure out "what's an era, and how does it differ from a calendar?"

The more I think about this, the more I think that the line between an "era" and a "calendar" is a bit blurry. This is evidenced by the fact that I've seen people talk about dates "in the Buddhist era" in contrast with "in the Buddhist calendar". Even Gregorian really has only one era; it's the one that started the year Jesus Christ was born. Dates prior to that are "before" that era; we talk about it in CLDR as being a separate era, but conceptually it isn't really an "era".

justingrant commented 1 year ago

The more I think about this, the more I think that the line between an "era" and a "calendar" is a bit blurry. This is evidenced by the fact that I've seen people talk about dates "in the Buddhist era" in contrast with "in the Buddhist calendar".

Yep, I think yours is an astute observation. For some calendars like buddhist or indian or any of the Islamic calendars, the era and the calendar are essentially interchangeable.

This also makes year and eraYear the same, so era is always unnecessary in code that's specific to these calendars. Right?

The main risk I see in blurring the era vs. calendar distinction is that it creates an uncanny valley where it's hard to reason about why eras are sometimes treated like calendars and when they're not. For example, are the following property bags equivalent?

{ calendar: 'buddhist', year: 1, month: 1, day: 1 }
{ era: 'buddhist', eraYear: 1, month: 1, day: 1 } // are `month: 1, day: 1` ISO or Buddhist month/day?

Related: if canonical era names share the same ID as the calendar, then how do we expect the last line of code below to behave?

date = Temporal.PlainDate.from({ calendar: 'buddhist', year: 1, month: 1, day: 1 });
date.with({ calendar: 'gregory', era: 'ce', eraYear: 100 }); // Throws; can't change calendar in `with`
date.with({ era: 'gregory', eraYear: 100 }); // Throws? If not, then what calendar is the result?

Even Gregorian really has only one era; it's the one that started the year Jesus Christ was born. Dates prior to that are "before" that era; we talk about it in CLDR as being a separate era, but conceptually it isn't really an "era".

This makes conceptual sense, but I'm not sure how much that conceptual merger matters in computing where different code may sometimes be written for one era vs. another?

Not sure if this is completely correct. For example, although Buddhist and Islamic calendars don't use eras internally, it's still common to use them in date formatting, especially when comparing them with dates in other calendars:

Makes sense. Maybe a better way to reframe (1) is that the importance and usage of era names varies depending on which of three kinds of calendars are involved?

Some calendars don't really use eras outside computing. Chinese is an example. For these calendars, era naming is irrelevant because eras aren't used.

Some calendars with one era use that era in formatted dates and in common usage as a synonym for the calendar itself, e.g. "Buddhist Era". For these calendars, era names might help with discoverability. For example, seeing era: 'saka' could clarify what an object with calendar: 'indian' is doing. Other than discoverability, naming of eras doesn't matter for these calendars because all dates have the same era, and also because eraYear and year always match. Although eras may be used in formatting, era *names* in code are always unnecessary.

Some calendars use multiple eras, both in computing and non-computing usage, to differentiate periods of time and to count years during those periods. Japanese is the best example, but also Gregorian, Julian, ROC, and Coptic/Ethiopian. Era names for these calendars matter more than others, because (in addition to discoverability) programmers may need to use eras to create dates or to write era-specific code.

Is this three-segment split a good way to look at it?
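
To illustrate the difference between the second and third groups, here is a minimal sketch, not Temporal's actual implementation. The function name, the table, and the era codes `be`, `ce`, and `bce` are assumptions chosen for illustration. It also assumes that a backward-counting era maps its era year 1 to year 0, which is the usual proleptic numbering.

```javascript
// Hypothetical table: each era counts either forward from year 1
// or backward from year 0. The codes are placeholders, not settled names.
const ERA_DIRECTIONS = new Map([
  ['buddhist', new Map([['be', 'forward']])],
  ['gregory', new Map([['ce', 'forward'], ['bce', 'backward']])],
]);

// Converts an (era, eraYear) pair to a signed calendar year.
function yearFromEraYear(calendar, era, eraYear) {
  const eras = ERA_DIRECTIONS.get(calendar);
  const direction = eras ? eras.get(era) : undefined;
  if (direction === undefined) {
    throw new RangeError(`era "${era}" is not defined for calendar "${calendar}"`);
  }
  // Forward eras: year === eraYear. Backward eras: eraYear 1 is year 0.
  return direction === 'forward' ? eraYear : 1 - eraYear;
}

console.log(yearFromEraYear('buddhist', 'be', 2567)); // 2567 (era adds nothing)
console.log(yearFromEraYear('gregory', 'bce', 100));  // -99 (era changes the result)
```

In the single-era case the era argument never changes the answer, which is the sense in which era names in code are unnecessary there.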

use cases of era codes in calendar-neutral conversion between calendars.

What are examples of these cases? Are these cases the same as the case where users would use withCalendar? If yes, why should someone prefer using eras instead of withCalendar?

  • Many users in, say, the Japanese calendar may know the eras in their local language, but not in English, which is the language that TC39 and CLDR have chosen to be the basis of string identifiers.

Makes sense, but for Japanese eras is there any other choice beside meiji, heisei, etc?

  • English-speaking users from Gregorian countries may know "ad"/"bc" or "ce"/"bce" but maybe not both conventions.

I agree about ce/bce. I'd guess that this convention (which AFAIK is mostly used in academia and science) is probably ~50% recognized among Gregorian-using programmers. Which is of course a lot more than gregory-inverse which will be unrecognized by everyone. :-)

Manishearth commented 1 year ago

The more I think about this, the more I think that the line between an "era" and a "calendar" is a bit blurry. This is evidenced by the fact that I've seen people talk about dates "in the Buddhist era" in contrast with "in the Buddhist calendar". Even Gregorian really has only one era; it's the one that started the year Jesus Christ was born. Dates prior to that are "before" that era; we talk about it in CLDR as being a separate era, but conceptually it isn't really an "era".

The mental model I have seen most commonly is that an era is primarily the point in time, not the system of reckoning.

People say "in the Buddhist era" because Buddhist has the same reckoning as the "default" Gregorian calendar. (But people don't say that for e.g. lunisolar Buddhist calendars)

sffc commented 1 year ago

The main risk I see in blurring the era vs. calendar distinction is that it creates an uncanny valley where it's hard to reason about why eras are sometimes treated like calendars and when they're not. For example, are the following property bags equivalent?

{ calendar: 'buddhist', year: 1, month: 1, day: 1 }
{ era: 'buddhist', eraYear: 1, month: 1, day: 1 } // are `month: 1, day: 1` ISO or Buddhist month/day?

Both of those refer to the same Epoch Day.

The second one doesn't specify what the output calendar should be.

Related: if canonical era names share the same ID as the calendar, then how do we expect the last line of code below to behave?

date = Temporal.PlainDate.from({ calendar: 'buddhist', year: 1, month: 1, day: 1 });
date.with({ calendar: 'gregory', era: 'ce', eraYear: 100 }); // Throws; can't change calendar in `with`
date.with({ era: 'gregory', eraYear: 100 }); // Throws? If not, then what calendar is the result?

Throwing on the last line is valid and safe behavior for now. To avoid confusion, it seems reasonable to allow with({era}) to only work with eras in the current calendar.
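
A rough sketch of that restriction, independent of Temporal itself. The function name, the era table, and the code `be` are hypothetical and only illustrate the check; the real era lists would come from CLDR.

```javascript
// Hypothetical per-calendar era lists, for illustration only.
const ERAS_BY_CALENDAR = new Map([
  ['gregory', ['ce', 'bce']],
  ['buddhist', ['be']],
  ['indian', ['saka']],
]);

// Returns the era if it belongs to the date's current calendar; throws otherwise.
// This is the "with({era}) only works with eras in the current calendar" rule.
function assertEraInCalendar(calendar, era) {
  const eras = ERAS_BY_CALENDAR.get(calendar);
  if (eras === undefined) {
    throw new RangeError(`unknown calendar "${calendar}"`);
  }
  if (!eras.includes(era)) {
    throw new RangeError(`era "${era}" is not valid for calendar "${calendar}"`);
  }
  return era;
}

console.log(assertEraInCalendar('gregory', 'ce')); // "ce"
try {
  assertEraInCalendar('buddhist', 'gregory'); // era from another calendar
} catch (e) {
  console.log(e.name); // "RangeError"
}
```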

Is this three-segment split a good way to look at it?

Maybe, but I would put Gregorian/Coptic/etc in the middle category. Chinese doesn't use eras at all, Japanese uses them all the time, and the rest use them in various situations when necessary.

use cases of era codes in calendar-neutral conversion between calendars.

What are examples of these cases? Are these cases the same as the case where users would use withCalendar? If yes, why should someone prefer using eras instead of withCalendar?

I think it's useful when talking about dates in similar calendars. The example I gave in the CLDR issue was { calendar: "islamic-tbla", era: "islamic-umalqura", eraYear: ..., monthCode: ..., day: ... }. This may be a useful way to express a date if you have it in the algorithmic Islamic calendar but want to express it in the tabular Islamic calendar.

  • Many users in, say, the Japanese calendar may know the eras in their local language, but not in English, which is the language that TC39 and CLDR have chosen to be the basis of string identifiers.

Makes sense, but for Japanese eras is there any other choice beside meiji, heisei, etc?

Unfortunately not because of identifier restrictions, but I very much want to add aliases for those eras in Kanji, etc.

  • English-speaking users from Gregorian countries may know "ad"/"bc" or "ce"/"bce" but maybe not both conventions.

I agree about ce/bce. I'd guess that this convention (which AFAIK is mostly used in academia and science) is probably ~50% recognized among Gregorian-using programmers. Which is of course a lot more than gregory-inverse which will be unrecognized by everyone. :-)

It's not particularly intuitive, but it's intended to not be misleading. :smiley:

justingrant commented 1 year ago

{ calendar: 'buddhist', year: 1, month: 1, day: 1 }
{ era: 'buddhist', eraYear: 1, month: 1, day: 1 } // are `month: 1, day: 1` ISO or Buddhist month/day?

Both of those refer to the same Epoch Day. The second one doesn't specify what the output calendar should be.

Whoops I forgot that buddhist shared month/day reckoning with ISO for modern dates. What about this pair? Would you expect them to be equivalent?

Temporal.PlainDate.from({ calendar: 'indian', year: 1, month: 1, day: 1 });
Temporal.PlainDate.from({ era: 'saka', eraYear: 1, month: 1, day: 1 }); 

My assumption (before this conversation) was that the second line should throw because saka isn't an era in the ISO calendar... and for that matter *any* era should throw for the ISO calendar. Are you recommending that it should not throw?

Maybe, but I would put Gregorian/Coptic/etc in the middle category. Chinese doesn't use eras at all, Japanese uses them all the time, and the rest use them in various situations when necessary.

Hmm, then maybe there are four categories?

I think it's useful when talking about dates in similar calendars. The example I gave in the CLDR issue was { calendar: "islamic-tbla", era: "islamic-umalqura", eraYear: ..., monthCode: ..., day: ... }. This may be a useful way to express a date if you have it in the algorithmic Islamic calendar but want to express it in the tabular Islamic calendar.

We already have a way to do this conversion:

Temporal.PlainDate.from({ calendar: "islamic-umalqura", year, monthCode, day }).withCalendar("islamic-tbla");

Is there something about the cross-calendar era case that would be easier for a programmer to understand vs. the status quo line of code above?

sffc commented 1 year ago

What about this pair? Would you expect them to be equivalent?

Temporal.PlainDate.from({ calendar: 'indian', year: 1, month: 1, day: 1 });
Temporal.PlainDate.from({ era: 'saka', eraYear: 1, month: 1, day: 1 }); 

No, the first one specifies the calendar, and the second one does not. I am not suggesting that we "infer" the output calendar from the era code, only that we allow the era code as input when interpreting the eraYear, month, and day. For example, I could see a future where you would write

Temporal.PlainDate.from({ calendar: 'iso8601', era: 'indian', eraYear: 1, month: 1, day: 1 });

but note that "saka" wouldn't be accepted as it is not the canonical era name. You can only use aliases for calendars when they are within the same system.

Hmm, then maybe there are four categories?

  • Calendars that don't use eras at all, e.g. Chinese
  • Calendars that may use eras in formatting but there's no need to use them in code because there's only one era, e.g. Buddhist/Indian/Islamic
  • Calendars that have multiple eras that are used occasionally, both in everyday usage and in code, e.g. Gregorian/Julian, Coptic/Ethiopian, ROC
  • Calendars that use eras intensively (only Japanese)

I think we're getting closer. Let me try to rephrase:

  1. Calendars that don't use eras at all (Chinese)
  2. Calendars that have a single era and therefore often share an identity with the era (Buddhist, Indian, Islamic)
  3. Calendars that have a small, fixed number of eras, where the era is necessary to distinguish dates even if they are in the same calendar system, but the era can be elided when context is clear (Gregorian/Julian, Coptic/Ethiopian, ROC)
  4. Calendars that require eras to distinguish even modern dates from one another (Japanese)

Is there something about the cross-calendar era case that would be easier for a programmer to understand vs. the status quo line of code above?

Yes, I'm thinking beyond just Temporal here when I say that global era codes seem useful for conversion. For the purposes of Temporal, sure, this property of universal era codes isn't really necessary since the same goal can be achieved with other means.

justingrant commented 1 year ago

For example, I could see a future where you would write

Temporal.PlainDate.from({ calendar: 'iso8601', era: 'indian', eraYear: 1, month: 1, day: 1 });

but note that "saka" wouldn't be accepted as it is not the canonical era name. You can only use aliases for calendars when they are within the same system.

Is there something about that code that's better than the currently-supported pattern below?

Temporal.PlainDate.from({ calendar: 'indian', year: 1, month: 1, day: 1 }).withCalendar('iso8601');
Temporal.PlainDate.from({ calendar: 'indian', era: 'saka', eraYear: 1, month: 1, day: 1 }).withCalendar('iso8601');

A few reasons that I think the latter pattern will be better for software reliability & teaching programmers about how to use calendars properly:

Yes, I'm thinking beyond just Temporal here when I say that global era codes seem useful for conversion. For the purposes of Temporal, sure, this property of universal era codes isn't really necessary since the same goal can be achieved with other means.

Oh! What are those "other means" conversion APIs beyond Temporal? Would those APIs not use calendars, only eras?

  • Calendars that don't use eras at all (Chinese)
  • Calendars that have a single era and therefore often share an identity with the era (Buddhist, Indian, Islamic)
  • Calendars that have a small, fixed number of eras, where the era is necessary to distinguish dates even if they are in the same calendar system, but the era can be elided when context is clear (Gregorian/Julian, Coptic/Ethiopian, ROC)
  • Calendars that require eras to distinguish even modern dates from one another (Japanese)

This is a very clear explanation. Nice!

Is it a safe assumption that era ID naming should be optimized for cases (3) and (4), because (1) and (2) never need to use eras in code?

sffc commented 1 year ago

I don't disagree with you on the "indian" example. Your code is much better than mine. My code was provided as an example of something that could potentially be considered in some other (non-ESTemporal) calendar code if the constraints aligned around it, and I don't want to unnecessarily rule it out.

Oh! What are those "other means" conversion APIs beyond Temporal? Would those APIs not use calendars, only eras?

For example, in Rust ICU4X, we allow the calendar to be a static parameter. We also support arbitrary runtime calendars, but only via (slower, bigger) dynamic dispatch. For calendars that are sufficiently similar, we may be able to share static type parameters, like Date<Islamic>, with the era being subsequently used to perform runtime distinction.

Is it a safe assumption that era ID naming should be optimized for cases (3) and (4), because (1) and (2) never need to use eras in code?

Does the rest of my post address the case I'm trying to lay out that eras for case (2) are also useful in code?

justingrant commented 1 year ago

Does the rest of my post address the case I'm trying to lay out that eras for case (2) are also useful in code?

For ECMAScript code, I think we should optimize the era solution for (3) and (4) where eras are actually needed in code. I especially would avoid solutions that make (3) and (4) worse—harder to understand/learn or more vulnerable to bugs—in order to make (1) or (2) better.

For non-ECMAScript code, I have no opinions about any of the above, as long as it doesn't make ECMAScript DX worse. I'm happy to support whatever your team wants to accomplish in Rust or elsewhere.

I also have no opinion about what are canonical era codes in CLDR, as long as those codes don't also have to be used as the values of era properties in ECMAScript objects. Your idea of having a list of aliases and using the first one as the value of the era property sounds fine to me. Or the canonical aliases could be listed in the spec. Or any other solution.

Related observation: all the behavior that I'm concerned about in your era plan is related to the CLDR canonical codes. Behavior of aliases in the plan seems much easier for ECMAScript developers to understand.

Aliases

Canonical CLDR codes

Would it be a reasonable path fwd to use your proposal's alias behavior (only) in ECMAScript, and then add a way to choose canonical aliases in ECMAScript?

It seems fine to use CLDR canonical codes as (single-calendar only) aliases in ECMAScript, as long as we avoid making them canonical for ES when there's a more recognizable alternative available.

justingrant commented 1 year ago

Summarizing the feedback above: based on my current understanding of the proposal, here are two changes that I think would improve its usability for ECMAScript developers. Neither of these changes necessarily implies a change to CLDR keys nor to how Rust or other platforms would use era codes.

  1. To make calendar features easier to understand and to reduce opportunities for bugs, only eras from the same calendar should be accepted as input. Code like the following, where eras are used with unrelated calendars, should throw. Instead, withCalendar should be used.

    Temporal.PlainDate.from('2020-01-01').with({ era: 'saka', eraYear: 100 });
    Temporal.PlainDate.from('2020-01-01[u-ca=chinese]').with({ era: 'bc', eraYear: 100 });
  2. To improve discoverability and to make calendar features easier to understand, the era property should return recognizable names like ce, bce or saka instead of names that duplicate the calendar name like gregory, gregory-inverse, or indian, respectively.

    • If multiple names are widely recognized, it's OK to pick one and use the others as input-only aliases.
    • If real-world users of related calendars use the same name for an era, then those calendars' era property should return the same name. Examples include ah for Islamic calendar variants and the codes for AD/BC eras that could be used in Gregorian, Julian, and Japanese. Note that the same-named era may behave differently in different calendars.

Based on my previous calendar research while building the Temporal polyfill, here are some suggestions for what I'd expect to see returned from era in various calendars. Input-only aliases are noted. I think it'd be fine to use different codes as long as those names are reasonably widely recognized by users of those calendars.

Also, it seems fine for each calendar to use the name of the calendar as an input-only alias for the anchor era (the era that determines year). So indian could be an alias for saka, as long as the latter one is what's returned from the era getter.
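
As a sketch of how input-only aliases could work (an assumption for illustration, not the proposal's specified behavior): each era has a list of accepted names, and the first entry is what the era getter would return. The function name and the alias lists are hypothetical.

```javascript
// Hypothetical alias groups per calendar. In each group the first entry is
// the name returned by the `era` getter; the rest are input-only aliases.
const ERA_ALIASES = new Map([
  ['gregory', [['ce', 'ad', 'gregory'], ['bce', 'bc', 'gregory-inverse']]],
  ['indian', [['saka', 'indian']]],
]);

// Maps any accepted input name to the name the getter would return.
// Names from other calendars are rejected (change 1 above).
function canonicalizeEra(calendar, input) {
  const groups = ERA_ALIASES.get(calendar);
  if (groups === undefined) {
    throw new RangeError(`unknown calendar "${calendar}"`);
  }
  for (const aliases of groups) {
    if (aliases.includes(input)) return aliases[0];
  }
  throw new RangeError(`era "${input}" is not valid for calendar "${calendar}"`);
}

console.log(canonicalizeEra('gregory', 'ad'));    // "ce"
console.log(canonicalizeEra('indian', 'indian')); // "saka"
```

Adding another alias later only extends a list, while changing the first entry would change what existing code observes from the getter.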

sffc commented 1 year ago

Your proposal takes a reasonable position. However, I'll reiterate my concern, which is that it's never going to be clear what the correct "default aliases" should be.

For Gregorian, we'll never settle the ad/ce debate, and for most other calendars, we're mostly pulling these identifiers out of thin air; there's no clear right answer for almost any of them. ("saka" and "minguo" are the two that seem least controversial.) I'm comfortable making them aliases, because we can always add more aliases, but locking them in as default aliases means we can't change them without breaking the Web.

When we invented identifiers for measurement units, we referred to an independent governmental agency, NIST, to get the names and spellings of the units. But, there's no independent agency for the era names. CLDR is the independent agency. It could certainly choose to give these things names on its own, but that's a profound step to take.

justingrant commented 1 year ago

I'll respond below to your note around era names, but first: are you OK with the cross-calendar limitation in (1) above? It'd be nice to get consensus on that part, if possible.

there's no independent agency for the era names.

If we want to rely on an existing standard, then Java has standardized era codes for Japanese, Gregorian, Islamic, Buddhist, and Minguo. Java has been using these names since Java 1.8 (aka Java 8) in 2014, so any Java or Android developer working in those calendars is likely to be familiar. Can we rely on Java's prior art?

For Gregorian, we'll never settle the ad/ce debate

IMO I don't think we need to settle any debates... there just needs to be some reasonable basis to pick something that many/most people will recognize. Especially when we can point to some external authority's prior art like Java.

for most other calendars, we're mostly pulling these identifiers out of thin air; there's no clear right answer for almost any of them ... locking them in as default aliases means we can't change them without breaking the Web.

If that's a big concern, then should we just avoid the problem completely for many calendars by removing eras from calendars that don't use multiple eras? Then we'd limit any potential controversy or backwards-compat issues to only 5 calendars: japanese, gregory, roc, coptic, and ethiopic.

Of those, the first three are already covered by Java's precedent, leaving only coptic and ethiopic.

For coptic, A.M. seems very widely used and uncontroversial in all the English sources that I could find. There's a backwards era too, for which we could use either the "-inverse" suffix you've been discussing or a "before-" prefix to follow Java's lead with the ROC calendar.

For ethiopic, aa is already used inside the name of the ethioaa calendar, so that name seems like a safe choice, because we can't change the name of the calendar without breaking the web.

That leaves only one era remaining: ethiopic's anchor era which seems to be called the "Incarnation Era" universally in all English sources. Seems like the only decision is how to abbreviate it (if at all), not what to call it.

I'm not suggesting that any of these solutions are perfect, only that using not-universally-recognizable names (many of which are already used in Java) seems better for developers than inventing some new scheme ourselves.

sffc commented 1 year ago

I'll respond below to your note around era names, but first: are you OK with the cross-calendar limitation in (1) above? It'd be nice to get consensus on that part, if possible.

I will rephrase this and then agree with my rephrased version. I am okay if ECMA-402 chooses to only accept calendar-scoped era codes and aliases upon input and not eras from other calendars. That would be a choice by ECMA-402 on how to interpret the data it gets from CLDR.


May I ask how you would extend your model to support Julian and Juliogregory?

justingrant commented 1 year ago

May I ask how you would extend your model to support Julian and Juliogregory?

Julian seems straightforward: use the same ce/bce eras as gregory.

For a Julian/Gregory calendar, the answer kinda depends on the design of the calendar, specifically how (whether?) the calendar lets users discover whether a particular Temporal.PlainDate instance using that calendar is a Julian or Gregorian date. The most discoverable way would probably be to use the era to distinguish between Julian and Gregorian. A custom property could be used too. Or an additional prototype method on this calendar only. There's probably others.

If we don't offer this capability at all, or if we want to use a separate method or getter like afterGregorianTransition to distinguish Julian vs. Gregorian, then the era could be ce to align with gregory and julian.

If we want to use era to distinguish Julian vs. Gregorian, then the era getter could return bce, julian or gregory. IMO this would be a more discoverable option.
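
The two options could be sketched like this. This is purely illustrative: the function name and the eraDistinguishesReckoning switch are hypothetical, afterGregorianTransition is the example getter name from above, and the sketch assumes that years at or before 0 report bce in both options.

```javascript
// Hypothetical era getter for a combined Julian/Gregorian calendar.
// `year` is the calendar's signed year; `afterGregorianTransition` says
// whether the date falls after the switch to Gregorian reckoning.
function eraForJulianGregorian(date, eraDistinguishesReckoning) {
  if (date.year <= 0) return 'bce';
  // Option 1: era does not expose the reckoning; align with gregory and julian.
  if (!eraDistinguishesReckoning) return 'ce';
  // Option 2: era tells Julian dates apart from Gregorian ones.
  return date.afterGregorianTransition ? 'gregory' : 'julian';
}

const early = { year: 1500, afterGregorianTransition: false };
const late = { year: 1900, afterGregorianTransition: true };
console.log(eraForJulianGregorian(early, false)); // "ce"
console.log(eraForJulianGregorian(early, true));  // "julian"
console.log(eraForJulianGregorian(late, true));   // "gregory"
```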