tc39 / proposal-intl-displaynames-v2

Intl DisplayNames API V2
https://tc39.es/proposal-intl-displaynames-v2/
MIT License

Indexing month names for leap month #11

Closed FrankYFTang closed 3 years ago

FrankYFTang commented 4 years ago

This is a move of https://github.com/tc39/proposal-intl-displaynames/issues/55 from v1 repo

@sffc commented on Oct 16, 2019 I was talking to @pedberg-icu yesterday about how to index month names for the purpose of the Intl.DisplayNames API. He suggested that it could make sense to add an optional second argument to the .of() method for type: "month" to indicate whether the month is a leap-month. Opening an issue to continue this discussion.

@sffc commented on Dec 1, 2019 A snag in the Hebrew calendar: the digits used to represent months map to different strings each year. For example:

new Date(2018, 2, 20).toLocaleDateString("en-US-u-ca-hebrew", { month: "long", day: "numeric" })
"Nisan 4"
new Date(2019, 2, 20).toLocaleDateString("en-US-u-ca-hebrew", { month: "long", day: "numeric" })
"Adar II 13"
new Date(2018, 2, 20).toLocaleDateString("en-US-u-ca-hebrew", { month: "numeric", day: "numeric" })
"7/4"
new Date(2019, 2, 20).toLocaleDateString("en-US-u-ca-hebrew", { month: "numeric", day: "numeric" })
"7/13"

tc39/proposal-temporal#290

@sffc commented on Dec 8, 2019 This is still a really important open question. We are exploring options in the Temporal repository at tc39/proposal-temporal#290 because it would be nice for the Temporal month and Intl.DisplayNames month to match.

Can we recommend that when browsers ship this feature, they hold back on shipping the month and maybe also weekday types for the time being until we have this issue resolved?

@ljharb commented on Dec 8, 2019 If that's going to be a request, that seems like a reason not to request stage 3 until it's resolved.

@sffc commented on Dec 8, 2019 • This proposal is already Stage 3 unfortunately. The issue about leap months came up after we had asked for Stage 3. In any case, I do not want to block the rest of the proposal on this edge case, since the rest of it is really solid and unlikely to be affected by our decision on month names.

@ljharb commented on Dec 8, 2019 Oh, my mistake, I thought this was Temporal :-) in that case that does make sense to either ask implementers to partially ship, or to drop it back to stage 2.

@sffc commented on Dec 12, 2019 Per today's ECMA-402 meeting, we are removing all date/time types from Intl.DisplayNames and saving them for V2, once Temporal has set the precedent. I expect these to be added in a follow-on proposal or PR later in 2020.

@sffc commented on Feb 13 • I'm going to keep the conversation going here, since we do need to resolve this issue for both Temporal and DisplayNames, and I would like it to be resolved in 2020.

Background: Many lunisolar calendars have the concept of "leap months", which are extra months added to a year in order to maintain rough alignment between moon cycles and solar years.

In the Hebrew and Chinese lunisolar calendars, an existing month is "duplicated". For example, in Hebrew, the last month, "Adar", is duplicated into "Adar I" and "Adar II". In the Chinese calendar, the duplicated month could be any of the months, not just the last month; according to Wikipedia:

The first month without a mid-climate is the leap, or intercalary, month. In other words, the first month that doesn't include a major solar term is the leap month.[17] Leap months are numbered with rùn 閏, the character for "intercalary", plus the name of the month they follow. In 2017, the intercalary month after month six was called Rùn Liùyuè, or "intercalary sixth month" (閏六月) and written as 6i or 6+. The next intercalary month (in 2020, after month four) will be called Rùn Sìyuè (閏四月) and written 4i or 4+.

Here are some options on how to identify the month:

  1. Month is an integer, leap months use the same numeric value as their non-leap equivalent, and we add an additional optional argument which takes "isLeapMonth". For Chinese, that argument would be true or false. For Hebrew, we would need three cases, since "Adar", "Adar I", and "Adar II" are all separate names. Problem: The user needs to know to use the optional argument. If they pass their month value from Temporal directly into Intl.DisplayNames, they might miss crucial information.
  2. Month is an integer, and we create special numeric values to represent the leap months. For example, in Chinese, a negative number could be the leap version of the corresponding positive number. In Hebrew, we could make 12 be "Adar", 13 be "Adar I", and 14 be "Adar II". Problem: This might not be backwards-compatible due to the observation in #55 (comment).
  3. Month is not an integer but rather is a complex type depending on the calendar system. For example, the Hebrew calendar could have something like [12] for "Adar", [12, false] for "Adar I", and [12, true] for "Adar II". The Chinese calendar could have [4] for "Sìyuè" and [4, true] for "Rùn Sìyuè". Problem: We lose a certain simplicity such as the ability to compare two month values using the < or > operator.
  4. Month is a first-class type, Temporal.Month, with semantics defined by the calendar system. Problem: This would need to be proposed to Temporal. So far all Temporal types have been clean integers, and this would add complexity to the data model.

Thoughts?

@sffc commented on Feb 13 Additional ideas from discussion with @FrankYFTang and @echeran:

  1. Intl.DisplayNames.prototype.of for type="month" takes a year and returns an array of all the months in that year, sorted in order. The list could be length 12 or length 13.
  2. Do not expose month names in Intl.DisplayNames, instead relying on Intl.DateTimeFormat with month: "long".
  3. Also consider making the Temporal Chinese date return a half number like 4.5 for leap month 4.
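The idea of relying on Intl.DateTimeFormat with month: "long" can be sketched roughly as follows. This is only an illustration: the helper name monthNamesBetween is hypothetical, and the names it returns depend on the calendar data shipped with the engine (an engine without Hebrew calendar support would simply fall back to its default calendar).

```javascript
// Hypothetical helper: walk day by day through a span of dates and collect
// each month name as it changes, using only Intl.DateTimeFormat.
function monthNamesBetween(start, end, locale = "en-US-u-ca-hebrew") {
  const fmt = new Intl.DateTimeFormat(locale, { month: "long", timeZone: "UTC" });
  const DAY_MS = 24 * 60 * 60 * 1000;
  const seen = [];
  for (let t = start.getTime(); t <= end.getTime(); t += DAY_MS) {
    const name = fmt.format(new Date(t));
    // Only record a name when it differs from the previous day's month name.
    if (seen[seen.length - 1] !== name) seen.push(name);
  }
  return seen;
}

// February through April 2019 falls inside the Hebrew leap year 5779.
const spring2019 = monthNamesBetween(
  new Date(Date.UTC(2019, 1, 1)),
  new Date(Date.UTC(2019, 3, 30))
);
console.log(spring2019);
```

Note that this approach needs concrete dates as input, which is exactly the dependency on a year that the rest of this thread debates.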

@sffc commented on May 28 As posted in tc39/proposal-temporal#573:

I am increasingly liking the idea of passing a proper YearMonth into Intl.DisplayNames. A YearMonth unambiguously says what calendar system and which month in which year you want to format. So for example, to get Adar II, you could do:

const names = new Intl.DisplayNames("en-US", { type: "month" });
const adarII = names.of(Temporal.YearMonth.from({
  calendar: "hebrew",
  year: 5779,
  month: 7,
}));
console.log(adarII);  // "Adar II"

FrankYFTang commented 4 years ago

I put together a slide deck summarizing my research on this topic: Month in Calendars supported by ICU

gibson042 commented 4 years ago

Other relevant prior art: RFC 7529 (iCalendar non-Gregorian recurrence rules)

  1. Numeric values 1 through N are used to identify regular, non-leap, months (where N is the number of months in a regular, non-leap, year).
  2. The suffix "L" is added to the regular month number to indicate a leap month that follows the regular month, e.g., "5L" is a leap month that follows the 5th regular month in the year.

RRULE:RSCALE=HEBREW;FREQ=YEARLY;BYMONTH=5L;BYMONTHDAY=8;SKIP=FORWARD… These define a recurring event for the 8th day of the Hebrew month of Adar I (the leap month identified by "5L")
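
As a rough sketch of how the RFC 7529 style identifiers quoted above could be handled in JavaScript, the hypothetical helper below splits a code such as "5L" into its numeric part and a leap flag. The function name and the returned object shape are assumptions made for illustration; neither Temporal nor Intl defines them.

```javascript
// Hypothetical parser for RFC 7529 style month identifiers ("5", "5L", ...).
function parseMonthCode(code) {
  const match = /^(\d{1,2})(L?)$/.exec(String(code));
  if (!match || Number(match[1]) < 1) {
    throw new RangeError(`Invalid month code: ${code}`);
  }
  return {
    month: Number(match[1]),   // the regular month number
    isLeap: match[2] === "L",  // true when the "L" suffix is present
  };
}

console.log(parseMonthCode("5L")); // { month: 5, isLeap: true }
console.log(parseMonthCode("6"));  // { month: 6, isLeap: false }
```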

sffc commented 3 years ago

@FrankYFTang I like the idea of using iCal-style identifiers here. How do you distinguish between "Adar" and "Adar II", since they are the same iCal month?

FrankYFTang commented 3 years ago

@FrankYFTang I like the idea of using iCal-style identifiers here. How do you distinguish between "Adar" and "Adar II", since they are the same iCal month?

https://www.chabad.org/library/article_cdo/aid/2263483/jewish/Adar-Adar-II.htm Hmm... that is an interesting question: how do we get the name of a non-leap month in a leap year when the name is different in that leap year?

sffc commented 3 years ago

In Hebrew, would it be too awkward to make 5L = Adar I, 6 = Adar, and 6L = Adar II? In a normal year, you'd have months 5, 6, 7, and in a leap year, you'd have 5, 5L, 6L, 7.

CC @Manishearth @justingrant
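
To make the suggestion above concrete, here is a minimal sketch of the lookup it implies: every month name gets its own code, so a name can be found without knowing the year. The table and the function name are hypothetical, the "6L" code for Adar II is only the idea floated in this comment (it is not an RFC 7529 code), and the English spellings are simply one common transliteration.

```javascript
// Hypothetical code-to-name table where "5L", "6" and "6L" are all distinct.
const HEBREW_MONTH_NAMES_BY_CODE = new Map([
  ["1", "Tishri"], ["2", "Heshvan"], ["3", "Kislev"], ["4", "Tevet"],
  ["5", "Shevat"],
  ["5L", "Adar I"],  // leap years only
  ["6", "Adar"],     // non-leap years only
  ["6L", "Adar II"], // leap years only (the code proposed in this comment)
  ["7", "Nisan"], ["8", "Iyar"], ["9", "Sivan"], ["10", "Tamuz"],
  ["11", "Av"], ["12", "Elul"],
]);

function monthNameForCode(code) {
  const name = HEBREW_MONTH_NAMES_BY_CODE.get(code);
  if (name === undefined) throw new RangeError(`Unknown month code: ${code}`);
  return name;
}

// Month codes of a leap year under this scheme: ..., 5, 5L, 6L, 7, ...
console.log(["5", "5L", "6L", "7"].map(monthNameForCode));
// [ 'Shevat', 'Adar I', 'Adar II', 'Nisan' ]
```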

Manishearth commented 3 years ago

My impression is that Adar II is the "real" (original) Adar, and is where all the festivals are celebrated. I cannot verify but my gut feeling is that Adar II being 6, Adar being 6, and Adar I being 6L would make the most sense, with the L suffix meaning "leap month sharing the name of the regular month" (not necessarily "leap month proceeding the numbered month")

We could also have 6P as opposed to 6L. Or something.

justingrant commented 3 years ago

I admit that I don't fully understand the goals of using DisplayNames for date/time data. What's the problem we're trying to solve? How do use cases of DisplayNames.prototype.of vary from use cases for DateTimeFormat.formatToParts with only one part? Is there something I can do with the former that I can't do with the latter? Is the former just a more ergonomic variant of the latter?

With the caveat that I don't have a lot of context about the DisplayNames proposal, my initial suggestion would be to require both a year and a month in the input in the same format that would be accepted by Temporal.PlainYearMonth.from, e.g.

My impression is that Adar II is the "real" (original) Adar, and is where all the festivals are celebrated. I cannot verify but my gut feeling is that Adar II being 6, Adar being 6, and Adar I being 6L would make the most sense, with the L suffix meaning "leap month sharing the name of the regular month" (not necessarily "leap month proceeding the numbered month")

We could also have 6P as opposed to 6L. Or something.

Given that there's prior art in the iCalendar RFC 7529 standard (where Adar I is '5L'), I'd strongly suggest that we follow that standard for easier interop. I agree that if we were starting from scratch then it'd be nice if the month code gave a hint about which was the regular month and how the unusual month relates to the regular month (after like Chinese, before like Hebrew, merge like non-ICU Hindu, etc.) but given that there's already a standard that works for all current ICU calendars, I'd be inclined to follow that standard. Especially since the workaround (always provide a year or a Temporal object in the input) seems straightforward.

@gibson042 - feel free to chime in here, I know in the past you've been pretty adamant about trying to draft behind existing standards wherever possible.

sffc commented 3 years ago

How do use cases of DisplayNames.prototype.of vary from use cases for DateTimeFormat.formatToParts with only one part?

Frank discusses this at length in the README, but I'm also skeptical. Let's keep the meta discussion in #4.

Given that there's prior art in the iCalendar RFC 7529 standard (where Adar I is '5L'), I'd strongly suggest that we follow that standard for easier interop.

I definitely prefer to be fully RFC 7529 all else equal. The reason I proposed the possibility of using 6L for Adar II is:

  1. It's not actually clear in RFC 7529 what month code is used for Adar II. There is only an example for Adar I.
  2. Even if Adar and Adar II were both conventionally 6, it might be mostly compatible for us to use 6L, since 6L can simplify to 6 during calculations.
FrankYFTang commented 3 years ago

I admit that I don't fully understand the goals of using DisplayNames for date/time data. What's the problem we're trying to solve? How do use cases of DisplayNames.prototype.of vary from use cases for DateTimeFormat.formatToParts with only one part? Is there something I can do with the former that I can't do with the latter? Is the former just a more ergonomic variant of the latter?

Please read https://github.com/tc39/intl-displaynames-v2#month-names for all the problems that using DateTimeFormat instead of DisplayNames would lead to.

FrankYFTang commented 3 years ago

Given that there's prior art in the iCalendar RFC 7529 standard (where Adar I is '5L')

The problem is not about using "5L" as the index to access the name "Adar I". The problem is what we should use as the index to access the name of the month before the month "Nisan" ("7").

The name of the month before "Nisan" is "Adar" in a year without a leap month. The name of the month before "Nisan" is "Adar II" in a year with a leap month.

"Adar" and "Adar II" are two different STRINGS (even though they MEAN the same month in two different years).

It is clear that the index to access "Adar I" is "5L", and it is clear that the index to access "Adar" is "6", but what is the index to access "Adar II"? It cannot be "6", because "6" accesses "Adar", and we need a different index to access "Adar II".

In a year without a leap month, the names of the months are

    "Tishri",   // 1
    "Heshvan",  // 2
    "Kislev",   // 3
    "Tevet",    // 4
    "Shevat",   // 5
    "Adar",     // 6
    "Nisan",    // 7
    "Iyar",     // 8
    "Sivan",    // 9
    "Tamuz",    // 10
    "Av",       // 11
    "Elul",     // 12

In a year with a leap month

    "Tishri",   // 1
    "Heshvan",  // 2
    "Kislev",   // 3
    "Tevet",    // 4
    "Shevat",   // 5
    "Adar I",   // "5L"
    "Adar II",  // ??? - "6" will give us "Adar" not "Adar II".
    "Nisan",    // 7
    "Iyar",     // 8
    "Sivan",    // 9
    "Tamuz",    // 10
    "Av",       // 11
    "Elul",     // 12

justingrant commented 3 years ago

I definitely prefer to be fully RFC 7529 all else equal. The reason I proposed the possibility of using 6L for Adar II is:

  1. It's not actually clear in RFC 7529 what month code is used for Adar II. There is only an example for Adar I.

I asked on the calsify mailing list about what code Adar II is given. Here was the answer: Adar (or Adar II in a leap year) is always month "6". Adar I is "5L" since it's inserted between "5" and "6".

  2. Even if Adar and Adar II were both conventionally 6, it might be mostly compatible for us to use 6L, since 6L can simplify to 6 during calculations.

What would be the advantage of doing it this way? Is it only that the developer can cache a map of codes to names without having to call an Intl API with the month, year, and calendar as input to get the localized string? Or is there some other benefit besides saving an API call?

The problem is what should we use as index to access the name of the month before the month "Nisan" ("7")

I recently implemented an initial Hebrew calendar for Temporal, so I understand this problem well. My suggestion in https://github.com/tc39/intl-displaynames-v2/issues/11#issuecomment-758230162 is to require both a year, month, and calendar in the input. What's wrong with that solution, esp. if any Temporal object with a month and calendar could also be used? (EDIT: added "calendar" in this paragraph-- forgot it the first time)

Please read https://github.com/tc39/intl-displaynames-v2#month-names for all the problems that using DateTimeFormat instead of DisplayNames would lead to.

Yep, I read that before posting. The problems mentioned there seem like problems with legacy Date's inability to know about time zones and calendars. If a Temporal object (or a property bag or string that could be turned into a Temporal object) is used as input, which of those problems still apply?

sffc commented 3 years ago

I recently implemented an initial Hebrew calendar for Temporal, so I understand this problem well. My suggestion in #11 (comment) is to require both a year, month, and calendar in the input. What's wrong with that solution, esp. if any Temporal object with a month and calendar could also be used?

I want to be able to blindly look up a month name string from a month name identifier. If I have to figure that out from a year/month combo, I need to pull in complicated math logic into the date formatter.

Yep, I read that before posting. The problems mentioned there seem like problems with legacy Date's inability to know about time zones and calendars. If a Temporal object (or a property bag or string that could be turned into a Temporal object) is used as input, which of those problems still apply?

Meta discussion in #4.

justingrant commented 3 years ago

I want to be able to blindly look up a month name string from a month name identifier. If I have to figure that out from a year/month combo, I need to pull in complicated math logic into the date formatter.

Who is "I" here? Developers using Intl APIs? Or 402 implementers in V8 and/or browsers?

Also, what's the complicated math logic? The Hebrew leap year test is 2 lines of code in ICU: https://github.com/unicode-org/icu/blob/a84fdd0e903fb20acd93ed186a0da4c0c071a0e6/icu4c/source/i18n/hebrwcal.cpp#L469-L479.
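
For illustration, the well-known 19-year cycle rule for Hebrew leap years fits in one line of JavaScript. This is the textbook formula for positive years, not a transcription of the ICU code linked above, and the function name is hypothetical.

```javascript
// A Hebrew year is a leap year (13 months) in 7 out of every 19 years.
// Textbook rule, valid for positive year numbers.
function isHebrewLeapYear(year) {
  return (7 * year + 1) % 19 < 7;
}

console.log(isHebrewLeapYear(5778)); // false (spring 2018, plain "Adar")
console.log(isHebrewLeapYear(5779)); // true  (spring 2019, "Adar I" and "Adar II")
```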

Meta discussion in #4.

Sorry, missed that one. I'll comment over there.

FrankYFTang commented 3 years ago

"require both a year, month, and calendar in the input." This means we won't have a way to return the month name string without the context of a "year", and we would always need to apply calendar calculation to resolve a simple month code to month name mapping, right? I would rather keep such complicated calendar calculation outside a simple low-level string access API. Maybe we should add an additional option forLeapYear instead, and return "Adar" for 6 if forLeapYear is false and "Adar II" for 6 if forLeapYear is true?
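
A minimal sketch of what such a forLeapYear option might look like as a plain string lookup, with no calendar arithmetic involved. Everything here is hypothetical: the function name, the option bag shape, and the table are assumptions for illustration and are not part of any Intl API.

```javascript
// RFC 7529 style codes, where "6" is shared by "Adar" and "Adar II".
const NAMES_BY_RFC7529_CODE = {
  "1": "Tishri", "2": "Heshvan", "3": "Kislev", "4": "Tevet",
  "5": "Shevat", "5L": "Adar I", "6": "Adar", "7": "Nisan",
  "8": "Iyar", "9": "Sivan", "10": "Tamuz", "11": "Av", "12": "Elul",
};

// Hypothetical lookup: the caller states whether the name is wanted for a
// leap year, so the function never has to compute that itself.
function monthNameWithOption(code, { forLeapYear = false } = {}) {
  if (!Object.hasOwn(NAMES_BY_RFC7529_CODE, code)) {
    throw new RangeError(`Unknown month code: ${code}`);
  }
  if (code === "6" && forLeapYear) return "Adar II";
  return NAMES_BY_RFC7529_CODE[code];
}

console.log(monthNameWithOption("6"));                        // "Adar"
console.log(monthNameWithOption("6", { forLeapYear: true })); // "Adar II"
```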

Manishearth commented 3 years ago

I think one thing to highlight is that for the range of Temporal APIs, we are both consuming and producing month codes, and it's perfectly reasonable to have a different way of handling each side of this. I feel like applying Postel's law is useful here: we should be very precise in the month codes we produce (we should never produce the code used for Adar when talking about Adar II), but we can be liberal in the month codes we consume (Adar II should be referred to with "6L" or something, but can be referred to with "6" by the user)
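
The strict-output, liberal-input idea could look roughly like the sketch below. Both helper names are hypothetical, and "6L" as the code for Adar II is only the placeholder used in this thread, not an RFC 7529 code.

```javascript
// Liberal on input: in a leap year, accept "6" as an alias for "6L" (Adar II).
function toDisplayCode(code, isLeapYear) {
  return isLeapYear && code === "6" ? "6L" : code;
}

// For calculations or iCalendar interop, collapse the display-only "6L"
// back to the shared code "6".
function toCalendarCode(code) {
  return code === "6L" ? "6" : code;
}

console.log(toDisplayCode("6", true));  // "6L"
console.log(toDisplayCode("6", false)); // "6"
console.log(toCalendarCode("6L"));      // "6"
```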

sffc commented 3 years ago

I want to be able to blindly look up a month name string from a month name identifier. If I have to figure that out from a year/month combo, I need to pull in complicated math logic into the date formatter.

Who is "I" here? Developers using Intl APIs? Or 402 implementers in V8 and/or browsers?

402 implementers.

Also, what's the complicated math logic? The Hebrew leap year test is 2 lines of code in ICU: https://github.com/unicode-org/icu/blob/a84fdd0e903fb20acd93ed186a0da4c0c071a0e6/icu4c/source/i18n/hebrwcal.cpp#L469-L479.

Acknowledged.

I still think it's weird (non-elegant) that we are proposing something that works without issue for all months in all CLDR calendars, but this one little exception requires significantly increasing the complexity (passing a pair of month code and year, instead of month code by itself).

justingrant commented 3 years ago

Could I suggest that we put this discussion on hold while we first try to resolve #4? I'm still confused about why a new API is needed if using Temporal objects instead of Date inputs for Intl.DateTimeFormat.formatToParts solves the problems noted in https://github.com/tc39/intl-displaynames-v2#month-names.

If a new API isn't needed, then we can avoid the issues noted above. If there are use cases where using Temporal inputs won't solve the problem, then that also might help resolve this issue by narrowing this discussion to only those no-workaround cases.

we should never produce the code used for Adar when talking about Adar II

This would break use cases like date1.monthCode === date2.monthCode across different years. It'd also mean that any app that wants to interop with iCalendar would need translation logic between iCalendar month codes and Temporal/Intl month codes. Or, perhaps more likely, developers who do month comparisons or iCalendar interop will probably not realize that the problems above exist, and their code will intermittently break when passed a Hebrew date.

I would be OK with accepting 6L as a special case that applies to formatting only. But not producing it, because that'd break non-formatting business logic as noted above. But I still think the better option is to avoid this whole issue by accepting Temporal objects because they will always know the year and the calendar so will never be ambiguous. Let's continue discussing over in #4!

Manishearth commented 3 years ago

This would break use cases like date1.monthCode === date2.monthCode across different years

The desire for equality and (brought up in another thread) sorting requirements makes me feel like Temporal.PlainMonth is better suited for handling this, but ideally we don't have to go down that route.

sffc commented 3 years ago

I'm increasingly thinking that our use case here, uniquely identifying month names for the purpose of i18n formatting, is simply not the use case for which the RFC 7529 codes are designed. Given that we have a different problem, we should look at a different solution.

As 402 implementers, this is a problem that we need to solve regardless of whether it makes it into Intl.DisplayNames (see https://github.com/unicode-org/icu4x/issues/355). However, I agree that depending on how #4 gets resolved, it might be a moot point from the Temporal and 402 point of view.

sffc commented 3 years ago

2021-01-14: agreed to remove month and weekday from this proposal. So this discussion is now moot. However, it will still be relevant for the ICU4X implementation under the hood.