Closed justingrant closed 1 year ago
I like option 4, but would rename it to something like "Canonicalization doesn't follow Links" and would also make clear that Link chains terminating in "Etc/UTC" or "Etc/GMT" are still followed and canonicalized to "UTC" (unless that option splits into e.g. 4a and 4b with different conclusions regarding this behavior).
- i) How should `Intl.DateTimeFormat.p.resolvedOptions().timeZone` behave? Should it also stop canonicalizing? If yes, should it add a new `canonicalTimeZone` property?
That definitely gets into backwards compatibility territory, and it is plausible, if not likely, that some existing code already uses `dtf.resolvedOptions().timeZone` to get the host-reported canonical spelling of a time zone name. So I don't think changing that is on the table.
- ii) Will there be any change to user-visible output of `Intl.DateTimeFormat.p.format` or `Date.p.toLocaleString`? I suspect that the answer is "no" because localized descriptions of time zones don't usually surface the IANA identifiers, but not 100% sure about this.
https://github.com/tc39/ecma402/issues/119 does propose exposing the IANA names, but I don't recall any existing text requiring implementations to do so (although they certainly could anyway; ECMA-402 gives formatters lots of flexibility).
- iii) What changes (if any) would be required to CLDR and/or ICU to support this change?
I defer to @sffc.
- iv) Even if we avoid the canonicalization mess, there's still the pre-1970-data question. The unmerged fork will have it, the merged fork will only have it for the merged zones. This would mean, for example, that Europe/Copenhagen pre-1970 results could vary by fork. So which fork should we recommend that implementers use? I don't have a strong opinion here. It'd be nice to understand the size of pre-1970 data to know how much smaller browser downloads would get if this data were removed.
I don't have a strong opinion here either.
- v) Should case differences still be canonicalized, e.g. `Europe/Paris` vs. `europe/paris`? My opinion: yes, we should canonicalize.
Yes. Canonicalization still happens, it just doesn't follow Links (except for special-casing GMT and UTC).
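To make the "canonicalize case but don't follow Links" idea concrete, here is a minimal sketch. The `KNOWN_IDS` list and the function name are hypothetical stand-ins for whatever identifier table an engine would actually use; only the UTC special case comes from the discussion above.

```javascript
// Hypothetical, tiny stand-in for an engine's table of known identifiers.
const KNOWN_IDS = ['Europe/Paris', 'Asia/Calcutta', 'Asia/Kolkata', 'Etc/UTC', 'Etc/GMT', 'UTC'];

// Lower-cased identifier -> identifier in its preferred case.
const BY_LOWER_CASE = new Map(KNOWN_IDS.map((id) => [id.toLowerCase(), id]));

// Identifiers that collapse to "UTC" (the special case mentioned in this thread).
const UTC_ALIASES = new Set(['Etc/UTC', 'Etc/GMT', 'UTC']);

// Normalizes case only; Links such as Asia/Calcutta are NOT followed.
function normalizeWithoutFollowingLinks(input) {
  const id = BY_LOWER_CASE.get(String(input).toLowerCase());
  if (id === undefined) throw new RangeError(`Unknown time zone: ${input}`);
  return UTC_ALIASES.has(id) ? 'UTC' : id;
}
```

Under this sketch, `asia/calcutta` would come back as `Asia/Calcutta` rather than being rewritten to another spelling.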
- vi) Should spelling differences due to renaming also be canonicalized, e.g. `Asia/Calcutta` vs. `Asia/Kolkata`? My opinion: no, because by not canonicalizing `id` in this case we can avoid user complaints like this chromium bug, and we can ensure future compatibility & round-trippability even if zones are renamed in the future. Note that `equals` should probably report these as `true` though. (See below.)
No. Doing this would make the behavior less comprehensible and would sacrifice potential benefits.
- vii) Should we add a `TimeZone.p.equals` method? I think we should, both for consistency across Temporal types and to help code be robust in the face of past or future renames of cities, which seem to happen fairly often globally. JS code should be able to ask "Is this date in the India time zone?" without having to worry that the code will be broken by a past or future rename.
There should definitely be some way to identify that there is a Link chain establishing equality between two time zones with different names, and ideally a way to determine its directionality (e.g., detecting that Atlantic/Reykjavik is a Link to Africa/Abidjan rather than the reverse).
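As a rough illustration of Link-chain equality with directionality, the sketch below walks a hand-written Link table. The table contents are examples taken from this thread, and `linkChain` and `linkRelation` are hypothetical names, not a proposed API.

```javascript
// Illustrative Link table (Link name -> target), using examples from this thread.
const LINKS = new Map([
  ['Asia/Calcutta', 'Asia/Kolkata'],
  ['Atlantic/Reykjavik', 'Africa/Abidjan'],
  ['Antarctica/South_Pole', 'Antarctica/McMurdo'],
  ['Antarctica/McMurdo', 'Pacific/Auckland'],
]);

// Returns [id, ...intermediate Links, terminal Zone].
function linkChain(id) {
  const chain = [id];
  let current = id;
  while (LINKS.has(current)) {
    current = LINKS.get(current);
    if (chain.includes(current)) throw new Error(`Link cycle at ${current}`);
    chain.push(current);
  }
  return chain;
}

// Describes how two identifiers relate through Link chains.
function linkRelation(a, b) {
  const chainA = linkChain(a);
  const chainB = linkChain(b);
  if (chainA[chainA.length - 1] !== chainB[chainB.length - 1]) return 'unrelated';
  if (a === b) return 'same';
  if (chainA.includes(b)) return 'a-links-to-b';
  if (chainB.includes(a)) return 'b-links-to-a';
  return 'common-target';
}
```

With this data, `linkRelation('Atlantic/Reykjavik', 'Africa/Abidjan')` reports that the first name is a Link to the second rather than the reverse.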
- viii) If we add `equals`, should we also add a method that tests if all rules are the same across time zones, e.g. `Atlantic/Reykjavik` vs. `Africa/Abidjan`? I don't think this is needed. Userland code can always use `getNextTransition` in a loop to check for this kind of equality, and if there's user demand we could always add it in a later release.
Agreed; not needed at this time.
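For reference, a userland "same rules" check along those lines might look like the sketch below. It does not use Temporal directly: the zone objects are assumed to expose hypothetical `offsetAt(ms)` and `nextTransition(ms)` methods (standing in for offset lookup and `getNextTransition`), and `makeFakeZone` exists only so the example runs on its own.

```javascript
// Builds a fake zone from a sorted list of [transitionEpochMs, offsetMinutesAfter].
function makeFakeZone(initialOffset, transitions) {
  return {
    // Offset in effect at the given instant.
    offsetAt(ms) {
      let offset = initialOffset;
      for (const [at, after] of transitions) {
        if (at <= ms) offset = after;
      }
      return offset;
    },
    // First transition strictly after the given instant, or null.
    nextTransition(ms) {
      const next = transitions.find(([at]) => at > ms);
      return next ? next[0] : null;
    },
  };
}

// True if both zones have the same offset at every instant in [startMs, endMs].
function sameRules(a, b, startMs, endMs) {
  let cursor = startMs;
  for (;;) {
    if (a.offsetAt(cursor) !== b.offsetAt(cursor)) return false;
    // Offsets are constant between transitions, so only transition points need checking.
    const candidates = [a.nextTransition(cursor), b.nextTransition(cursor)]
      .filter((t) => t !== null && t <= endMs);
    if (candidates.length === 0) return true;
    cursor = Math.min(...candidates);
  }
}
```

The result depends on the range checked: two zones can agree over one window and still diverge outside it.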
- ix) How should the UTC zone be handled? I think this is straightforward: all zones whose canonical identifier is `Etc/UTC` should resolve to `UTC` in ECMAScript, matching current behavior. There's no value in changing this existing behavior.
Note that current behavior also maps "Etc/GMT" to "UTC": https://tc39.es/proposal-temporal/#sup-canonicalizetimezonename
- x) In order for the `PACKRATLIST` option to work, TZDB data must provide a way to differentiate "merged" links like `Atlantic/Reykjavik` => `Africa/Abidjan` from "renamed" links like `Asia/Calcutta` vs. `Asia/Kolkata`. How does this differentiation work, and does it work for all links or are there gaps? It sounds like @anba may know how this works.
AFAICT, there's no explicit differentiation... rather, just a zone.tab file identifying for each ISO 3166-1 alpha-2 country code the corresponding time zones, which can themselves identify Links (as is currently the case for DE/IS/etc.).
If we add `equals`, here's a suggestion for its behavior:
- It should accept objects or strings.
- If the receiver and/or the argument is a custom zone, use its `id` property.
Disagree on this; custom time zones should be compared by referential object identity. A custom time zone that happens to have `id` "UTC" is not equal to the built-in UTC time zone, and two custom time zones with the same `id` can have very different behavior. This probably means that distinct objects representing built-in time zones should also be reported non-equal, and it might therefore make sense to act on strings only (since authors already have `===` and `Object.is` for comparing objects). But I'd prefer to keep object input, such that `Temporal.TimeZone.equals(zdt.timeZone, zdt.timeZone)` is always acceptable (with an internal implementation that uses Link-aware canonicalization when both inputs are strings and otherwise validates both inputs as time zones using standard behavior such as ToTemporalTimeZone [or a more appropriate alternative without e.g. `timeZone` and ToString fallbacks] and compares the result using SameValue).
- Treat different casings as equal, e.g. `Europe/Paris` vs. `europe/paris`.
- If both receiver and argument canonicalize to `Etc/UTC` then treat them as equal.
:+1:
- Treat different spellings of the same location as equal, e.g. `Asia/Calcutta` vs. `Asia/Kolkata`, because they represent the same thing with different spelling.
- DO NOT treat different locations (like `Atlantic/Reykjavik` vs. `Africa/Abidjan`) as equal, even if all their time zone transitions are the same, because future changes could make those locations have different time zone rules. Per above, if users want to evaluate "all rules are the same" then they can do this in userland by comparing time zone transitions in a loop. Although honestly I'm skeptical that this will be a popular use case. Who cares if the rules are equal?
I also disagree on these last points, although really it seems to be mostly a question of modeling: I don't think Temporal should classify Links as "alternate spelling" vs. "real", but rather just treat any Link relationship as establishing equality. In an implementation that uses the standard IANA time zone data, Atlantic/Reykjavik and Africa/Abidjan are equal until and unless a policy change causes them to diverge.
- Treat different spellings of the same location as equal, e.g. `Asia/Calcutta` vs. `Asia/Kolkata`, because they represent the same thing with different spelling.
- DO NOT treat different locations (like `Atlantic/Reykjavik` vs. `Africa/Abidjan`) as equal, even if all their time zone transitions are the same, because future changes could make those locations have different time zone rules. Per above, if users want to evaluate "all rules are the same" then they can do this in userland by comparing time zone transitions in a loop. Although honestly I'm skeptical that this will be a popular use case. Who cares if the rules are equal?

I also disagree on these last points, although really it seems to be mostly a question of modeling: I don't think Temporal should classify Links as "alternate spelling" vs. "real", but rather just treat any Link relationship as establishing equality. In an implementation that uses the standard IANA time zone data, Atlantic/Reykjavik and Africa/Abidjan are equal until and unless a policy change causes them to diverge.
I agree with @gibson042, partly for a far more practical reason: maintenance. The IANA source doesn't have any API differences between "links due to similar clocks" and "links due to renames". The `backward` file was tidied up in the wake of this forking discussion; it now has commented groups of links based on their reasons. But this is only a convention in a single file, and not guaranteed to be a stable API.
If Temporal was to distinguish between the two cases in an API, there would need to be a stable maintenance process for adding brand new links to the correct category.
I also disagree on these last points, although really it seems to be mostly a question of modeling: I don't think Temporal should classify Links as "alternate spelling" vs. "real", but rather just treat any Link relationship as establishing equality. In an implementation that uses the standard IANA time zone data, Atlantic/Reykjavik and Africa/Abidjan are equal until and unless a policy change causes them to diverge.
My assumption is that, ideally, there'd be two categories of links:
The first type of link (let's call them "synonyms") conveys no semantic value. Programs will never behave differently depending on which ones you use (other than when comparing the `id` strings themselves).
The second type of link (let's call them "merges") conveys semantically different information that could change the behavior of future programs beyond string comparison.
The particular use case I had in mind where it's helpful to know that difference is when a program has logic like this: "I want to do special processing for timestamps for X" (where "X" is a particular country like India or Sweden). Like this:
if (Temporal.TimeZone.from('Europe/Copenhagen').equals(zdt.timeZoneId)) {
  // do Denmark-specific stuff
} else {
  // non-Denmark-specific logic
}
It would be bad if future changes in the spelling of the desired English transliteration of "Copenhagen" caused the code above to break. So it's probably good practice for any code that checks for a specific time zone (or that wants to compare two ZDT timestamps to know if they're semantically identical) to use `equals` instead of comparing `id`.
But it'd also be bad if the price of protecting against future spelling changes meant that you'd mistakenly run jurisdiction-specific logic for other jurisdictions that coincidentally share the same time zone rules.
It's true that, continuing that example above, if Denmark split into multiple time zones then the code above would break. But I think this is OK, because the change happened in Denmark so of course Denmark-specific code will need to change. My main concern is that if you treat all aliases the same, then `equals` becomes riskier because you can never predict what other semantically-different zones are being lumped into the same bucket.
So I do think there's a case that being able to distinguish these cases is important. But...
I agree with @gibson042, partly for a far more practical reason: maintenance. The IANA source doesn't have any API differences between "links due to similar clocks" and "links due to renames". The `backward` file was tidied up in the wake of this forking discussion; it now has commented groups of links based on their reasons. But this is only a convention in a single file, and not guaranteed to be a stable API.
One possible (needs validation) solution using existing data would be to use zone.tab, which includes pre-merge data. If a link from `backward` is also present in `zone.tab` then it's a merge, otherwise it's a synonym. I haven't done the work to validate that this will work perfectly, though!
If Temporal was to distinguish between the two cases in an API, there would need to be a stable maintenance process for adding brand new links to the correct category.
Agree, if the approach above won't work. We'd want to work with the IANA folks (or maybe ICU/CLDR?) to ensure that distinction is maintained in the future via some other solution.
There are fewer than 300 total links so this isn't a lot of ongoing maintenance work (would probably add <1hr/year of work to someone's plate) but someone would have to be willing to commit to doing the work long-term.
BTW, I'd volunteer to make an initial PR into TZDB, if it's decided that this split would be good to maintain AND if the data files need to change somehow.
One possible (needs validation) solution using existing data would be to use zone.tab, which includes pre-merge data. If a link from `backward` is also present in `zone.tab` then it's a merge, otherwise it's a synonym. I haven't done the work to validate that this will work perfectly, though!
You'd also need to consider backzone, because e.g. Africa/Timbuktu does not appear in zone.tab but is a "merge" (to use your term) of Africa/Abidjan in the primary data but (presumably) a synonym of Africa/Bamako in the pre-1970 data, and I think the same applies to everything in the "Non-zone.tab locations with timestamps since 1970 that duplicate those of an existing location" section mentioned below.
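A minimal sketch of the classification heuristic being discussed is shown below, assuming the relevant names have already been extracted from `backward`, `zone.tab`, and `backzone`. The sample sets are illustrative only and are not parsed from real TZDB files; `classifyLink` is a hypothetical helper. Cases like Africa/Timbuktu, which is itself only a Link in the pre-1970 data, would need handling beyond this sketch.

```javascript
// Illustrative sample data only; a real tool would parse the TZDB source files.
const sample = {
  // Link name -> target, as might be read from `backward`.
  backwardLinks: new Map([
    ['Asia/Calcutta', 'Asia/Kolkata'],
    ['Atlantic/Reykjavik', 'Africa/Abidjan'],
    ['Europe/Copenhagen', 'Europe/Berlin'],
  ]),
  // Names assumed to be listed in zone.tab.
  zoneTabNames: new Set(['Asia/Kolkata', 'Atlantic/Reykjavik', 'Europe/Copenhagen']),
  // Names assumed to be defined as Zones in backzone.
  backzoneZones: new Set(['Atlantic/Reykjavik', 'Europe/Copenhagen']),
};

// A Link counts as a "merge" if its name still has its own zone.tab entry
// or its own Zone in backzone; otherwise it is treated as a "synonym".
function classifyLink(linkName, data) {
  if (!data.backwardLinks.has(linkName)) {
    throw new RangeError(`${linkName} is not a Link in the sample data`);
  }
  const hadOwnIdentity = data.zoneTabNames.has(linkName) || data.backzoneZones.has(linkName);
  return hadOwnIdentity ? 'merge' : 'synonym';
}
```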
if the approach above won't work. We'd want to work with the IANA folks (or maybe ICU/CLDR?) to ensure that distinction is maintained in the future via some other solution.
That seems like a goal that exceeds the scope of Temporal v1.
BTW, I'd volunteer to make an initial PR into TZDB, if it's decided that this split would be good to maintain AND if the data files need to change somehow.
AFAICT tzdata Links are all created equal—the only existing data that could be used is unstructured section-heading comment text like "Pre-2013 practice, which typically had a Zone per zone.tab line" and "Non-zone.tab locations with timestamps since 1970 that duplicate those of an existing location". So I guess you'd be proposing something like a new merged file that exclusively contains the content from those section(s) and a Temporal equality comparison that ignores its contents?
BTW, I'd volunteer to make an initial PR into TZDB, if it's decided that this split would be good to maintain AND if the data files need to change somehow.
It's probably best to read this whole discussion thread first: https://mm.icann.org/pipermail/tz/2021-November/031074.html
That thread is what eventually produced the current grouped-under-comment-headings format of `backward`, despite calls for the changes to be easier to determine programmatically.
I would definitely like a change to the current format (I commented in that linked thread). But part of the reason the tzdb structure doesn't change often is the sheer number and variety of downstream consumers that have to be able to handle any new format.
Yep, you're right: `backzone` was needed too. I'm building a quick proof-of-concept to understand how Intl is currently canonicalizing Links in the IANA Time Zone Database. Will share shortly. So far I see two results:
Will share more results when I finish the investigation.
Initial investigation is complete. Results are here: https://4rylir.csb.app (full-screen view) and https://codesandbox.io/s/iana-vs-es-4rylir (source code). You can filter or sort to understand the various kinds of links.
`Intl` and the Temporal polyfill. The rest of the Links are:
I took a first pass at classifying links as synonyms or merges based on the following algorithm:
- `backward` but is also a Zone in `backzone`, then by definition it is a merge because it had separate pre-1970 rules.
- `Antarctica/South_Pole` => `Antarctica/McMurdo` => `Pacific/Auckland` in the latest TZDB. Therefore, `Antarctica/McMurdo` => `Pacific/Auckland` is a merge.

I manually verified all 86 synonyms identified by the algorithm above. There were these patterns:
- `Calcutta` => `Kolkata` with the old name kept around for backwards compatibility.
- `America/Indianapolis` vs. `America/Indiana/Indianapolis`
- `Iceland` => `Atlantic/Reykjavik`, or `PRC` => `Asia/Shanghai`. While a pedantic person could argue that these are not synonyms, in practice these IDs are deprecated and shouldn't be used anyway, so IMO a synonym seems reasonable here.
- `America/Atka` => `America/Adak` - These are two sparsely populated Aleutian islands. Definitely different places, but merged in TZDB. I found no docs about why they were merged.
- `America/Fort_Wayne` => `America/Indiana/Indianapolis` - different cities in the same state that have the same rules. I could find no documentation about why `America/Fort_Wayne` has its own Link.
- `America/Santa_Isabel` => `America/Tijuana` - according to the TZDB's history docs, adding `America/Santa_Isabel` was a mistake based on bad source info, and this Link reverted this mistake.
- `Pacific/Yap` => `Pacific/Truk` (or its synonym `Pacific/Chuuk`) - These are two small islands in Micronesia which are close to each other and which, even in `backzone`, seem to have had the same time zone rules. I found no docs explaining why they were merged.

I also manually checked through the Links identified as merges, and I was unable to find any that looked like they should be synonyms.
My initial reaction is that it's not the job of Temporal to tell implementations what they can/should and can't/shouldn't do in this area. I can at least say that any solution that involves "don't canonicalize time zone names" likely means that ICU's time zone utilities can no longer be used for data storage; they can be used for calculations, but Temporal glue code will need to be implemented to conform to the spec rather than just following ICU behavior as we've been doing for a long time.
My initial reaction is that it's not the job of Temporal to tell implementations what they can/should and can't/shouldn't do in this area.
Before I did this research I probably would have agreed with you, but now that I've dug into the problem I'm quite concerned about the impact of canonicalization on the stability of ECMAScript code across engines and across time. From what I've seen, canonicalization changes very frequently, and implementations seem to vary quite a bit in how they apply canonicalization.
This has really made me question the value of exposing canonicalized IDs to userland developers. We've already seen (in this repo, in Chrome's bugs, etc.) user complaints about canonicalization when differences are usually limited to only minor variations like Calcutta vs. Kolkata. And that's with almost 2/3 of Links in the current IANA TZDB not being followed by engines to IANA's canonical IDs.
If engines start resolving Canadian time zones to Panama, Iceland to Cote d'Ivoire, and Stockholm to Berlin, we can expect many more complaints, user confusion, broken tests, etc.
Who'd be a good person to talk with to understand how ICU currently approaches this problem? How do they determine which Links to follow and which to ignore?
likely means that ICU's time zone utilities can no longer be used for data storage; they can be used for calculations, but Temporal glue code will need to be implemented to conform to the spec
I assume that implementations would need to store both the caller's (case-normalized) original string input as well as a pointer to the data structure that ICU uses to represent a canonicalized time zone. Is that what you mean by "storage"?
The stored string would be used by #2482's `ToTemporalTimeZoneIdentifier`, which in turn powers TimeZone's `id` and `toString`, ZDT's `toString`, etc. The ICU pointer would be used for all calculations. Does that match what you had in mind?
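As a sketch of that storage model: keep the caller's identifier (case-normalized) for output and a separate canonical key for calculations and comparison. The tables and names here are hypothetical, and which spelling counts as canonical depends on the data source, so the `CANONICAL_OF` direction below is only an assumption for illustration.

```javascript
// Hypothetical lookup tables; a real engine would derive these from its time zone data.
const PREFERRED_CASE = new Map([
  ['asia/calcutta', 'Asia/Calcutta'],
  ['asia/kolkata', 'Asia/Kolkata'],
  ['europe/paris', 'Europe/Paris'],
]);
// Assumed direction for illustration only.
const CANONICAL_OF = new Map([['Asia/Calcutta', 'Asia/Kolkata']]);

// Stores both the identifier to echo back and the key used for calculations.
function createTimeZoneRecord(input) {
  const id = PREFERRED_CASE.get(String(input).toLowerCase());
  if (id === undefined) throw new RangeError(`Unknown time zone: ${input}`);
  return Object.freeze({ id, canonicalId: CANONICAL_OF.get(id) ?? id });
}

// Equality acts on the canonical key, not on the echoed identifier.
function sameZone(a, b) {
  return a.canonicalId === b.canonicalId;
}
```

Here `createTimeZoneRecord('asia/calcutta').id` stays `Asia/Calcutta`, while `sameZone` still treats it as equal to a record created from `Asia/Kolkata`.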
If we also wanted to offer a `TimeZone.p.equals` and if it only returned true for synonyms, then presumably there'd need to be support added (to ICU? by implementations?) to compare two time zones for "synonym equality" per discussion above. This wouldn't be needed if we don't offer this method, or if it compares only the `id` or ICU's fully-canonicalized identifier.
Other than above, what other glue code would be needed?
@yumaoka and @pedberg-icu know the most about ICU4C time zone handling.
For ICU4X, we currently persist time zones by BCP-47 ID. We can (or will be able to) take IANA strings and map them to BCP-47, and then we lookup the canonical ID to go in the other direction. There is an issue (https://github.com/unicode-org/icu4x/issues/2909) discussing which source of truth we should use for canonicalization.
I'm currently neutral on the actual usability issue. I'm just pointing out that we're in effect moving more responsibility out of ICU[4X] and into the Temporal glue code. This logic about how to compare time zones for equality, what form of canonicalization to apply to them, etc., is not easy, as your OP shows. ICU/CLDR already solves these problems in its own way, as it has been doing for a long time. Moving these problems into Temporal glue code just makes Temporal harder to implement and harder to test. If the champions think that the problem is big enough to warrant the additional (nontrivial) implementation cost, so be it.
If the champions think that the problem is big enough to warrant the additional (nontrivial) implementation cost, so be it.
I don't, for one! I think the TZDB fork is a problem which JS implementations can coordinate among themselves to solve. Pulling the responsibility for solving the problem into our domain will delay the proposal, while delivering an incomplete solution (because this is a problem that applies outside of Temporal as well, and those parts we can't solve.)
Question. Can this behavior be changed as a Temporal V2 follow-up?
Logistically, I think it's fair to say that moving forward with this change is going to delay Temporal implementations by another several months, given that we need to discuss this in various venues to achieve consensus, then write the spec text, then the tests, then the ICU functions discussed above, then in-flight implementations need to be updated.
An appendix to the synonym vs. merge investigation above: CLDR helpfully provides synonym data here. Example:
"inccu": {
"_description": "Kolkata, India",
"_alias": "Asia/Calcutta Asia/Kolkata"
},
If CLDR is the source of truth for time zone identifiers, then it's easy to distinguish merges from aliases.
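A rough sketch of how alias data in that shape could be turned into a synonym check follows. The `inccu` entry is the one quoted above; the `frpar` entry is an assumed second entry added only so the example has a non-matching case, and both function names are hypothetical.

```javascript
// Entries in the shape shown above. `inccu` is quoted from the thread;
// `frpar` is an assumed extra entry for illustration.
const entries = {
  inccu: { _description: 'Kolkata, India', _alias: 'Asia/Calcutta Asia/Kolkata' },
  frpar: { _description: 'Paris, France', _alias: 'Europe/Paris' },
};

// Maps each lower-cased alias to the short ID of the entry that lists it.
function buildAliasIndex(data) {
  const index = new Map();
  for (const [shortId, entry] of Object.entries(data)) {
    const aliases = (entry._alias || '').split(/\s+/).filter(Boolean);
    for (const alias of aliases) index.set(alias.toLowerCase(), shortId);
  }
  return index;
}

// Two identifiers are synonyms if they appear in the same entry's alias list.
function areSynonyms(index, a, b) {
  const keyA = index.get(a.toLowerCase());
  return keyA !== undefined && keyA === index.get(b.toLowerCase());
}
```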
TZDB fork is a problem which JS implementations can coordinate among themselves to solve.
My concern is that implementations have had years to do this coordination... and haven't done it. With Temporal V1 we have a one-time opportunity to reduce churn in the ecosystem forever... and from what I've seen coming down the road from IANA, avoiding the whole "what's the right canonical ID?" question forever (at least for Temporal) seems appealing.
For ICU4X, we currently persist time zones by BCP-47 ID.
Is the current plan for V8 to implement Temporal using ICU4C or ICU4X?
Question. Can this behavior be changed as a Temporal V2 follow-up?
`TimeZone.p.equals` could be deferred to V2. But in V1...
- `TimeZone.p.id` and `ZDT.p.timeZoneId` are canonicalized, and if so how. I doubt that changing this in V2 would be web-compatible. (One could argue that canonicalization is already unpredictable so making changes in V2 could be OK, but most of the web has kept `Asia/Calcutta` stable for over a decade. So I'm not sure that argument could get consensus.)
- `ZDT.p.equals` needs to have an opinion about what time zone equality means. I doubt that it'd be web-compatible for the code below to return `true` in V1 and `false` in V2.
zdt = Temporal.ZonedDateTime.from('2020-01-01T00:00[Europe/Copenhagen]');
zdt.equals('2020-01-01T00:00[Europe/Berlin]');
One approach that I think might be web-compatible would be to not canonicalize `TimeZone.p.id` and `ZDT.p.timeZoneId` at all in V1 (except setting them to `'UTC'` for backwards compat). Given that we'd document that all `id` comparison should be case-insensitive, it might be web-compatible to do case-normalization on the identifier so that case-sensitive comparisons would work too. Not sure about this though.
Logistically, I think it's fair to say that moving forward with this change is going to delay Temporal implementations by another several months
Yep, agree. Although if we went with the "don't canonicalize IDs except UTC" solution above, that would require zero changes from ICU, and would only require a small change from implementers which could be bundled with the changes in #2482 which will already change how TimeZone slots are stored and used. The delta of additional implementer effort seems quite small.
But I agree that once we start asking for any different canonicalization behavior, this would introduce delay. Which might be an argument for the "no-canonicalize" solution or the "full canonicalize" status quo as the best options for V1.
If we let ICU keep canonicalizing the `.id` and `.timeZoneId` values, which are known to be variable over time, then a change where we standardize on one particular canonicalization solution over another is likely to be web-compatible.
In other words, if we went with option 3 now, we could adopt options 1 or 2 (or even 4) later.
Option 4 has implementation concerns just like options 1 and 2. The laundry list of 10 questions in the OP is well thought out, but they are questions we need to resolve if we were to implement option 4, and, again, Temporal needs to persist the user-specified time zone alongside the ICU time zone (unless it computes the ICU time zone on the fly when it is needed).
My concern is that implementations have had years to do this coordination... and haven't done it. With Temporal V1 we have a one-time opportunity to reduce churn in the ecosystem forever... and from what I've seen coming down the road from IANA, avoiding the whole "what's the right canonical ID?" question forever (at least for Temporal) seems appealing.
I don't think Temporal is the right vehicle to force this type of ecosystem change. Temporal is already a really tall order for implementations. I do hope that implementations would be more amenable to solving the problem if there were a future proposal narrowly focused on this problem space.
Sharing more stuff I've learned: CLDR metadata, not IANA TZDB, is currently the source of time zone canonicalization mappings in ECMAScript engines, per this comment:
From ICU’s point of view, which one is main one, and which one is specified by Link - is not important, because we don’t really expose the zoneinfo data directly to API. CLDR defines a set of “canonical zone IDs” for stability reason - and for example, both Europe/Berlin and Europe/Oslo are “canonical” zones. We don’t handle them one is an alias of another.
I think this means that we don't really care that much about the TZDB fork, as long as:
The last bullet is a problem! Currently the spec says this:
- If `ianaTimeZone` is a Link name, let `ianaTimeZone` be the String value of the corresponding Zone name as specified in the file `backward` of the IANA Time Zone Database.
- If `ianaTimeZone` is "Etc/UTC" or "Etc/GMT", return "UTC".
This language, combined with other spec text encouraging use of the latest TZDB, will force implementers to use IANA's canonicalization strategy because the spec text is very prescriptive about use of `backward`, which now (at least in the default IANA build) aggressively merges.
If we do want engines (and not Temporal) to decide how canonicalization should work, then this spec text needs to change. Right?
Yeah, it makes a lot of sense to solve this in the section of 402 you're pointing to. I think there's already an issue open for it.
Given that this is already visible in 402, should Temporal be concerned with this issue specifically? Implementations already manage to choose to do something or other. We should just make sure that, whatever the result is, we apply it to 402 and Temporal equally.
Yeah, it makes a lot of sense to solve this in the section of 402 you're pointing to. I think there's already an issue open for it.
@sffc Are you thinking of https://github.com/tc39/ecma402/issues/272? That issue seems a bit wider than just canonicalization, although it touches on some of the same questions.
Given that this is already visible in 402, should Temporal be concerned with this issue specifically?
@littledan Currently the only way to know the canonical ID, `DateTimeFormat.p.resolvedOptions().timeZone`, is quite hard to discover and has very limited impact because localization output doesn't vary by alias. Unless developers are specifically poking into that API, canonicalization won't affect them at all.
In a Temporal world, canonical IDs will be highly visible in output of `ZonedDateTime.p.toString`, `ZonedDateTime.timeZoneId`, and `TimeZone.p.id`. These strings will be used in comparison logic, will be stored in logs and databases, and developers will (rightly or not) probably expect them to be the same over time.
So although canonicalization exists in 402 today, it will have a lot more visibility and impact once Temporal ships in engines. Hence my concern!
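For anyone who wants to see what their engine does today, the existing 402 surface can be probed with a small helper like the one below. `resolvedTimeZone` is just a hypothetical wrapper name; the result for renamed zones may differ between engines and versions, which is the variability discussed in this thread.

```javascript
// Returns the identifier the host reports after its own canonicalization.
function resolvedTimeZone(id) {
  return new Intl.DateTimeFormat('en', { timeZone: id }).resolvedOptions().timeZone;
}

// Case is normalized by the engine.
console.log(resolvedTimeZone('europe/paris'));

// Which spelling comes back here depends on the engine and its data.
console.log(resolvedTimeZone('Asia/Kolkata'));
```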
Disagree on this; custom time zones should be compared by referential object identity.
@gibson042 After #2482, if an object is in a ZDT's [[TimeZone]] slot, will we know if it's a custom zone or not? I'm OK to use `Object.is` to compare custom time zones as long as built-in time zone objects can still use the built-in comparison behavior. I do think it's a slippery slope though. If I subclass `TimeZone` in order to add a new method but don't change any of the built-in behavior, would I break `equals`? I'd also be OK with simply using `id`, e.g. if CLDR knows the ID then canonicalize it, otherwise just compare the string as-is. I don't have a strong opinion here.
Based on discussion above, and given CLDR's synonym-only canonicalization strategy, I think we can narrow the decision to two basic choices below.
Note that neither option requires any change to ICU or CLDR.
A. Status quo: Follow Links + change 402 to codify existing CLDR practice.
- `Asia/Calcutta`.
- Change `CanonicalizeTimeZoneName` to permit (require?) use of CLDR instead of IANA data.

Pro: Less spec churn; Somewhat easier to implement.
Con: Changing canonical aliases will be much less web-compatible.
B. Don't follow non-UTC Links when exposing time zone identifiers from Temporal objects
- `ZonedDateTime.timeZoneId`, `TimeZone.p.id`, and `toString`/`toJSON` of both types would return the original identifier, normalized to the case present in `AvailableTimeZones`. (Case normalization is needed so that implementations can store a <10-bit enumeration instead of the user's input string.)
- `TimeZone.p.equals`, using the same algorithm that `ZonedDateTime.p.equals` uses to compare time zones.
- `DateTimeFormat.p.resolvedOptions().timeZone` and `ZonedDateTime.p.equals`. The general principle is that we retain (and reflect back) the identifier the caller provided, but when we want to *act* on that identifier by testing for equality, by doing math, by resolving options used for localization, or when emitting localized text, then Link following happens because all CLDR aliases act the same.

Pro: better web compatibility
Con: More spec churn; Somewhat harder to implement.
In other words, if we went with option 3 now, we could adopt options 1 or 2 (or even 4) later.
Unfortunately, I don't think that (B) above is possible in a V2. For example, it would not be web-compatible to stop considering Asia/Calcutta and Asia/Kolkata as equivalent in `ZonedDateTime.p.equals`.
A. Status quo: Follow Links + change 402 to codify existing CLDR practice.
* Implementations continue using CLDR, not IANA TZDB, to decide canonicalization.
Firefox doesn't use CLDR time zone canonicalisation, but IANA canonicalisation (including `backzone`) to follow ECMA-402 more closely, which only mentions IANA, but not CLDR.
The overrides are in https://searchfox.org/mozilla-central/source/js/src/builtin/intl/TimeZoneDataGenerated.h.
* Change [`CanonicalizeTimeZoneName`](https://tc39.es/proposal-temporal/#sup-canonicalizetimezonename) to permit (require?) use of CLDR instead of IANA data.
CLDR has a stable time zone id policy, which can be problematic for some time zone ids. For example `Europe/Kiev` is forever the canonical id for `Europe/Kyiv`. This can lead to endless browser bug reports, similar to what happened for years on the IANA tz data mailing list. https://en.wikipedia.org/wiki/KyivNotKiev has more background information on this topic.
Yeah, it makes a lot of sense to solve this in the section of 402 you're pointing to. I think there's already an issue open for it.
@sffc Are you thinking of https://github.com/tc39/ecma402/issues/272? That issue seems a bit wider than just canonicalization, although it touches on some of the same questions.
https://github.com/tc39/ecma402/issues/272#issuecomment-423928522 has a link to this old bug report from bugs.ecmascript.org: https://tc39.es/archives/bugzilla/1892/.
Some missing bits which aren't yet covered here:
* `new Intl.DateTimeFormat("en", {timeZone: "BET"})` should throw, because `"BET"` is neither a valid IANA nor CLDR time zone id. So some sort of pre-/post-processing when using ICU is required anyway. (This is also an example where ICU differs from CLDR, e.g. SystemV time zones were removed from IANA in https://github.com/eggert/tz/commit/b3cf2ee42f0799e190c875f3af2ce6e5a7e287ce, but ICU still keeps them as zones in icuzones, whereas CLDR uses links.)
* `backzone`. For example `new Intl.DateTimeFormat("en", {timeStyle: "full", timeZone: "Europe/Oslo"}).format(Date.UTC(1800, 0, 1))` returns `"12:53:28 AM GMT+00:53:28"`. That's the offset for the IANA canonical time zone Europe/Berlin; Europe/Oslo has a different offset. The overall situation is more like:
  * `backzone`).
  * `New_Zealand`, which can give the (false) impression that it's treated as equivalent to `Pacific/Auckland` per the backward link from IANA. [1]

There are probably more special cases, too. For example take `Canada/East-Saskatchewan`: when using CLDR time zone information as the source of truth, TimeZoneIANANameComponent also needs to be changed to handle `Canada/East-Saskatchewan`, because that id is still valid for CLDR/ICU, but was removed some time ago from IANA because the name is too long (it exceeds the fourteen-character limit).
[1] The meta zone mapping uses optional date information to handle the case when time zone rules change. When no date information is present, ICU restricts the range from 1970-01-01 to 9999-12-31, so when testing this it's best not to use dates more than fifty years in the past or dates too far into the future.
js> var dtf = new Intl.DateTimeFormat("en", {timeZone: "Antarctica/McMurdo", timeZoneName:"long"})
js> dtf.format(Date.UTC(1970, 0, 1))
"1/1/1970, New Zealand Standard Time"
js> dtf.format(Date.UTC(1970, 0, -1))
"12/30/1969, GMT+12:00"
js> dtf.format(Date.UTC(9999, 11, 31))
"12/31/9999, New Zealand Daylight Time"
js> dtf.format(Date.UTC(9999, 11, 31+1))
"1/1/10000, GMT+13:00"
Thanks, this is very useful info.
> Firefox doesn't use CLDR time zone canonicalisation, but IANA canonicalisation (including `backzone`) to follow ECMA-402 more closely, which only mentions IANA, but not CLDR.

@anba - What is Firefox planning to do with the recent changes in IANA to merge unrelated zones together, for example, `Europe/Stockholm` => `Europe/Berlin` and `Atlantic/Reykjavik` => `Africa/Abidjan`? Are you planning to follow those links? Or are you planning to use the unmerged fork (https://github.com/JodaOrg/global-tz)? Or something else?
Once Temporal ships, these merges will be very problematic because time zone strings will be much more visible and will be persisted (e.g. in databases) and re-used far in the future. For example, imagine a calendar app that stores meeting times in a database using `ZonedDateTime#toString`. There's no guarantee that `2024-07-01T09:00[Atlantic/Reykjavik]` and `2024-07-01T09:00[Africa/Abidjan]` will refer to the same point in time in 2024. If Iceland or Côte d'Ivoire changes its time zone rules, then attendees will show up at the wrong time.
Firefox examines the time zone information from `backzone`: any time zone rule within `backzone` will be treated as a canonical time zone id. Time zone links will also be canonicalised according to the information in `backzone`. For example `backzone` lists `Atlantic/Reykjavik` as a time zone rule, so Firefox treats it as a canonical time zone id. The link from `Iceland` will also be canonicalised according to the `backzone` info, i.e. it'll be canonicalised to `Atlantic/Reykjavik`.
For `Atlantic/Reykjavik`, this matches what ICU is already doing, therefore https://searchfox.org/mozilla-central/source/js/src/builtin/intl/TimeZoneDataGenerated.h doesn't include this mapping. (TimeZoneDataGenerated.h is generated by comparing the IANA rules and links, including `backzone`, against the time zone rules and links from ICU. We don't compare against CLDR, because ICU sometimes doesn't match CLDR time zone definitions.) But for example `Asia/Chongqing` is treated as a canonical time zone id, because there's a time zone rule for it in `backzone`, and `Asia/Chungking` is canonicalised according to the `backzone` link to `Asia/Chongqing`. This doesn't match ICU, which treats both as links to `Asia/Shanghai` (matching the definitions in `backward` and in common/bcp47/timezone.xml), therefore TimeZoneDataGenerated.h contains overrides to treat `Asia/Chongqing` as a zone and `Asia/Chungking` as a link to `Asia/Chongqing`.
Using `backzone` avoids some potential issues, for example `Europe/Ljubljana`, `Europe/Sarajevo`, `Europe/Skopje`, and `Europe/Zagreb` are no longer canonicalised to `Europe/Belgrade`. `Europe/Podgorica` is still canonicalised to `Europe/Belgrade`, because there's no separate time zone rule for it in `backzone`. But that case is probably less complicated than the other cases, because there wasn't any open conflict between Serbia and Montenegro.
But just using `backzone` also means we have entries like `Europe/Tiraspol` as a canonical time zone id. Time zone transitions and date-time formatting will still handle it as equivalent to `Europe/Chisinau`, though.
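To make the idea of "overrides" concrete, here is a rough sketch of how a difference between two sets of zone and link definitions could be computed. It is not the actual generator behind TimeZoneDataGenerated.h; `computeOverrides` and the miniature `iana` and `icu` tables are hypothetical, and they contain only the names mentioned above, with the `Iceland` link on the ICU side assumed for illustration.

```javascript
// Hypothetical miniature data sets. `zones` holds canonical ids and
// `links` maps an alias to its target.
const iana = { // stands in for IANA rules and links including backzone
  zones: new Set(["Atlantic/Reykjavik", "Asia/Chongqing", "Asia/Shanghai"]),
  links: new Map([
    ["Iceland", "Atlantic/Reykjavik"],
    ["Asia/Chungking", "Asia/Chongqing"],
  ]),
};
const icu = { // stands in for the zones and links the host library reports
  zones: new Set(["Atlantic/Reykjavik", "Asia/Shanghai"]),
  links: new Map([
    ["Iceland", "Atlantic/Reykjavik"],
    ["Asia/Chongqing", "Asia/Shanghai"],
    ["Asia/Chungking", "Asia/Shanghai"],
  ]),
};

// Collects every definition in `reference` that `host` does not already
// agree with. Only these entries need to be stored as overrides.
function computeOverrides(reference, host) {
  const zoneOverrides = [];
  const linkOverrides = new Map();
  for (const zone of reference.zones) {
    if (!host.zones.has(zone)) zoneOverrides.push(zone);
  }
  for (const [alias, target] of reference.links) {
    if (host.links.get(alias) !== target) linkOverrides.set(alias, target);
  }
  return { zoneOverrides, linkOverrides };
}

const overrides = computeOverrides(iana, icu);
console.log(overrides.zoneOverrides);       // [ 'Asia/Chongqing' ]
console.log([...overrides.linkOverrides]);  // [ [ 'Asia/Chungking', 'Asia/Chongqing' ] ]
```

With this toy data, `Atlantic/Reykjavik` and `Iceland` produce no override because both sides already agree, which mirrors the case described above.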
That sounds like a good approach, and definitely better than the current main fork of TZDB. Do you know if what you're doing in FF varies from what https://github.com/JodaOrg/global-tz is doing? They sound quite similar.
From Temporal and 402 meetings 2023-03-09, we'll follow up on this issue in two ways:
In the meantime I'll close this issue to remove noise from the Temporal repo.
> That sounds like a good approach, and definitely better than the current main fork of TZDB. Do you know if what you're doing in FF varies from what https://github.com/JodaOrg/global-tz is doing? They sound quite similar.

I think TZDB with `backzone` is equivalent to global-tz with their `backzone` file. I can't easily tell if global-tz without their `backzone` is equivalent to TZDB with `PACKRATLIST=zone.tab`, because I don't want to go through each line of https://github.com/JodaOrg/global-tz/blob/main/actions.txt to check the computed zones and links. The News file mentions that `PACKRATDATA=backzone PACKRATLIST=zone.tab` gives the same results as global-tz, though.
The aforementioned `Europe/Tiraspol` is an example where FF is different when compared against global-tz without their `backzone` file.
If we want to do exact comparisons, it's necessary to explicitly define which configuration is tested:
* `PACKRATDATA` and `PACKRATLIST`.
* `backzone`?
* `common/bcp47/timezone.xml`, or including `<zoneAlias>` from `common/supplemental/supplementalMetadata.xml`? Or the actual implementations in ICU4C, or ICU4J, or ICU4X? [1]

[1] It's likely that ICU4C and ICU4X will also have slightly different behaviour, because if ICU4X uses BCP-47 ids to store time zone ids, it can't represent the old and deprecated SystemV time zone ids, because those don't have a BCP-47 id. It could use `<zoneAlias>` to treat them as links, but it'll still be slightly different when compared to ICU4C, which is still supporting them as actual time zones. (Support for SystemV time zones doesn't matter at all for real-world usage, but when doing exact comparisons it'd be good to define which differences can be ignored.)
While working on #2493, I learned that the IANA Time Zone Database has been forked due to a disagreement between that database's maintainer and some prominent users of the database.
Background
The two forks differ as follows:
* `Europe/Copenhagen` => `Europe/Berlin` and `Atlantic/Reykjavik` => `Africa/Abidjan`. There are many more examples like this. This fork is preferred by the TZDB maintainer, and therefore is exposed by the official IANA downloads of TZDB releases.
* `PACKRATLIST` build option. That build option was added by the maintainer to ensure that both forks could be built out of the same repo. See discussion here and here.

You can read more about the fork in the TZDB mailing list archives. A few relevant threads:
The fork seems to represent a philosophical difference about the purpose of the TZDB. One camp (which includes the maintainer) sees the goal of TZDB as simply providing a way to convert post-1970 zoned timestamps into exact instants, and wants to reduce the TZDB size and maintenance hassle of dealing with pre-1970 data. The other camp (supporting the unmerged fork) adds additional use cases:
I'm not sure how much Temporal cares about pre-1970 dates, but the latter two issues seem quite important to Temporal users. The second one will make calendaring apps more resilient to country-level timezone/DST changes, while the third will prevent developer confusion and consternation.
Also, given the complaints about the changes, it's possible that the TZDB may revert these changes in the future, which would cause further churn.
Options
Anyway, now that we know this fork exists, we need to figure out what to do about it in the Temporal spec. Options include:
1. Recommend that implementers use the Primary Fork
2. Recommend that implementers use the Unmerged Fork
3. Don't recommend anything; implementers are free to choose.
4. Stop canonicalizing time zones (thanks to @pipobscure for this suggestion)
   * `equals` method; avoids triggering geopolitical sensitivities caused by modifying user input to point to an unexpected country or name.
   * `Temporal.TimeZone.equals` method to help users identify equivalent time zones like Asia/Calcutta vs. Asia/Kolkata; may require modifying existing ICU behavior (per this comment, it sounds like Firefox already does similar mods).

Discussion
Of the above options, my strong preference is for (4), because it solves both the forking issue as well as the existing canonicalization issues like Calcutta vs. Kolkata. Also, I think retaining user input as-is will be quite helpful to reduce confusion in cases where code takes input from some other source, modifies that data, and then sends or stores the modified data. If the time zone identifier varies a lot between the original and modified ZDT, I think that will generate user confusion that avoiding canonicalization would prevent.
If we want to go with (4), here are a few questions to answer:
* How should `Intl.DateTimeFormat.p.resolvedOptions().timeZone` behave? Should it also stop canonicalizing? If yes, should it add a new `canonicalTimeZone` property?
* Will there be any change to user-visible output of `Intl.DateTimeFormat.p.format` or `Date.p.toLocaleString`? I suspect that the answer is "no" because localized descriptions of time zones don't usually surface the IANA identifiers, but I'm not 100% sure about this.
* What changes (if any) would be required to CLDR and/or ICU to support this change?
* Even if we avoid the canonicalization mess, there's still the pre-1970-data question. The unmerged fork will have it, the merged fork will only have it for the merged zones. This would mean, for example, that Europe/Copenhagen pre-1970 results could vary by fork. So which fork should we recommend that implementers use? I don't have a strong opinion here. It'd be nice to understand the size of pre-1970 data to know how much smaller browser downloads would get if this data were removed.
* Should case differences still be canonicalized, e.g. `Europe/Paris` vs. `europe/paris`? My opinion: yes, we should canonicalize.
* `Asia/Calcutta` vs. `Asia/Kolkata`. My opinion: no, because by not canonicalizing `id` in this case we can avoid user complaints like this chromium bug, and we can ensure future compatibility & round-trippability even if zones are renamed in the future. Note that `equals` should probably report these as `true` though. (See below.)
* `TimeZone.p.equals` method? I think we should, both for consistency across Temporal types and to help code be robust in the face of past or future renames of cities, which seem to happen fairly often globally. JS code should be able to ask "Is this date in the India time zone?" without having to worry that that code will be broken by a past or future rename.
* `equals` should we also add a method that tests if all rules are the same across time zones, e.g. `Atlantic/Reykjavik` vs. `Africa/Abidjan`? I don't think this is needed. Userland code can always use `getNextTransition` in a loop to check for this kind of equality, and if there's user demand we could always add it in a later release.
* `Etc/UTC` should resolve to `UTC` in ECMAScript, matching current behavior. There's no value in changing this existing behavior.
* `PACKRATLIST` option to work, TZDB data must provide a way to differentiate "merged" links like `Atlantic/Reykjavik` => `Africa/Abidjan` from "renamed" links like `Asia/Calcutta` vs. `Asia/Kolkata`. How does this differentiation work, and does it work for all links or are there gaps? It sounds like @anba may know how this works.

If we add
`equals`, here's a suggestion for its behavior:
* `id` property.
* `Europe/Paris` vs. `europe/paris`.
* `Asia/Calcutta` vs. `Asia/Kolkata`, because they represent the same thing with different spelling.
* `Etc/UTC` then treat them as equal.
* `Atlantic/Reykjavik` vs. `Africa/Abidjan`) as equal, even if all their time zone transitions are the same, because future changes could make those locations have different time zone rules. Per above, if users want to evaluate "all rules are the same" they can do this in userland by comparing time zone transitions in a loop. Although honestly I'm skeptical that this will be a popular use case. Who cares if the rules are equal?

Pinging @jasonwilliams @ptomato @sffc @gibson042 @pipobscure for your opinions.
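For the userland "are all rules the same?" check mentioned above, the suggestion is a loop over `getNextTransition`. Since that needs Temporal, the sketch below swaps in a different technique that runs in engines with only `Intl.DateTimeFormat`: it samples the UTC offset of both zones at regular intervals and compares them. `makeOffsetFn` and `sameOffsetsSampled` are hypothetical helper names. Sampling can miss rule differences that fall between samples, and offsets are rounded to whole minutes, so this approximates the transition-based comparison rather than replacing it.

```javascript
// Returns a function that computes the UTC offset (in minutes) of `timeZone`
// at a given epoch-millisecond instant, by formatting the instant in that
// zone and reading the wall-clock fields back.
function makeOffsetFn(timeZone) {
  const dtf = new Intl.DateTimeFormat("en-US", {
    timeZone,
    hourCycle: "h23",
    year: "numeric", month: "numeric", day: "numeric",
    hour: "numeric", minute: "numeric", second: "numeric",
  });
  return (epochMs) => {
    const f = {};
    for (const { type, value } of dtf.formatToParts(epochMs)) f[type] = Number(value);
    // `% 24` is a defensive guard in case an engine reports midnight as 24.
    const wallMs = Date.UTC(f.year, f.month - 1, f.day, f.hour % 24, f.minute, f.second);
    return Math.round((wallMs - epochMs) / 60000);
  };
}

// Compares the offsets of two zones at every sample between startMs and
// endMs (inclusive), one sample per day by default.
function sameOffsetsSampled(zoneA, zoneB, startMs, endMs, stepMs = 24 * 3600 * 1000) {
  const offsetA = makeOffsetFn(zoneA);
  const offsetB = makeOffsetFn(zoneB);
  for (let t = startMs; t <= endMs; t += stepMs) {
    if (offsetA(t) !== offsetB(t)) return false;
  }
  return true;
}

const start = Date.UTC(2020, 0, 1);
const end = Date.UTC(2021, 0, 1);
console.log(sameOffsetsSampled("Europe/Paris", "Europe/Paris", start, end)); // true
console.log(sameOffsetsSampled("Europe/Paris", "Asia/Tokyo", start, end));   // false
```

A `true` result only says the sampled offsets matched in the chosen range. It says nothing about future rule changes, which is the reason given above for not treating merged zones as equal in `equals`.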