tc39 / ecma402

Status, process, and documents for ECMA 402
https://tc39.es/ecma402/

IANA timezone db reference in the spec : should backzone be taken into account? #272

Closed jungshik closed 1 month ago

jungshik commented 6 years ago

The current spec keeps referring to 'Zone and Link names', but that's not sufficient and leads to a divergence between implementations.

The main question is whether or not to take the 'backzone' file in the IANA timezone database into account.

The ECMAScript 2018 Internationalization API Specification identifies time zones using the Zone and Link names of the IANA Time Zone Database. Their canonical form is the corresponding Zone name in the casing used in the IANA Time Zone Database.

All registered Zone and Link names are allowed. Implementations must recognize all such names

Firefox uses the zone and link names in the 'backzone' file of the IANA tz db, but some links in the 'backzone' file contradict what's in other files.

The 'backward' file has the following:

Link    Asia/Shanghai           Asia/Chongqing
Link    Asia/Shanghai           Asia/Chungking

The 'backzone' file has the following:

Link Asia/Chongqing Asia/Chungking

Note that the 'backzone' file has the following comment at the top:

# This file contains data outside the normal scope of the tz database,
# in that its zones do not differ from normal tz zones after 1970.
# Links in this file point to zones in this file, superseding links in
# the file 'backward'.

Because Firefox takes into account 'backzone' file, 'Asia/Chungking' is canonicalized to 'Asia/Chongqing' instead of 'Asia/Shanghai'.

ICU/CLDR (as used by V8) ignores the 'backzone' file, and both 'Asia/Chungking' and 'Asia/Chongqing' are canonicalized to 'Asia/Shanghai' per the 'backward' file.
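To make the divergence concrete, here is a minimal sketch of link resolution with and without 'backzone'. The tables are hand-copied from the Link lines quoted above (not the full tzdb), and `canonicalize` is a hypothetical helper, not any engine's code. It assumes, per the 'backzone' header comment, that 'backzone' Zones are not links and that 'backzone' Links supersede those in 'backward'.

```javascript
// Hand-copied subset of the 'backward' Link lines quoted above (link -> target).
// Illustrative only; this is not the full tzdb.
const BACKWARD_LINKS = new Map([
  ["Asia/Chongqing", "Asia/Shanghai"],
  ["Asia/Chungking", "Asia/Shanghai"],
]);

// With backzone, Asia/Chongqing is a Zone of its own and Asia/Chungking
// links to it, superseding the 'backward' link.
const BACKZONE_ZONES = new Set(["Asia/Chongqing"]);
const BACKZONE_LINKS = new Map([["Asia/Chungking", "Asia/Chongqing"]]);

// Follow links until reaching a name that is not a link.
function canonicalize(name, useBackzone) {
  let current = name;
  for (let hops = 0; hops < 10; hops++) { // guard against link cycles
    if (useBackzone && BACKZONE_ZONES.has(current)) return current;
    const next = useBackzone && BACKZONE_LINKS.has(current)
      ? BACKZONE_LINKS.get(current)
      : BACKWARD_LINKS.get(current);
    if (next === undefined) return current;
    current = next;
  }
  throw new Error(`Too many link hops for ${name}`);
}

console.log(canonicalize("Asia/Chungking", true));  // Asia/Chongqing (Firefox-like)
console.log(canonicalize("Asia/Chungking", false)); // Asia/Shanghai (ICU/CLDR-like)
```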

CLDR/ICU, however, do not canonicalize 'Asia/Phnom_Penh' and 'Asia/Vientiane' to 'Asia/Bangkok' despite the following in 'asia' file:

Link Asia/Bangkok Asia/Phnom_Penh       # Cambodia
Link Asia/Bangkok Asia/Vientiane        # Laos

That's because the IANA timezone DB relegated the two zone names to links rather recently (2014-2015), and CLDR/ICU do not want to destabilize the tz ID space, so they kept them as canonical zone IDs.
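The effect of that stability policy can be modeled as a pin list that overrides the IANA links. This is a hypothetical sketch of the outcome, not how CLDR/ICU actually store or apply the policy; the two Link entries are the ones from the 'asia' file quoted above.

```javascript
// The two Link lines from the 'asia' file quoted above (link -> target).
const ASIA_LINKS = new Map([
  ["Asia/Phnom_Penh", "Asia/Bangkok"],
  ["Asia/Vientiane", "Asia/Bangkok"],
]);

// Hypothetical stability list: IDs that were canonical before IANA made
// them links, and which a CLDR-style policy keeps canonical anyway.
const PINNED_CANONICAL = new Set(["Asia/Phnom_Penh", "Asia/Vientiane"]);

// Follows the IANA link as-is.
function canonicalizeIana(name) {
  return ASIA_LINKS.get(name) ?? name;
}

// Leaves pinned IDs alone; otherwise behaves like canonicalizeIana.
function canonicalizeStable(name) {
  return PINNED_CANONICAL.has(name) ? name : canonicalizeIana(name);
}

console.log(canonicalizeIana("Asia/Vientiane"));   // Asia/Bangkok
console.log(canonicalizeStable("Asia/Vientiane")); // Asia/Vientiane
```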

anba commented 6 years ago

We went with using the 'backzone' IDs in Firefox to avoid the risk of upsetting users in the affected time zones. For example, canonicalizing 'Europe/Ljubljana', 'Europe/Podgorica', 'Europe/Sarajevo', 'Europe/Skopje', and 'Europe/Zagreb' to 'Europe/Belgrade' (which would be the case when not applying 'backzone') may have negative cultural/political effects.

Relevant comments in the Firefox bug tracker: https://bugzilla.mozilla.org/show_bug.cgi?id=1303091#c3, https://bugzilla.mozilla.org/show_bug.cgi?id=1303091#c9, https://bugzilla.mozilla.org/show_bug.cgi?id=1303091#c11

Unfortunately just using CLDR instead of IANA data can also lead to wrong canonicalizations, cf. https://unicode-org.atlassian.net/browse/ICU-12044 and http://unicode.org/cldr/trac/ticket/9892. In the CLDR bug, jungshik has also given an example where CLDR didn't update the mapping despite being outdated since 1993.

And the related (unresolved) bugs.ecmascript.org bug which also mentions the complications between selecting which tz links are safe to apply and which ones are more contentious: https://tc39.github.io/archives/bugzilla/1892/

And more related threads from the tzdata mailing list (mostly from 2013-2014 when many zones were moved to the backzone file): http://mm.icann.org/pipermail/tz/2014-July/021170.html, https://mm.icann.org/pipermail/tz/2013-September/019821.html, https://mm.icann.org/pipermail/tz/2014-November/021888.html.

jungshik commented 6 years ago

'Europe/Ljubljana', 'Europe/Podgorica', 'Europe/Sarajevo', 'Europe/Skopje', and 'Europe/Zagreb' to 'Europe/Belgrade'

Thanks a lot for alerting me about those entries and references to TZ mailing list threads on the topic. A similar sentiment may exist about canonicalizing Asia/Phnom_Penh and Asia/Vientiane to Asia/Bangkok.

Unfortunately just using CLDR instead of IANA data can also lead to wrong canonicalizations,

Yup, you're right. I'm aware of the issue because CLDR sticks to pretty old IDs that had been deprecated well before the CLDR project started (Calcutta vs Kolkata, Saigon vs Ho_Chi_Minh, Katmandu vs Kathmandu, and many others).

@yumaoka

ryzokuken commented 4 years ago

@sffc I'm unsure what needs to be done here. Could you tell me what the web reality is? IIUC, Firefox now uses ICU too, but did ICU ever end up taking this into account and start using the backzone file?

sffc commented 4 years ago

@anba What is Firefox doing these days? Is it still necessary to put in the exception to allow backzone to be used for time zone names?

sffc commented 4 years ago

Is there a snippet of code that can reproduce the Firefox/Chrome discrepancy? It appears that Asia/Chongqing and Asia/Shanghai are equivalent in modern times, but may have differed at some time in the past, perhaps before China decided to unify under one time zone.

I wrote the following code in my best attempt to reproduce the difference, but was unsuccessful in finding a difference:

new Date(1945, 0, 1).toLocaleString("en", { timeZone: "Asia/Chongqing", timeZoneName: "long" })
// "1/1/1945, 2:00:00 PM GMT+09:00"
new Date(1945, 0, 1).toLocaleString("en", { timeZone: "Asia/Shanghai", timeZoneName: "long" })
// "1/1/1945, 2:00:00 PM GMT+09:00"
jungshik commented 4 years ago

IIUC, Firefox now uses ICU too, but did ICU ever end up taking this into account and start using the backzone file?

I think Firefox still uses a rather large override map (to take care of cases mentioned in this issue) on top of ICU. Firefox already used ICU when this issue was filed, btw. :-)

CLDR has a policy on ID stability, and it's a bit hard to change that, I'm afraid. Given this, I was thinking of doing in V8 what Firefox does to handle 'Saigon => Ho_Chi_Minh', 'Calcutta => Kolkata', etc., but held off because I wanted it to be resolved in CLDR so that V8 would not need a local override map [1]. My (dim) hope for a possible CLDR change was based on my 'findings', which turned out to be false. See below.

As for using 'backzone' (this issue), it's related but a bit different.

Unfortunately just using CLDR instead of IANA data can also lead to wrong canonicalizations, cf. https://unicode-org.atlassian.net/browse/ICU-12044 and http://unicode.org/cldr/trac/ticket/9892. In the CLDR bug, jungshik has also given an example where CLDR didn't update the mapping despite being outdated since 1993.

And, unfortunately, my claim turned out to be false. I thought 'Asia/Calcutta' had been changed to 'Asia/Kolkata' well before the CLDR project started. In https://unicode-org.atlassian.net/browse/CLDR-9892, @yumaoka dug up the historic IANA timezone files and found that they still had 'Asia/Calcutta' instead of 'Asia/Kolkata' as late as 2008 (well after the CLDR project started). He suspected that the same was true of 'Saigon vs Ho_Chi_Minh' and 'Katmandu vs Kathmandu'.

[1] To make things more complicated, there's a possibility that the override map would need to be duplicated for Chrome OS, which was yet another reason I wanted it resolved in CLDR. The alternative of changing the ICU data locally for Chromium was not desirable either, because that would make the TZ db update process more complicated (although it may not be that bad).

jungshik commented 4 years ago

The repro step is as follows:

new Intl.DateTimeFormat("en", {timeZone:"Asia/Chongqing"}).resolvedOptions().timeZone

Firefox: "Asia/Chongqing"
Chrome: "Asia/Shanghai"
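The same repro can be run over several of the IDs discussed in this thread at once. A small sketch follows; the helper name `resolvedTimeZones` is ours, and the printed values depend on the engine and its version, so no expected output is given.

```javascript
// IDs from this thread whose resolved name has differed between engines.
const IDS = ["Asia/Chongqing", "Asia/Chungking", "Europe/Sarajevo", "Asia/Phnom_Penh"];

// Map each ID to what the running engine reports via resolvedOptions(),
// or to "RangeError" if the engine rejects the ID outright.
function resolvedTimeZones(ids) {
  const result = {};
  for (const id of ids) {
    try {
      result[id] = new Intl.DateTimeFormat("en", { timeZone: id }).resolvedOptions().timeZone;
    } catch (e) {
      if (!(e instanceof RangeError)) throw e;
      result[id] = "RangeError";
    }
  }
  return result;
}

console.log(resolvedTimeZones(IDS));
```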

jungshik commented 4 years ago

Without underlying zoneinfo files supporting the historical difference between Asia/Chongqing and Asia/Shanghai, I think it's all but pointless to treat them as separate zones.

Below is what Firefox does with my computer timezone set to America/Los_Angeles. Note that Asia/Chongqing and Asia/Shanghai had different local mean time (they have different longitudes), but the result is the same. The same holds for Asia/Bangkok vs Asia/Phnom_Penh.

new Date(1850,0,1).getTimezoneOffset()
472.96666666666664   // In 1850, LMT was used everywhere, including America/Los_Angeles
new Date(1850,0,1).toLocaleString("en")
"1/1/1850, 12:00:00 AM"
new Date(1850,0,1).toLocaleString("en", {timeZone: "UTC"})
"1/1/1850, 7:52:58 AM"

new Date(1850,0,1).toLocaleString("en", {timeZone: "Asia/Shanghai"})
"1/1/1850, 3:58:41 PM"
new Date(1850,0,1).toLocaleString("en", {timeZone: "Asia/Chongqing"})
"1/1/1850, 3:58:41 PM"

new Date(1850,0,1).toLocaleString("en", {timeZone: "Asia/Phnom_Penh"})
"1/1/1850, 2:35:02 PM"
new Date(1850,0,1).toLocaleString("en", {timeZone: "Asia/Bangkok"})
"1/1/1850, 2:35:02 PM"
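Rather than comparing formatted strings, the offset an engine applies can be computed directly. Here is a minimal sketch (the helper `utcOffsetMinutes` is hypothetical) that formats an instant into parts in the target zone, reinterprets the wall-clock fields as UTC, and diffs against the instant. It assumes the engine supports `formatToParts` and the `hourCycle` option; the result is in minutes and can be fractional for LMT offsets with seconds.

```javascript
// UTC offset (minutes, possibly fractional) applied for `timeZone` at `date`.
function utcOffsetMinutes(date, timeZone) {
  const dtf = new Intl.DateTimeFormat("en-US", {
    timeZone,
    hourCycle: "h23", // avoid "24" at midnight
    year: "numeric", month: "2-digit", day: "2-digit",
    hour: "2-digit", minute: "2-digit", second: "2-digit",
  });
  const parts = {};
  for (const { type, value } of dtf.formatToParts(date)) parts[type] = value;
  // Treat the zone's wall-clock fields as if they were UTC.
  const wallAsUtc = Date.UTC(
    Number(parts.year), Number(parts.month) - 1, Number(parts.day),
    Number(parts.hour), Number(parts.minute), Number(parts.second),
  );
  // The formatted wall time has no milliseconds, so drop them from the instant.
  const instant = Math.floor(date.getTime() / 1000) * 1000;
  return (wallAsUtc - instant) / 60000;
}

const d = new Date(Date.UTC(1850, 0, 1, 7, 52, 58)); // the instant used above
for (const tz of ["Asia/Shanghai", "Asia/Chongqing", "Asia/Bangkok", "Asia/Phnom_Penh"]) {
  console.log(tz, utcOffsetMinutes(d, tz));
}
```

If both IDs in a pair print the same offset, the engine is using the same zone data for both, which is the point being made above.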
anba commented 4 years ago

There are multiple issues, some overlapping, which lead to differences between browsers when handling time zones:

  1. Let's start with accepted time zone strings, because any difference here may have side-effects later on.

    • SM: Uses ICU to validate time zone names, but has an extra mapping to reject non-IANA names (e.g. ICU legacy time zones like ACT or previous IANA names like Canada/East-Saskatchewan). Also disallows SystemV time zones, which are disabled by default in tzdata.
    • V8: Uses a simple parser to validate time zone names before passing them to ICU. The parser rejects legacy ICU time zones, but still allows Canada/East-Saskatchewan, even though that one is no longer valid per IANA (but still valid in CLDR!). Recently an extra mapping was added to handle more cases. The parser also rejects SystemV time zones, but it's not clear to me if that's intentional or just a happy coincidence.
    • JSC: Directly calls into ICU to validate time zone names. That leads to accepting legacy ICU names and SystemV time zones.
  2. Canonicalisation differences between IANA and CLDR for same time zones:

    • SM: Contains extra mappings to override ICU to make sure IANA mappings are applied.
    • V8/JSC: Directly return whatever ICU returns.
    • Examples:
      • "America/Buenos_Aires" links to "America/Argentina/Buenos_Aires" in IANA, but it's the other way around in CLDR.
      • "EST" is its own zone in IANA, but a link to "Etc/GMT+5" in CLDR.
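Which names an engine accepts (point 1) can be probed from script, since an unknown `timeZone` makes the `Intl.DateTimeFormat` constructor throw a `RangeError`. A small sketch; the helper name is ours, and "SystemV/EST5" is assumed here as an example SystemV-style name. Results vary by engine, so none are asserted.

```javascript
// Returns true if the engine accepts `name` as a timeZone option.
// An unknown time zone makes the constructor throw a RangeError.
function isAcceptedTimeZone(name) {
  try {
    new Intl.DateTimeFormat("en", { timeZone: name });
    return true;
  } catch (e) {
    if (e instanceof RangeError) return false;
    throw e;
  }
}

for (const name of ["ACT", "Canada/East-Saskatchewan", "SystemV/EST5", "EST", "America/Buenos_Aires"]) {
  console.log(name, isAcceptedTimeZone(name));
}
```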

Now let's go over to the backzone file. First, as a quick reminder, ICU doesn't contain any data for backzone time zones!

  1. backzone time zones that ICU claims to support because they're CLDR time zones:
    • SM/V8/JSC: Accepted in all browsers, but giving the wrong results.
    • Example:
      • "Europe/Sarajevo" (and other links to "Europe/Belgrade") is reported to the user as a time zone, but contains the time zone data for "Europe/Belgrade".
        js> var date = new Date("1800-01-01T00:00:00Z")
        js> var dtf = new Intl.DateTimeFormat("en", {timeZone:"Europe/Belgrade", hour:"2-digit", minute:"2-digit"})
        js> dtf.format(date)                                                                                        
        "1:22 AM"
        js> dtf.resolvedOptions().timeZone
        "Europe/Belgrade"
        js> var dtf = new Intl.DateTimeFormat("en", {timeZone:"Europe/Sarajevo", hour:"2-digit", minute:"2-digit"}) 
        js> dtf.format(date)                                                                                        
        "1:22 AM"
        js> dtf.resolvedOptions().timeZone                                                                          
        "Europe/Sarajevo"

Rules for "Europe/Belgrade" and "Europe/Sarajevo"

# Zone  NAME        STDOFF  RULES   FORMAT  [UNTIL]
Zone    Europe/Belgrade 1:22:00 -   LMT 1884
Zone    Europe/Sarajevo 1:13:40 -   LMT 1884
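Working from those two Zone lines, the expected wall-clock times for the 1800 instant can be computed by hand. The sketch below only handles a fixed LMT offset (valid before 1884 per the lines above) and uses hypothetical helper names; it shows that correct backzone data would give 1:13 AM for "Europe/Sarajevo", not the 1:22 AM all engines report.

```javascript
// Parse a tzdb STDOFF field such as "1:22:00" or "-0:25:21" into seconds east of UTC.
function parseStdoff(stdoff) {
  const sign = stdoff.startsWith("-") ? -1 : 1;
  const [h, m = 0, s = 0] = stdoff.replace(/^-/, "").split(":").map(Number);
  return sign * (h * 3600 + m * 60 + s);
}

// Wall-clock time (HH:MM:SS) at `utcMs` for a zone on a fixed LMT offset.
function lmtWallClock(utcMs, stdoff) {
  const local = new Date(utcMs + parseStdoff(stdoff) * 1000);
  return local.toISOString().slice(11, 19);
}

const instant = Date.parse("1800-01-01T00:00:00Z");
console.log(lmtWallClock(instant, "1:22:00")); // 01:22:00 (Europe/Belgrade)
console.log(lmtWallClock(instant, "1:13:40")); // 01:13:40 (Europe/Sarajevo)
```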

CLDR lists "Europe/Sarajevo" as a time zone, not a link:

<type name="basjj" description="Sarajevo, Bosnia and Herzegovina" alias="Europe/Sarajevo"/>

  2. backzone zones which are links in CLDR.

    • SM: Reported as its own zone.
    • V8/JSC: Canonicalised to a different zone.
    • Example: "Asia/Chongqing".
      • SM: "Asia/Chongqing" reported as the canonical name, but returns the data for "Asia/Shanghai".
      • V8/JSC: Canonicalised to "Asia/Shanghai".
  3. backzone zones which are different links in CLDR.

    • SM: Links to the zone in backzone
    • V8/JSC: Links to zone outside of backzone.
    • Example: "Asia/Chungking".
      • SM: Canonicalised to "Asia/Chongqing", but returns the data for "Asia/Shanghai".
      • V8/JSC: Canonicalised to "Asia/Shanghai".
anba commented 4 years ago

@anba What is Firefox doing these days? Is it still necessary to put in the exception to allow backzone to be used for time zone names?

We're basically still in the same position as when we originally implemented these overrides for backzone. Objectively speaking, we're returning the wrong data for pre-1970 timestamps for backzone zones, but are reluctant to canonicalise to the zones whose data is actually used, for the reasons outlined in https://github.com/tc39/ecma402/issues/272#issuecomment-423928522.

jungshik commented 4 years ago

@anba, thank you for the summary as well as the reminder about the 'Z' in the Date constructor, which I forgot.

I also forgot what I wrote about {Asia/Phnom_Penh and Asia/Vientiane} vs Asia/Bangkok . They have the same issue as Europe/Sarajevo and Europe/Belgrade. That is, the third issue in @anba's comment.

jungshik commented 4 years ago

The parser also rejects SystemV time zones, but it's not clear to me if that's intentional or just a happy coincidence.

That's an intentional omission. The V8 tz name parser rejects everything that is not explicitly handled. Handling of SystemV zone names is omitted on purpose because they're disallowed.
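To illustrate the "reject everything not explicitly handled" approach, here is a rough, hypothetical allowlist validator. It is not V8's actual parser and its patterns are deliberately loose; the point is only that SystemV names fail because no rule covers them, not because of a dedicated deny rule.

```javascript
// Hypothetical allowlist rules; anything that matches none of them is rejected.
const RULES = [
  /^(UTC|GMT|Etc\/(UTC|GMT))$/,          // a few fixed names
  /^Etc\/GMT[+-](0|[1-9]|1[0-4])$/,      // Etc/GMT+N and Etc/GMT-N
  // Area/Location or Area/Sub/Location for the usual tzdb areas
  /^(Africa|America|Antarctica|Asia|Atlantic|Australia|Europe|Indian|Pacific)(\/[A-Za-z_+-]+){1,2}$/,
];

function passesAllowlist(name) {
  // No rule covers the SystemV/ area, so such names fall through to false.
  return RULES.some((re) => re.test(name));
}

console.log(passesAllowlist("Europe/Sarajevo")); // true
console.log(passesAllowlist("SystemV/EST5"));    // false
```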

FYI, @FrankYFTang

sffc commented 1 year ago

@justingrant Thoughts on this issue?

justingrant commented 1 year ago

AFAIK, the current plan is for CLDR and ICU to resolve the issues discussed in this thread:

  1. A new iana attribute has been added to https://github.com/unicode-org/cldr/blob/main/common/bcp47/timezone.xml, which (after https://unicode-org.atlassian.net/browse/ICU-22452 is implemented soon) will allow ICU clients to fetch the latest canonical ID for IDs like Asia/Calcutta (canonical ID is Asia/Kolkata) and Europe/Kiev (canonical ID is Europe/Kyiv).
  2. The CLDR data has been cleaned up so that the list of canonical IDs is similar to what's found by using backzone, and which should be very close (modulo a few corner cases and judgement calls) to the results you'd get when building TZDB with PACKRATDATA=backzone PACKRATLIST=zone.tab.
  3. There are a few IANA IDs that are missing from timezone.xml, like "EST" and "PST8PDT". These will be added to timezone.xml in a later PR.
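As a rough sketch of how a client might use the new attribute: prefer `iana` when present, otherwise fall back to the first entry in `alias` (assumed here to be the CLDR canonical ID). The regex-based attribute extraction is only good enough for single `<type .../>` lines, the helper names are hypothetical, and the Kolkata entry below is written to illustrate the attribute layout, so check it against the actual file.

```javascript
// Extract attributes from one <type .../> element of bcp47/timezone.xml.
// A simple regex, not a general XML parser.
function parseTypeElement(xml) {
  const attrs = {};
  for (const [, key, value] of xml.matchAll(/(\w+)="([^"]*)"/g)) attrs[key] = value;
  return attrs;
}

// Prefer the `iana` attribute; otherwise assume the first alias is canonical.
function preferredIanaId(xml) {
  const { alias = "", iana } = parseTypeElement(xml);
  return iana ?? alias.split(" ")[0];
}

const sarajevo = '<type name="basjj" description="Sarajevo, Bosnia and Herzegovina" alias="Europe/Sarajevo"/>';
// Illustrative entry (assumed layout):
const kolkata = '<type name="inccu" description="Kolkata, India" alias="Asia/Calcutta Asia/Kolkata" iana="Asia/Kolkata"/>';
console.log(preferredIanaId(sarajevo)); // Europe/Sarajevo
console.log(preferredIanaId(kolkata));  // Asia/Kolkata
```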

Once this CLDR and ICU work is completed and released, we have a choice to make:

A while ago I filed #825 to encourage a decision on the choice of (A), (B), or (C).

Unless there are objections, I think that this issue can be closed as a dupe of that one?

sffc commented 6 months ago

Does #877 fix this issue?

justingrant commented 6 months ago

Does #877 fix this issue?

Yes, it resolves the questions raised by this issue.

ben-allen commented 1 month ago

Closing as resolved by #877