tc39 / ecma262

Status, process, and documents for ECMA-262
https://tc39.es/ecma262/

Issues with normative references to Unicode spec. #726

Open allenwb opened 7 years ago

allenwb commented 7 years ago

While reviewing some Unicode-related proposals I developed some concerns about how ECMA-262 currently references Unicode-related standards:

In ECMAScript 2016 we switched to an undated "current" edition usage of the Unicode standard. However, the normative reference in clause 3 is still a dated reference to "ISO/IEC 10646:2003" plus assorted amendments. The title listed for that ISO standard is also obsolete. The normative reference should be:

ISO/IEC 10646 Information Technology — Universal Coded Character Set (UCS)

Unfortunately, ISO/IEC 10646 published by ISO and The Unicode Standard published by the Unicode Consortium are different documents. There is material in The Unicode Standard which is not included in ISO/IEC 10646 but which is "indispensable" for the application of ECMA-262. However, neither The Unicode Standard nor its relevant related documents are included in clause 3. Instead they are listed in the ECMA-262 Bibliography.

The reason for this is that ECMA (and ISO) apparently prefer to only normatively reference documents published by ISO-recognized organizations. Apparently the Unicode Consortium documents are in the Bibliography because it was assumed that they don't meet the criteria to be a normative reference. But this assumption is easy to disprove. As shown in the following image, ISO/IEC 10646 itself normatively references Unicode Consortium documents:

[Image: 4th-10646-00-main_pdf__page_10_of_146_]

If ISO/IEC 10646 can normatively reference Unicode Consortium documents then ECMA-262 also can. Subclauses 11.6 and 21.1.3.10 and perhaps other subclauses have "indispensable" dependencies upon Unicode Consortium documents that are currently in the Bibliography. The indispensable documents should be moved to Clause 3 and the language in the dependent subclauses may need to be adjusted accordingly.

bterlson commented 7 years ago

Good finds!

bterlson commented 7 years ago

What wording do you think would need to be updated? I don't see any inbound references to sec-bibliography, and dependent clauses refer to the reference by name (e.g. "Unicode Standard").

bterlson commented 7 years ago

I also note that all the non-Unicode Standard references are found in a note in String#localeCompare. What is indispensable about a clause that contains a note with a reference? Moving the Unicode Standard makes sense but I'm not sure about the others.

allenwb commented 7 years ago

As noted in https://github.com/mathiasbynens/es-regexp-unicode-property-escapes/issues/13 I noticed this while reviewing https://github.com/mathiasbynens/es-regexp-unicode-property-escapes which needs to add additional such references. Seems like a good reason to get our normative reference act together.

Other places where we have a missing or improper normative reference to a Unicode doc:

bterlson commented 7 years ago

@allenwb in terms of wording updates after those references are moved to normative references, what do you want to see? Is it fine to refer to it by standard name (e.g. "UAX #15 Unicode Normalization Forms") if that wording is used under the normative references clause?

allenwb commented 7 years ago

Well, here is how the ISO version of the Unicode standard references such things:

Normalization forms are the mechanisms allowing the selection of a unique coded representation among alternative; but equivalent coded text representations of the same text. Normalization forms for use with this International Standard are specified in the Unicode Standard UAX#15 (see Clause 3) and shall be used in the context of this International Standard. There are four normalization forms:

and their clause 3 has the following normative reference: Unicode Standard Annex, UAX #15, Unicode Normalization Forms:
http://www.unicode.org/reports/tr15/tr15-41.html.

I think the parenthetical "see Clause 3" is a little bit much. Overall, I think your formulation (UAX #15 Unicode Normalization Forms) is probably fine and a bit more useful.
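
For context, the four normalization forms that UAX #15 defines are the ones ECMAScript accepts in String.prototype.normalize, which is why that document is more than background reading for the spec. A minimal sketch of how the forms differ follows; the sample strings and the `codePoints` helper are illustrative assumptions, not taken from this thread or from the spec text.

```javascript
// Two canonically equivalent spellings of "é" (illustrative sample data).
const precomposed = "\u00E9";  // U+00E9 LATIN SMALL LETTER E WITH ACUTE
const decomposed = "e\u0301";  // U+0065 followed by U+0301 COMBINING ACUTE ACCENT

// The four form names accepted by String.prototype.normalize.
const forms = ["NFC", "NFD", "NFKC", "NFKD"];

// Hypothetical helper: list a string's code points as "U+XXXX" labels.
function codePoints(s) {
  return Array.from(s, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"));
}

// Normalize the decomposed spelling under each form.
const results = {};
for (const form of forms) {
  results[form] = codePoints(decomposed.normalize(form));
}

// Without normalization the two spellings compare as different strings;
// after normalizing both to the same form they compare equal.
const equalRaw = precomposed === decomposed;
const equalNormalized =
  precomposed.normalize("NFC") === decomposed.normalize("NFC");

// The compatibility forms (NFKC/NFKD) additionally fold characters such as
// the "fi" ligature U+FB01 into plain letters; NFC leaves it untouched.
const ligature = "\uFB01";
const ligatureNFC = ligature.normalize("NFC");
const ligatureNFKC = ligature.normalize("NFKC");

console.log(results, equalRaw, equalNormalized, ligatureNFC, ligatureNFKC);
```

The behavior of each form comes entirely from the Unicode Consortium document, which is the kind of dependency the normative references clause is meant to capture.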