whatwg / url

URL Standard
https://url.spec.whatwg.org/

Update to Unicode 16.0.0 #836

Open rmisev opened 2 months ago

rmisev commented 2 months ago

What is the issue with the URL Standard?

Version 16.0.0 (2024-08-30) of Unicode Technical Standard #46 has been released. It fixes some previously reported issues:

  1. https://github.com/whatwg/url/issues/760 Step 4.3 of Processing (after Punycode decoding) fixes this issue:

    If the label is empty, or if the label contains only ASCII code points, record that there was an error.

  2. https://github.com/whatwg/url/issues/803 The test in question is now correctly labeled in IdnaTestV2.txt:
    xn--xn--a--gua.pt; xn--a-ä.pt; [V2, V4]; xn--xn--a--gua.pt; ; ;  # xn--a-ä.pt
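
To illustrate the rule quoted in item 1, here is a minimal sketch of that check applied to a label after Punycode decoding. The function names are hypothetical, and the sketch uses Python's built-in `punycode` codec; it covers only this single rule, not the rest of UTS #46 processing.

```python
def decoded_label_has_error(label: str) -> bool:
    """Apply only the quoted rule: an empty or all-ASCII decoded label is an error."""
    return label == "" or label.isascii()


def xn_label_has_error(label: str) -> bool:
    """Punycode-decode an 'xn--' label (hypothetical helper), then apply the rule."""
    if not label.lower().startswith("xn--"):
        raise ValueError("expected a label starting with 'xn--'")
    try:
        # Strip the 'xn--' prefix and decode the remainder with the stdlib codec.
        decoded = label[4:].encode("ascii").decode("punycode")
    except UnicodeError:
        # A failed conversion is treated as an error in this sketch.
        return True
    return decoded_label_has_error(decoded)
```

For example, `xn--abc-` decodes to the all-ASCII `abc`, so this check records an error. The label `xn--xn--a--gua` from item 2 decodes to `xn--a-ä`, which contains a non-ASCII code point, so this particular check alone does not flag it.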

So I think it's worth upgrading to that standard:

  1. Reference the new 16.0.0 Unicode Technical Standard #46 in the Normative References section.
  2. In WPT, update the IdnaTestV2-parser.py tool and the IdnaTestV2.json test file. I have opened a PR for this: https://github.com/web-platform-tests/wpt/pull/48301
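
For readers unfamiliar with the test file, a line such as the one quoted above from IdnaTestV2.txt can be split into fields roughly as follows. This is a hypothetical sketch, not the actual IdnaTestV2-parser.py tool from WPT. The field names are assumptions based on the column descriptions in the file's header, and the sketch does not resolve blank fields or escape sequences.

```python
# Assumed column order; not taken from the WPT tool.
FIELD_NAMES = [
    "source", "toUnicode", "toUnicodeStatus",
    "toAsciiN", "toAsciiNStatus",
    "toAsciiT", "toAsciiTStatus",
]


def parse_idna_test_line(line: str):
    """Split one test line into named raw fields; return None for comment-only lines."""
    # Drop the trailing '#' comment (assumes '#' does not occur inside a field).
    data = line.split("#", 1)[0].strip()
    if not data:
        return None
    fields = [field.strip() for field in data.split(";")]
    return dict(zip(FIELD_NAMES, fields))


def parse_status(status: str):
    """Turn a status field such as '[V2, V4]' into a list of codes."""
    inner = status.strip().strip("[]")
    return [code.strip() for code in inner.split(",") if code.strip()]
```

Applied to the line quoted above, `parse_idna_test_line` yields `source` as `xn--xn--a--gua.pt` and `toUnicode` as `xn--a-ä.pt`, and `parse_status` turns the `toUnicodeStatus` field into `["V2", "V4"]`.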

annevk commented 1 day ago

@markusicu @macchiati could you provide some context for these changes? While we submitted a bunch of feedback (as recorded in #744), it seems there are quite a few other changes as well.

E.g., is an invalid domain name today, but with Unicode 16 would be valid?

(We are seeing this in WebKit as well now: https://github.com/WebKit/WebKit/pull/37104.)

markusicu commented 20 hours ago

We considered several issues and made recommendations that the UTC approved.

For example, there were complicated and unnecessary differences in processing with UseSTD3ASCIIRules=true vs. false, and some characters were disallowed based on differences between IDNA2003 and IDNA2008, while (a) IDNA2003 has not been relevant in a long time and (b) transitional processing had been deprecated in Unicode 15.1.

For details see