whatwg / url

URL Standard
https://url.spec.whatwg.org/

IdnaTestV2.json "xn--xn--a--gua.pt" test case problem #803

Open domenic opened 11 months ago

domenic commented 11 months ago

What is the issue with the URL Standard?

When updating jsdom/tr46 to the latest revision of TR46, I implemented this new line of the label validity criteria:

If not CheckHyphens, the label must not begin with “xn--”.
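A minimal sketch of this criterion in Python, assuming the label has already been mapped to lowercase and that ACE labels are decoded with Python's built-in `punycode` codec. The helper names `decode_label` and `violates_xn_prefix_rule` are hypothetical and are not taken from jsdom/tr46.

```python
ACE_PREFIX = "xn--"


def decode_label(label: str) -> str:
    """Decode one ACE label ("xn--...") to Unicode; return other labels unchanged.

    Assumption: the label is already lowercase, as it would be after UTS 46 mapping.
    """
    if label.startswith(ACE_PREFIX):
        # Strip the ACE prefix, then run the remainder through the punycode codec.
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


def violates_xn_prefix_rule(decoded_label: str, check_hyphens: bool) -> bool:
    """Return True if the quoted criterion is violated.

    The criterion only applies when CheckHyphens is false.
    """
    return (not check_hyphens) and decoded_label.startswith(ACE_PREFIX)


decoded = decode_label("xn--xn--a--gua")  # decodes to "xn--a-ä"
print(decoded, violates_xn_prefix_rule(decoded, check_hyphens=False))
```

With CheckHyphens set to false, the decoded label from this issue's title still begins with "xn--", so the check reports a violation.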

This causes my test suite to fail due to these lines from IdnaTestV2.json:

  {
    "comment": "V2 (ignored)",
    "input": "xn--xn--a--gua.pt",
    "output": "xn--xn--a--gua.pt"
  },

which correspond to these lines from IdnaTestV2.txt:

xn--xn--a--gua.pt; xn--a-ä.pt; [V2]; xn--xn--a--gua.pt; ; ;  # xn--a-ä.pt
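For readers unfamiliar with the file, the line above can be split into named fields with a short sketch like the following. The column names are assumed from the header comments of IdnaTestV2.txt, and `parse_idna_test_line` is a hypothetical helper, not part of the WHATWG conversion script. It does not handle escape sequences or resolve blank fields to their defaults.

```python
from typing import Dict, List, Union

# Assumed column order of IdnaTestV2.txt (seven semicolon-separated fields).
COLUMNS = [
    "source",
    "toUnicode",
    "toUnicodeStatus",
    "toAsciiN",
    "toAsciiNStatus",
    "toAsciiT",
    "toAsciiTStatus",
]


def parse_status(field: str) -> List[str]:
    """Turn "[V2, V4]" into ["V2", "V4"]; a blank field becomes an empty list."""
    inner = field.strip().strip("[]")
    return [code.strip() for code in inner.split(",") if code.strip()]


def parse_idna_test_line(line: str) -> Dict[str, Union[str, List[str]]]:
    """Split one data line into named fields, dropping the trailing "#" comment."""
    data = line.split("#", 1)[0]
    fields = [f.strip() for f in data.split(";")]
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    return {
        name: parse_status(value) if name.endswith("Status") else value
        for name, value in zip(COLUMNS, fields)
    }


row = parse_idna_test_line(
    "xn--xn--a--gua.pt; xn--a-ä.pt; [V2]; xn--xn--a--gua.pt; ; ;  # xn--a-ä.pt"
)
print(row["source"], row["toUnicodeStatus"])
```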

I can't tell whether this is a problem with the source data, or with our conversion script. The conversion script seems to be trying to do something with the V2 error codes, but I'm not sure exactly what.

Note that other V2 error codes in the test data don't seem to cause problems.

domenic commented 11 months ago

This is probably related to https://github.com/whatwg/url/issues/760

annevk commented 11 months ago

This is a new criterion in UTS 46 v31 vs v29. Contrast with https://www.unicode.org/reports/tr46/tr46-29.html#Validity_Criteria.

It's not entirely clear to me if all the changes made to UTS 46 are correct. Notably we were not consulted on them.

domenic commented 11 months ago

I think this one might have been a result of our requests, or at least an interpretation of our requests. See #760. In particular I think it might align with WebKit, and thus avoid the roundtripping problems others see.

rmisev commented 11 months ago

I think the problem is in IdnaTestV2.txt (from Unicode 15.1). The test you mention is missing the V4 error code. It should read:

xn--xn--a--gua.pt; xn--a-ä.pt; [V2, V4]; xn--xn--a--gua.pt; ; ;  # xn--a-ä.pt

This is related to: #603
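To illustrate why the missing V4 code matters, here is a rough sketch of how a converter might derive the expected JSON output from the status codes. It is not the actual conversion script: the set of ignored codes (V2 and V3, assumed to be the CheckHyphens-only criteria) and the function name `expected_output` are assumptions made for illustration.

```python
from typing import List, Optional

# Assumption: these codes only apply when CheckHyphens is true, so a consumer
# that sets CheckHyphens to false would ignore them.
IGNORED_WHEN_NOT_CHECK_HYPHENS = {"V2", "V3"}


def expected_output(to_ascii: str, status: List[str]) -> Optional[str]:
    """Return the expected ToASCII result, or None if a relevant error remains."""
    remaining = [code for code in status if code not in IGNORED_WHEN_NOT_CHECK_HYPHENS]
    return None if remaining else to_ascii


# With only [V2], the error is ignored and the input is expected to succeed.
print(expected_output("xn--xn--a--gua.pt", ["V2"]))
# With [V2, V4], V4 remains after filtering, so a failure is expected.
print(expected_output("xn--xn--a--gua.pt", ["V2", "V4"]))
```

Under these assumptions, the line as published yields a passing test case, while the corrected line would yield an expected failure, which matches the new validity criterion.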