Closed domenic closed 1 year ago
You need to percent-encode the input (I pushed a change for this to my web-platform-tests PR). That will reduce errors due to ?
being used. That will still leave some errors however that I think will be hard to filter out due to the source data not having annotations as detailed as we'd like. I could probably improve upon it some more though. Perhaps with more effort and some minimal special casing we could use all tests as-is and with a future update remove some of the special casing.
Now most (all?) of the failures seem to come from empty labels, e.g. xn--4-0bd15808a.
or .xn--ye6h
or xn--09e4694e..xn--ye6h
. Are these all the BiDi case you mentioned? I can't find any spec text that prohibits empty labels, but maybe I'm not looking in the right place?
It's not prohibited, but UTS46 runs https://www.rfc-editor.org/rfc/rfc5893#section-2 on each label and the first subrule there assumes the label to be non-empty. From @TimothyGu's comment it sounds like the JS Punycode library runs into that.
So is the intent of merging https://github.com/web-platform-tests/wpt/pull/38080 to make it a de-facto requirement to prohibit such empty labels, i.e., any implementation which wants to pass the WPT suite must add some prohibition on them?
Here's our implementation, FWIW: https://github.com/jsdom/tr46/blob/13d02e630ad6da5ab974f4adfc7c4afb6b44c619/index.js#L132
No, xn--4-0bd15808a.
parses in WebKit and WebKit passes all tests. I suspect what your code is tripping on is the fundamental bug in UTS46 around empty labels. I suppose we could try to remove those tests as well, but those don't appear to be a problem for browsers.
Hmm OK. So do we have feedback lined up for this "fundamental bug"? And, are we just saying that the correct fix for that bug, is to allow such empty labels?
See the second item in https://github.com/whatwg/url/issues/744.
Because we don't pass VerifyDnsLength browsers allow empty labels. It might be worth revisiting that, but that's the status quo. E.g., https://./
and https://../
are confusingly correct. (I was very surprised by this a couple days ago too, but since browsers don't exclusively talk over DNS networks it might be reasonable. Not entirely convinced though.)
After https://github.com/jsdom/tr46/pull/39, there is one remaining failure case. I believe it is caused by https://github.com/mathiasbynens/punycode.js/issues/129.
This does not currently work; /cc @annevk @karwa to see the failures. I suspect they're more bugs in the tr46 or punycode JS modules, but it's hard to be sure... tr46 is supposedly passing all the IdnaTestV2.txt tests, but maybe check out our test harness at https://github.com/jsdom/tr46/blob/master/test/unicode.js to see if something's going wrong in either of the two test harnesses?