jsdom / whatwg-url

An implementation of the WHATWG URL Standard in JavaScript
https://jsdom.github.io/whatwg-url/
MIT License
371 stars 94 forks source link

Idna v2 #250

Closed domenic closed 1 year ago

domenic commented 1 year ago

This does not currently work; /cc @annevk @karwa to see the failures. I suspect they're more bugs in the tr46 or punycode JS modules, but it's hard to be sure... tr46 is supposedly passing all the IdnaTestV2.txt tests, but maybe check out our test harness at https://github.com/jsdom/tr46/blob/master/test/unicode.js to see if something's going wrong in either of the two test harnesses?

annevk commented 1 year ago

You need to percent-encode the input (I pushed a change for this to my web-platform-tests PR). That will reduce errors due to ? being used. That will still leave some errors however that I think will be hard to filter out due to the source data not having annotations as detailed as we'd like. I could probably improve upon it some more though. Perhaps with more effort and some minimal special casing we could use all tests as-is and with a future update remove some of the special casing.

domenic commented 1 year ago

Now most (all?) of the failures seem to come from empty labels, e.g. xn--4-0bd15808a. or .xn--ye6h or xn--09e4694e..xn--ye6h. Are these all the BiDi case you mentioned? I can't find any spec text that prohibits empty labels, but maybe I'm not looking in the right place?

annevk commented 1 year ago

It's not prohibited, but UTS46 runs https://www.rfc-editor.org/rfc/rfc5893#section-2 on each label and the first subrule there assumes the label to be non-empty. From @TimothyGu's comment it sounds like the JS Punycode library runs into that.

domenic commented 1 year ago

So is the intent of merging https://github.com/web-platform-tests/wpt/pull/38080 to make it a de-facto requirement to prohibit such empty labels, i.e., any implementation which wants to pass the WPT suite must add some prohibition on them?

Here's our implementation, FWIW: https://github.com/jsdom/tr46/blob/13d02e630ad6da5ab974f4adfc7c4afb6b44c619/index.js#L132

annevk commented 1 year ago

No, xn--4-0bd15808a. parses in WebKit and WebKit passes all tests. I suspect what your code is tripping on is the fundamental bug in UTS46 around empty labels. I suppose we could try to remove those tests as well, but those don't appear to be a problem for browsers.

domenic commented 1 year ago

Hmm OK. So do we have feedback lined up for this "fundamental bug"? And, are we just saying that the correct fix for that bug, is to allow such empty labels?

annevk commented 1 year ago

See the second item in https://github.com/whatwg/url/issues/744.

Because we don't pass VerifyDnsLength browsers allow empty labels. It might be worth revisiting that, but that's the status quo. E.g., https://./ and https://../ are confusingly correct. (I was very surprised by this a couple days ago too, but since browsers don't exclusively talk over DNS networks it might be reasonable. Not entirely convinced though.)

domenic commented 1 year ago

After https://github.com/jsdom/tr46/pull/39, there is one remaining failure case. I believe it is caused by https://github.com/mathiasbynens/punycode.js/issues/129.