Open jarthod opened 1 year ago
I like your suggestions, it looks like a good plan to me
I'm in favor, but also unable to prioritize this work myself. Happy to review a PR. I'm particularly glad to see there are great conformance testing options.
@sporkmonger thanks for the feedback, I'm gonna write the PR for this after #492.
Forked from #408, this is the separate ticket to deal with the improvement of the default (pure ruby) IDNA implementation.
AFAICS the expectation is that while all registrars are upgrading to IDNA2008 and only allowing valid hostnames (=approximately forever), browsers and web clients in general are encouraged to use IDNA2008+UTS#46 to widen support. So basically, unless you're running a registrar, IDNA2008+UTS#46 is the target.
That's why:

libidn2 is implementing IDNA2008 + UTS#46 and defaults to the Non-Transitional (=new) mode. It is also used by curl, for example, and probably many other web clients.

Firefox and Safari also seem to do IDNA2008+UTS#46 Non-Transitional.

Chrome was lagging a bit as it was still using Transitional mode up until very recently; apparently they juuust changed this to Non-Transitional in Chrome 110. I can't verify this yet as I only have Chrome 109 on Linux ^^

Edit (February 13th 2023): I just received Chrome 110 and confirmed the new behavior: http://faß.de now resolves to http://xn--fa-hia.de (and stays displayed as http://faß.de), whereas in Chrome 109 it was transformed into http://fass.de (IDNA2003).

libidn (the current "native" option) implements the IDNA2003 standard (the "older" one). IMO we should upgrade to libidn2; this will be discussed in #247.

The "pure" implementation is IDNA2008iiiisssshhhhh, but not compliant, as we can see in this example with an emoji modifier:
If we compare that to the official Unicode test website:

https://xn--lh-t0xz926h.ws (returned by the current "pure" implementation) is not even an option. No matter what standard we use, it's either xn--lh-t0x.ws or invalid (IDNA2008).

In order to bring the pure implementation up to the state of the art, we'll have to rewrite some of it (or bring in a dependency). As I was looking at options for dependencies, I found:
unf for Unicode normalization though :neutral_face: which is not great, especially as Ruby does this natively. It could be a good help for a rewrite though.

Good news: the Unicode team provides an awesome conformance testing file with thousands of input strings and the desired output for IDNA2008+UTS#46, for every version of Unicode. Example: https://www.unicode.org/Public/idna/15.0.0/IdnaTestV2.txt
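To give an idea of how that file could drive a rewrite, here is a rough sketch of a parser plus a tiny runner. It is only an illustration under a few assumptions: the column layout (source; toUnicode; toUnicodeStatus; toAsciiN; toAsciiNStatus; toAsciiT; toAsciiTStatus, with blank fields inheriting from earlier columns) is taken from my reading of the file's header comments and should be double-checked against the real file, the two sample lines are written in the style of the file rather than copied from it, and `IdnaCase`, `unescape`, `parse_line` and `run_cases` are hypothetical names, not part of any existing gem.

```ruby
# Hypothetical helper names; the field layout is an assumption based on the
# header comments of IdnaTestV2.txt and should be verified against the file.
IdnaCase = Struct.new(:source, :to_unicode, :to_ascii_n, :to_ascii_n_status, :to_ascii_t)

# Decode the \uXXXX and \x{XXXX} escapes used for hard-to-display code points.
def unescape(field)
  field
    .gsub(/\\u(\h{4})/) { [$1.hex].pack("U") }
    .gsub(/\\x\{(\h+)\}/) { [$1.hex].pack("U") }
end

# Parse one line; returns nil for blank lines and comment-only lines.
# Simplification: everything after the first "#" is treated as a comment.
def parse_line(line)
  data = line.sub(/#.*/, "").strip
  return nil if data.empty?

  raw = data.split(";", -1)
  source, to_unicode, u_status, ascii_n, n_status, ascii_t =
    Array.new(6) { |i| unescape(raw[i].to_s.strip) }

  # Blank fields inherit from the column they refine.
  to_unicode = source     if to_unicode.empty?
  u_status   = "[]"       if u_status.empty?
  ascii_n    = to_unicode if ascii_n.empty?
  n_status   = u_status   if n_status.empty?
  ascii_t    = ascii_n    if ascii_t.empty?

  IdnaCase.new(source, to_unicode, ascii_n, n_status, ascii_t)
end

# Run every case expected to succeed in Non-Transitional mode through the
# given callable and collect [source, expected, actual] for mismatches.
def run_cases(lines, to_ascii)
  failures = []
  lines.each do |line|
    kase = parse_line(line)
    next if kase.nil? || kase.to_ascii_n_status != "[]"

    got = to_ascii.call(kase.source)
    failures << [kase.source, kase.to_ascii_n, got] unless got == kase.to_ascii_n
  end
  failures
end

# Sample lines in the style of the test file (illustrative only).
SAMPLE = <<~'TXT'
  # Comment lines and blank lines are skipped
  fass.de; ; ; ; ; ;  # fass.de
  faß.de; ; ; xn--fa-hia.de; ; fass.de;  # faß.de
TXT

# A deliberately naive stand-in for a to_ascii implementation.
identity = ->(host) { host }
run_cases(SAMPLE.lines, identity).each do |source, want, got|
  puts "FAIL #{source}: expected #{want}, got #{got}"
end
```

Passing the implementation in as a callable means the same runner could be pointed at the current pure implementation, a libidn2-backed one, or a work-in-progress rewrite, which would make it easy to track progress between incremental steps.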
My suggestion here would be to go with an incremental rewrite in order to:
simpleidn implementation)

@sporkmonger @dentarg what do you think?