medialize / URI.js

Javascript URL mutation library
http://medialize.github.io/URI.js/
MIT License

Emoji is incorrectly encoded in punycode #368

Open xPaw opened 6 years ago

xPaw commented 6 years ago
new URI("https://🤦‍♂️.xpaw.me").normalize().hostname()
> "xn--1ug66vku9rd58h.xpaw.me"

Unicode inspector: https://apps.timwhitlock.info/unicode/inspect?s=%F0%9F%A4%A6%E2%80%8D%E2%99%82%EF%B8%8F

However, Chrome and https://www.punycoder.com/ both encode it as https://xn--g5hz781o.xpaw.me/.

What's happening here?
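For reference, the label can be broken into code points by decoding the query string of the inspector link above. This is only a small sketch; the variable names are made up for illustration.

```javascript
// Percent-encoded UTF-8 bytes copied from the inspector URL above.
const inspected = decodeURIComponent("%F0%9F%A4%A6%E2%80%8D%E2%99%82%EF%B8%8F");

// Array.from iterates by code point, so astral characters stay intact.
const codePoints = Array.from(inspected, (ch) =>
  "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
);

console.log(codePoints.join(" "));
// prints "U+1F926 U+200D U+2642 U+FE0F"
```

So the hostname label is FACE PALM, ZERO WIDTH JOINER, MALE SIGN, VARIATION SELECTOR-16.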

xPaw commented 6 years ago

Chrome and Edge drop both ZERO WIDTH JOINER (ZWJ, U+200D) and VARIATION SELECTOR-16 (U+FE0F) before Punycode encoding, which yields xn--g5hz781o.

Firefox drops only VARIATION SELECTOR-16 and keeps ZWJ, which yields xn--1ug66v4685b (that label still decodes to a string containing U+200D).

According to https://tools.ietf.org/html/rfc5894#section-7.2.2, dropping ZWJ is correct; however, it says nothing about variation selectors.
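The three labels quoted in this thread can be reproduced by filtering code points before encoding. The sketch below hand-rolls the RFC 3492 encoder so it runs without dependencies; `punycodeEncode`, `encodeLabel`, and the filter predicates are hypothetical helpers, not part of URI.js or punycode.js, and the encoder skips the overflow checks a production implementation would need.

```javascript
// Minimal RFC 3492 Punycode encoder for a single label (no overflow checks).
function punycodeEncode(label) {
  const BASE = 36, T_MIN = 1, T_MAX = 26, SKEW = 38, DAMP = 700;
  // Digits 0..25 map to a..z, digits 26..35 map to 0..9.
  const digit = (d) => String.fromCharCode(d < 26 ? 97 + d : 22 + d);
  const adapt = (delta, numPoints, first) => {
    delta = Math.floor(delta / (first ? DAMP : 2));
    delta += Math.floor(delta / numPoints);
    let k = 0;
    while (delta > ((BASE - T_MIN) * T_MAX) / 2) {
      delta = Math.floor(delta / (BASE - T_MIN));
      k += BASE;
    }
    return k + Math.floor(((BASE - T_MIN + 1) * delta) / (delta + SKEW));
  };

  const input = Array.from(label, (ch) => ch.codePointAt(0));
  const basic = input.filter((cp) => cp < 128);
  let out = String.fromCodePoint(...basic);
  let handled = basic.length;
  if (handled > 0) out += "-";

  let n = 128, delta = 0, bias = 72;
  while (handled < input.length) {
    // Next code point to insert: the smallest one not yet handled.
    const m = Math.min(...input.filter((cp) => cp >= n));
    delta += (m - n) * (handled + 1);
    n = m;
    for (const cp of input) {
      if (cp < n) delta++;
      if (cp === n) {
        // Emit delta as a variable-length integer.
        let q = delta;
        for (let k = BASE; ; k += BASE) {
          const t = k <= bias ? T_MIN : k >= bias + T_MAX ? T_MAX : k - bias;
          if (q < t) break;
          out += digit(t + ((q - t) % (BASE - t)));
          q = Math.floor((q - t) / (BASE - t));
        }
        out += digit(q);
        bias = adapt(delta, handled + 1, handled === basic.length);
        delta = 0;
        handled++;
      }
    }
    delta++;
    n++;
  }
  return out;
}

const ZWJ = 0x200d;
const isVariationSelector = (cp) => cp >= 0xfe00 && cp <= 0xfe0f;

// Drop every code point for which `drop` returns true, then encode.
function encodeLabel(label, drop) {
  const kept = Array.from(label).filter((ch) => !drop(ch.codePointAt(0)));
  return "xn--" + punycodeEncode(kept.join(""));
}

const label = "\u{1F926}\u200D\u2642\uFE0F";
const asIs = encodeLabel(label, () => false);
const withoutVs = encodeLabel(label, isVariationSelector);
const withoutBoth = encodeLabel(label, (cp) => cp === ZWJ || isVariationSelector(cp));

console.log(asIs);        // xn--1ug66vku9rd58h (the URI.js result)
console.log(withoutVs);   // xn--1ug66v4685b   (the Firefox label)
console.log(withoutBoth); // xn--g5hz781o      (the Chrome/Edge label)
```

The difference between the results comes entirely from which code points are removed before encoding; the Punycode step itself is the same in all three cases.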

rodneyrehm commented 6 years ago

Unfortunately, I have no idea how emoji in domains should behave.

We could try updating punycode to 1.4.1; we're currently using 1.4.0. Unfortunately, 2.0.0 seems to have dropped legacy browser support.

xPaw commented 6 years ago

It seems the IDNA mapping rules in UTS #46 should be applied before the domain is converted to Punycode: https://unicode.org/reports/tr46/

I set up a test page at https://xn--g5hz781o.xpaw.me/ to test various browsers.

punycode.js doesn't seem to implement it sadly:

There is https://github.com/jcranmer/idna-uts46, which could probably solve the problem here, but that library is crazy big.

rodneyrehm commented 6 years ago

Maybe @mathiasbynens has thoughts on this?

n4ru commented 2 years ago

> Chrome and Edge drop both ZERO WIDTH JOINER (ZWJ, U+200D) and VARIATION SELECTOR-16 (U+FE0F) before Punycode encoding, which yields xn--g5hz781o.
>
> Firefox drops only VARIATION SELECTOR-16 and keeps ZWJ, which yields xn--1ug66v4685b (that label still decodes to a string containing U+200D).
>
> According to https://tools.ietf.org/html/rfc5894#section-7.2.2, dropping ZWJ is correct; however, it says nothing about variation selectors.

Was there a conclusion regarding whether or not variation selectors should be dropped?

jarthod commented 1 year ago

For the record, the latest idnaMappingTable (Unicode v15) seems to say the variation selectors should be ignored/dropped:

FE00..FE0F    ; ignored                                # 3.2  VARIATION SELECTOR-1..VARIATION SELECTOR-16
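A rough sketch of how such a table line could be applied before Punycode encoding: parse the `range ; status # comment` fields, collect the ranges marked `ignored`, and strip matching code points from the label. `parseIgnoredRanges` and `stripIgnored` are hypothetical helpers, and the input is only the single line quoted above, not the full mapping table (which also has `mapped`, `deviation`, and other statuses that this sketch does not handle).

```javascript
// The one table line quoted above; the real file has thousands of entries.
const tableExcerpt =
  "FE00..FE0F    ; ignored                                # 3.2  VARIATION SELECTOR-1..VARIATION SELECTOR-16";

// Collect [start, end] code point ranges whose status field is "ignored".
function parseIgnoredRanges(text) {
  const ranges = [];
  for (const rawLine of text.split("\n")) {
    const line = rawLine.split("#")[0].trim(); // strip trailing comment
    if (line === "") continue;
    const [range, status] = line.split(";").map((field) => field.trim());
    if (status !== "ignored") continue;
    // A range is either "XXXX..YYYY" or a single "XXXX".
    const [start, end = start] = range.split("..");
    ranges.push([parseInt(start, 16), parseInt(end, 16)]);
  }
  return ranges;
}

// Remove every code point that falls inside one of the ignored ranges.
function stripIgnored(label, ranges) {
  return Array.from(label)
    .filter((ch) => {
      const cp = ch.codePointAt(0);
      return !ranges.some(([lo, hi]) => cp >= lo && cp <= hi);
    })
    .join("");
}

const ranges = parseIgnoredRanges(tableExcerpt);
const stripped = stripIgnored("\u{1F926}\u200D\u2642\uFE0F", ranges);

console.log(ranges);                // [ [ 65024, 65039 ] ]
console.log(Array.from(stripped).length); // 3: U+FE0F is gone, U+200D remains
```

With only this line loaded, ZWJ is left in place, so encoding `stripped` would give the Firefox-style label rather than the Chrome one.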