jsdom / tr46

An implementation of the Unicode UTS #46: Unicode IDNA Compatibility Processing.
MIT License

Stripping the FE0F (Variation Selector-16) modifier when resolving causes major issues in the cryptocurrency space #35

Closed: thepwnman33 closed this issue 2 years ago

thepwnman33 commented 2 years ago

I believe there is a discrepancy that prevents consistent normalization across several large platforms and marketplaces in the space.

It appears that someone is either not applying the standard or is doing things slightly differently. Specifically, this library strips the FE0F code point when resolving and displaying several native emojis that contain joiners and FE0F.

According to https://unicode.org/Public/emoji/14.0/emoji-test.txt, FE0F should be preserved for the sequence to be "fully qualified", which is theoretically the more adaptive version. This is currently causing major issues with decentralized emoji domains (.eth) on Opensea, Etherscan, and Metawallet: everyone is using a different library and there is no standardization. More specifically, Metawallet (which uses this library) resolves names differently from Opensea and Etherscan.

Take the example of the man genie emoji, taken directly from unicode.org:

1F9DE 200D 2642 FE0F ; fully-qualified # 🧞‍♂️ E5.0 man genie

If you take the man genie 🧞‍♂️, handheld devices, Discord, Twitter, etc. will default to 1F9DE 200D 2642 FE0F, and so do large marketplaces like Opensea and Etherscan. However, after this script runs, FE0F is stripped and we are left with 1F9DE 200D 2642, which turns the man genie into a minimally-qualified emoji. This is disrupting all of the emoji .eth domains and their connection with Metawallet addresses.
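To make the difference concrete, here is a small sketch that prints both forms of the man genie in the same notation emoji-test.txt uses. The helper name `codePoints` is made up for this illustration; it is not part of tr46 or any other library.

```javascript
// Hypothetical helper: list the code points of a string as uppercase hex,
// space-separated, in the style of emoji-test.txt.
function codePoints(str) {
  // Array.from iterates by code point, so astral characters like U+1F9DE stay whole.
  return Array.from(str, (ch) =>
    ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  ).join(" ");
}

const fullyQualified = "\u{1F9DE}\u200D\u2642\uFE0F"; // man genie, fully-qualified
const minimallyQualified = "\u{1F9DE}\u200D\u2642";   // man genie, minimally-qualified

console.log(codePoints(fullyQualified));     // 1F9DE 200D 2642 FE0F
console.log(codePoints(minimallyQualified)); // 1F9DE 200D 2642

// The two usually render the same, but they are different strings, so any
// system that compares or hashes the raw text sees two different names.
console.log(fullyQualified === minimallyQualified); // false
```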

It would seem that the wisest solution is to follow the standard and the more widely supported emoji version, which in this case is the fully-qualified 1F9DE 200D 2642 FE0F. The Variation Selector-16 modifier shouldn't be stripped away: it is responsible for current, and likely future, generations of visually distinct emojis, and it is arguably the better choice under the Unicode standard.

Regards

TimothyGu commented 2 years ago

This is not something we can change unilaterally, as the Unicode standard specifies that variation selectors should be ignored in IDNA processing:

FE00..FE0F    ; ignored                                # 3.2  VARIATION SELECTOR-1..VARIATION SELECTOR-16
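For readers unfamiliar with the "ignored" status, the sketch below applies only the single table row quoted above. It is not how tr46 is structured internally; the function name `dropVariationSelectors` is hypothetical, and a real UTS #46 implementation applies the whole mapping table plus normalization and validity checks.

```javascript
// Hypothetical helper: applies only the row "FE00..FE0F ; ignored".
// "ignored" means the code point is removed from the label entirely.
function dropVariationSelectors(label) {
  let out = "";
  for (const ch of label) {
    const cp = ch.codePointAt(0);
    if (cp >= 0xfe00 && cp <= 0xfe0f) continue; // VARIATION SELECTOR-1..16
    out += ch;
  }
  return out;
}

const fullyQualified = "\u{1F9DE}\u200D\u2642\uFE0F";
const minimallyQualified = "\u{1F9DE}\u200D\u2642";

// After this mapping step, both spellings collapse to the same label.
console.log(dropVariationSelectors(fullyQualified) === minimallyQualified); // true
```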

Furthermore, the example you provided uses U+200D Zero Width Joiner, which is only allowed in Unicode labels under very specific circumstances.
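As a rough illustration of those circumstances: when the CheckJoiners option of UTS #46 is enabled, it applies the CONTEXTJ rules of RFC 5892, and for U+200D the rule is that it must immediately follow a character whose Canonical_Combining_Class is Virama. JavaScript has no built-in way to query that property, so this sketch assumes a small, deliberately incomplete hard-coded virama list; `SAMPLE_VIRAMAS` and `zwjAllowed` are hypothetical names.

```javascript
// Partial, illustrative list of code points with Canonical_Combining_Class = Virama (9).
// This is NOT complete; a real implementation reads the Unicode Character Database.
const SAMPLE_VIRAMAS = new Set([0x094d, 0x09cd, 0x0a4d, 0x0bcd, 0x0d4d]);

// Sketch of the RFC 5892 CONTEXTJ rule for U+200D ZERO WIDTH JOINER:
// every ZWJ in the label must directly follow a virama.
function zwjAllowed(label) {
  const cps = Array.from(label, (ch) => ch.codePointAt(0));
  for (let i = 0; i < cps.length; i++) {
    if (cps[i] !== 0x200d) continue;
    if (i === 0 || !SAMPLE_VIRAMAS.has(cps[i - 1])) return false;
  }
  return true;
}

console.log(zwjAllowed("\u{1F9DE}\u200D\u2642"));    // false: ZWJ follows an emoji
console.log(zwjAllowed("\u0915\u094D\u200D\u0937")); // true: ZWJ follows DEVANAGARI SIGN VIRAMA
```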

I have personally never heard of Opensea, Etherscan, or Metawallet, but it sounds like they should all be made to follow the Unicode IDNA standard that literally everyone else uses. There are several other implementations of IDNA, including Node.js, web browsers, ICU4C, Python's idna, and Go's x/net/idna. All of them behave correctly, assuming the right options are set (uts46=True for Python's idna, or idna.Lookup for Go's x/net/idna). As an example, running

new URL('http://\u{1F9DE}\u2642\uFE0F.eth').href === new URL('http://\u{1F9DE}\u2642.eth').href

returns true in Chrome, Firefox, and Safari.
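The same comparison can be run against the Node.js implementation mentioned above. This sketch uses Node's `url.domainToASCII()`, which returns the ASCII (Punycode) form of a domain, or an empty string if the domain is invalid. We expect both spellings to produce the same result because FE0F is ignored, but the exact output depends on the Node.js version in use.

```javascript
// Node.js: url.domainToASCII() runs the WHATWG URL domain processing
// (UTS #46 mapping, then Punycode) and returns "" for invalid domains.
const { domainToASCII } = require("url");

const withVS16 = domainToASCII("\u{1F9DE}\u2642\uFE0F.eth");
const withoutVS16 = domainToASCII("\u{1F9DE}\u2642.eth");

// If VARIATION SELECTOR-16 is ignored as UTS #46 specifies,
// both inputs map to the same ASCII domain.
console.log(withVS16, withoutVS16, withVS16 === withoutVS16);
```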

TimothyGu commented 2 years ago

To gain some intuition for why variation selectors are dropped, think of it as follows. We are dealing with domain names (well, technically, domain labels), which are modeled as a kind of "plain text". People are generally used to AbC.com being transformed to abc.com, but more exotically, 1ªleydenewton.mx maps to 1aleydenewton.mx, and ﷼.ir (containing the symbol for the currency rial) is mapped to ریال.ir (where the symbol is decomposed into individual letters).
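These mappings can be observed from JavaScript in any runtime with the WHATWG URL API (browsers, Node.js). The first two lines go through the URL parser's host processing; for the rial sign the sketch only calls `String.prototype.normalize("NFKC")`, which shows the underlying compatibility decomposition rather than the full IDNA pipeline.

```javascript
// ASCII upper-case letters are lower-cased during host parsing.
console.log(new URL("http://AbC.com").hostname); // abc.com

// U+00AA FEMININE ORDINAL INDICATOR is mapped to a plain "a".
console.log(new URL("http://1\u00AAleydenewton.mx").hostname); // 1aleydenewton.mx

// U+FDFC RIAL SIGN decomposes into four Arabic letters (reh, farsi yeh, alef, lam).
console.log("\uFDFC".normalize("NFKC") === "\u0631\u06CC\u0627\u0644"); // true
```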

By the same token, it makes sense for "optional" (for text at least) annotations like variation selectors to be dropped.

More formally, Section 6 of UTS 46 uses the NFKC_Casefold value as the "base mapping value" for what the character should turn into. And sure enough, the variation selectors are mapped to the empty string:

FDFC          ; NFKC_CF; 0631 06CC 0627 0644 #Sc   RIAL SIGN
FE00..FE0F    ; NFKC_CF;                # Mn  [16] VARIATION SELECTOR-1..VARIATION SELECTOR-16
FE10          ; NFKC_CF; 002C           # Po       PRESENTATION FORM FOR VERTICAL COMMA
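JavaScript has no built-in NFKC_Casefold, but a rough approximation reproduces the three rows above. This is an assumption-laden sketch, not the exact property: `approxNfkcCasefold` is a hypothetical name, and `toLowerCase()` differs from full case folding for some characters (for example, U+00DF folds to "ss").

```javascript
// Default_Ignorable_Code_Point includes U+FE00..U+FE0F, which NFKC_CF maps to "".
const DEFAULT_IGNORABLE = /\p{Default_Ignorable_Code_Point}/gu;

// Approximation of NFKC_Casefold: remove default-ignorables, apply NFKC,
// lower-case (standing in for case folding), then re-normalize.
function approxNfkcCasefold(s) {
  return s
    .replace(DEFAULT_IGNORABLE, "")
    .normalize("NFKC")
    .toLowerCase()
    .normalize("NFKC");
}

console.log(approxNfkcCasefold("\uFDFC") === "\u0631\u06CC\u0627\u0644"); // true (RIAL SIGN)
console.log(approxNfkcCasefold("\uFE0F") === "");                        // true (VS16)
console.log(approxNfkcCasefold("\uFE10") === ",");                       // true (vertical comma)
```

Note that this approximation also deletes U+200D, which is itself Default_Ignorable; UTS #46 handles the joiners separately rather than taking the NFKC_CF value directly.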