thepwnman33 closed this issue 2 years ago.
This is not something we can change unilaterally, as the Unicode standard specifies that variation selectors should be ignored in IDNA processing:
FE00..FE0F ; ignored # 3.2 VARIATION SELECTOR-1..VARIATION SELECTOR-16
Furthermore, the example you provided uses U+200D Zero Width Joiner, which is only allowed in Unicode labels under very specific circumstances (the CONTEXTJ rule in RFC 5892 only permits it directly after a virama).
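A quick way to see the `ignored` status in action is to pass each of the sixteen variation selectors through a WHATWG URL parser, which runs UTS 46 processing on the host. This is a sketch that assumes a conformant runtime (a recent Node.js or a current browser); the `a.eth` test hostnames are made up for illustration.

```javascript
// All sixteen variation selectors, U+FE00 through U+FE0F.
const selectors = Array.from({ length: 16 }, (_, i) => String.fromCodePoint(0xfe00 + i));

// Parse "a<selector>.eth" as a URL host. The URL parser runs UTS 46 ToASCII,
// whose mapping step removes code points with status "ignored".
const hostnames = selectors.map((vs) => new URL(`http://a${vs}.eth`).hostname);

// Every selector disappears, leaving the plain ASCII host.
console.log(hostnames.every((h) => h === 'a.eth')); // true
```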
I have personally never heard of Opensea, Etherscan, or Metawallet, but it sounds like they should all be made to follow the Unicode IDNA standard that literally everyone else uses. There are several other implementations of IDNA, including Node.js, web browsers, ICU4C, Python's idna, and Go's x/net/idna. All of them (assuming the right options are set: uts46=True for Python's idna, or idna.Lookup for Go's x/net/idna) behave correctly. As an example, running
new URL('http://\u{1F9DE}\u2642\uFE0F.eth').href === new URL('http://\u{1F9DE}\u2642.eth').href
returns true in Chrome, Firefox, and Safari.
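The same check can be run in Node.js, where `URL` is a global. This sketch also makes explicit that the two inputs are different strings before parsing; they only become equal once the host has gone through UTS 46 processing. The variable names are illustrative.

```javascript
// U+1F9DE GENIE + U+2642 MALE SIGN, with and without U+FE0F VARIATION SELECTOR-16.
const withVs16 = '\u{1F9DE}\u2642\uFE0F';
const withoutVs16 = '\u{1F9DE}\u2642';

// As raw strings the two names differ...
console.log(withVs16 === withoutVs16); // false

// ...but after URL host parsing (UTS 46 ToASCII) they serialize identically.
const parsedWith = new URL(`http://${withVs16}.eth`);
const parsedWithout = new URL(`http://${withoutVs16}.eth`);
console.log(parsedWith.href === parsedWithout.href); // true
console.log(parsedWith.hostname); // a Punycode ("xn--") label followed by ".eth"
```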
To gain some intuition for why variation selectors are ignored, think of them as follows. We are dealing with domain names (well, technically domain labels), which are modeled as a kind of "plain text". People are generally used to AbC.com being transformed to abc.com, but more exotically, 1ªleydenewton.mx maps to 1aleydenewton.mx, and ﷼.ir (containing the symbol for the currency rial) is mapped to ریال.ir (where the symbol is decomposed into individual letters).
By the same token, it makes sense for "optional" (for text at least) annotations like variation selectors to be dropped.
More formally, Section 6 of UTS 46 uses the NFKC_Casefold value as the "base mapping value" for what the character should turn into. And sure enough, the variation selectors are mapped to the empty string:
FDFC ; NFKC_CF; 0631 06CC 0627 0644 #Sc RIAL SIGN
FE00..FE0F ; NFKC_CF; # Mn [16] VARIATION SELECTOR-1..VARIATION SELECTOR-16
FE10 ; NFKC_CF; 002C # Po PRESENTATION FORM FOR VERTICAL COMMA
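The base mapping can be roughly approximated with built-in JavaScript string functions. This is a sketch, not a UTS 46 implementation: `approxNfkcCasefold` is a hypothetical helper, `toLowerCase()` stands in for full Unicode case folding (the two differ for characters such as ß), and UTS 46 overrides some base values afterwards (U+200D, for instance, is not simply dropped). It does show that plain NFKC keeps U+FE0F; the removal comes from NFKC_Casefold's handling of default-ignorable code points.

```javascript
// Hypothetical helper: approximates NFKC_Casefold using only built-ins.
function approxNfkcCasefold(s) {
  return s
    .replace(/\p{Default_Ignorable_Code_Point}/gu, '') // NFKC_CF maps these (incl. FE00..FE0F) to ""
    .normalize('NFKC') // compatibility decomposition, then canonical composition
    .toLowerCase() // stand-in for Unicode case folding (close, but not identical)
    .normalize('NFKC'); // re-normalize after the case mapping
}

// The examples from the comment:
console.log(approxNfkcCasefold('AbC.com')); // "abc.com"
console.log(approxNfkcCasefold('1\u00AAleydenewton.mx')); // "1aleydenewton.mx"
console.log(approxNfkcCasefold('\uFDFC.ir')); // U+0631 U+06CC U+0627 U+0644 followed by ".ir"
console.log(approxNfkcCasefold('\u{1F9DE}\u2642\uFE0F')); // U+1F9DE U+2642, FE0F removed

// NFKC on its own leaves VARIATION SELECTOR-16 in place.
console.log('\uFE0F'.normalize('NFKC') === '\uFE0F'); // true
```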
I believe there is a discrepancy that prevents normalization from being standardized across several large platforms and markets in the space.
It appears that someone is either not applying the standard correctly or is doing things slightly differently. Most specifically, this library strips FE0F when resolving and displaying a few native emojis that contain joiners and FE0F.
According to https://unicode.org/Public/emoji/14.0/emoji-test.txt, FE0F should be preserved for an emoji to be "fully qualified", which is theoretically the more adaptive version. This is currently causing major issues with decentralized emoji domains (.eth) on Opensea, Etherscan, and Metawallet: everyone is using a different library and there is no standardization. More specifically, Metawallet (which uses this library) resolves names differently than Opensea and Etherscan do.
Take the example of the man genie emoji, taken directly from unicode.org:
1F9DE 200D 2642 FE0F ; fully-qualified # 🧞♂️ E5.0 man genie
If you take the man genie 🧞♂️, handheld devices, Discord, Twitter, etc. will default to 1F9DE 200D 2642 FE0F, and so do large markets like Opensea and Etherscan. However, after this script runs, FE0F is stripped and we are left with 1F9DE 200D 2642, which turns the man genie into a minimally-qualified emoji. This is disrupting all of the emoji .eth domains and their connection with Metawallet addresses.
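To make the two sequences concrete, this sketch lists the code points of the fully-qualified man genie and of the same string with U+FE0F removed, in the notation used by emoji-test.txt. `toCodePoints` is a hypothetical helper written for illustration. Anything that compares or stores the raw string will treat these as two different names; which form a particular service uses is not something this snippet can determine.

```javascript
// Hypothetical helper: format a string as space-separated uppercase hex code points,
// the same notation used in emoji-test.txt.
function toCodePoints(s) {
  return Array.from(s, (ch) => ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')).join(' ');
}

const fullyQualified = '\u{1F9DE}\u200D\u2642\uFE0F'; // man genie, as listed in emoji-test.txt
const stripped = fullyQualified.replace(/\uFE0F/g, ''); // what remains once FE0F is removed

console.log(toCodePoints(fullyQualified)); // "1F9DE 200D 2642 FE0F"
console.log(toCodePoints(stripped)); // "1F9DE 200D 2642"
console.log(fullyQualified === stripped); // false
```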
The wisest solution would appear to be to apply the standard and more widely supported emoji version, which in this case is the fully-qualified 1F9DE 200D 2642 FE0F. The variation selector-16 modifier shouldn't be stripped away, as it's responsible for current and likely future generations of aesthetically different emojis, while arguably being the better choice for the Unicode standard.
Regards