iamcal / emoji-data

Easy to parse data and spritesheets for emoji
MIT License
2.55k stars 301 forks

Unified code Outdated #237

Closed jackchevalley closed 7 months ago

jackchevalley commented 7 months ago

I have found some emoji where the unified code is outdated: it still uses the iOS 5 encoding and has never been updated to the iOS 7 encoding, which is the one currently in use. For example, https://www.iemoji.com/view/emoji/384/activity/flag-in-hole iOS 5: "\u26F3" (26F3), iOS 7: "\u26F3\uFE0F" (26F3-FE0F)

This causes a lot of problems when working with databases and matching emoji.

iamcal commented 7 months ago

This is because of "variation selectors", which tell the font rendering system whether to explicitly render something as an emoji (U+FE0F) or as text (U+FE0E). There are other selectors which aren't relevant here.
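As a rough illustration of what that looks like at the string level, here is a minimal Python sketch using only the standard library. The helper name `describe` is made up for this example and is not part of emoji-data.

```python
import unicodedata

EMOJI_SELECTOR = "\uFE0F"  # VARIATION SELECTOR-16: request emoji presentation
TEXT_SELECTOR = "\uFE0E"   # VARIATION SELECTOR-15: request text presentation


def describe(text):
    """Return (codepoint, Unicode name) pairs for each character in text.

    Hypothetical helper, for illustration only.
    """
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


base = "\u26F3"  # flag-in-hole
for label, value in [
    ("bare", base),
    ("emoji selector", base + EMOJI_SELECTOR),
    ("text selector", base + TEXT_SELECTOR),
]:
    print(label, describe(value))
```

The selector is a separate codepoint appended to the base character, so the strings with and without it are different lengths and will not compare equal.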

When emoji were first introduced, many of them used the same codepoint as existing non-emoji characters (keycaps, for example). In those cases, a variation selector is needed to disambiguate. These selectors are part of the Unicode spec for those emoji.

In the case of flag-in-hole (U+26F3) there was never an original text version, so the Unicode spec does not specify a variation selector. Vendors (like Apple) are free to include selectors whenever they want, with any emoji they want to, but that data (which varies from release to release) is not necessary for interpreting or displaying emoji. Any emoji can be suffixed with a variation selector, which in this case does nothing and is not required. If you're matching OS-generated strings, you will need to account for and filter out any extra selectors that get added.
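One way to do that filtering, as a minimal Python sketch: it assumes it is acceptable to ignore U+FE0E and U+FE0F entirely when comparing, and the function names are hypothetical, not something this repo provides.

```python
# Assumption: only these two selectors matter for emoji matching.
VARIATION_SELECTORS = {"\uFE0E", "\uFE0F"}


def strip_variation_selectors(text):
    """Drop U+FE0E / U+FE0F so strings can be compared selector-insensitively."""
    return "".join(ch for ch in text if ch not in VARIATION_SELECTORS)


def to_unified(text):
    """Format a string as hyphen-joined uppercase hex codepoints, e.g. '26F3-FE0F'."""
    return "-".join(f"{ord(ch):04X}" for ch in text)


def same_emoji(a, b):
    """Compare two strings after removing selectors from BOTH sides."""
    return strip_variation_selectors(a) == strip_variation_selectors(b)


ios5 = "\u26F3"
ios7 = "\u26F3\uFE0F"
print(to_unified(ios5), to_unified(ios7), same_emoji(ios5, ios7))
```

Stripping has to be applied to both the OS-generated string and the stored value, since emoji whose spec form includes a selector (keycaps, for example) would otherwise stop matching their own reference data.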