mathiasbynens / emoji-regex

A regular expression to match all Emoji-only symbols as per the Unicode Standard.
https://mths.be/emoji-regex
MIT License
1.73k stars, 174 forks

does not match trailing VARIATION SELECTOR-16 #45

Closed rodneyrehm closed 3 years ago

rodneyrehm commented 5 years ago

Following up on this tweet:

I'm using emoji-regex to identify whether the last symbol of a string is an emoji. While this works for most emojis, it does not work for ⚽️. As it turns out, this happens because macOS inserts U+26BD followed by U+FE0F, and that trailing variation selector is not part of the emoji-regex match.

While I don't think this is a bug in emoji-regex, I do believe emoji-regex could help avoid this situation by including the unnecessary variation selector in the match.
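
To make the situation concrete, here is a minimal sketch that lists the code points of such a string. The helper name `codePoints` is made up for illustration and is not part of emoji-regex; the input is assumed to be exactly the sequence described above (U+26BD followed by U+FE0F).

```javascript
// Hypothetical helper (not part of emoji-regex): list the code points
// of a string as U+XXXX labels. Array.from iterates by code point,
// so astral symbols are not split into surrogate halves.
function codePoints(str) {
  return Array.from(str, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
  );
}

// Assumed input: SOCCER BALL followed by VARIATION SELECTOR-16.
const typedOnMac = '\u26BD\uFE0F';

console.log(codePoints(typedOnMac));        // [ 'U+26BD', 'U+FE0F' ]
console.log(typedOnMac.endsWith('\uFE0F')); // true
```

If a match covers only U+26BD, the string still ends with U+FE0F, so a check of the form "does the match reach the end of the string?" would fail.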

jcubic commented 5 years ago

I think it's a bug: a character without a variation selector is just text, and only when a variation selector follows it should it be rendered as an image and matched as an emoji.

I have another question. What about the pizza symbol?

🍕️ "\ud83c\udf55\ufe0f" https://emojipedia.org/slice-of-pizza/

U+1F355 U+FE0F

Is this emoji with a variation selector a valid emoji sequence? I ask because the regex matches only the first two code units (the surrogate pair) and ignores the variation selector.

mathiasbynens commented 5 years ago

@jcubic Let's look up U+1F355 in https://unicode.org/Public/emoji/latest/emoji-data.txt. It contains:

1F337..1F37C  ; Emoji                #  6.0 [70] (🌷..🍼)    tulip..baby bottle
...
1F337..1F37C  ; Emoji_Presentation   #  6.0 [70] (🌷..🍼)    tulip..baby bottle

It has Emoji_Presentation=True, so it doesn't need the U+FE0F to be displayed as an emoji (per the spec).
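
The same lookup can be sketched in JavaScript with Unicode property escapes (ES2018 and later). This is only an illustration: the helper names are made up, the result depends on the Unicode data shipped with the JavaScript engine, and U+2764 (HEAVY BLACK HEART) is an extra example, not taken from this thread, of a character that is an Emoji without Emoji_Presentation.

```javascript
// Hypothetical helpers; \p{...} property escapes need the `u` flag.
const isEmoji = (ch) => /^\p{Emoji}$/u.test(ch);
const hasEmojiPresentation = (ch) => /^\p{Emoji_Presentation}$/u.test(ch);

const pizza = '\u{1F355}'; // SLICE OF PIZZA, from the range quoted above
const heart = '\u2764';    // HEAVY BLACK HEART, text presentation by default

console.log(isEmoji(pizza), hasEmojiPresentation(pizza)); // true true
console.log(isEmoji(heart), hasEmojiPresentation(heart)); // true false
```

Characters in the second group are the ones for which a trailing U+FE0F actually changes the requested presentation.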

jcubic commented 5 years ago

Oh, thanks, I was not sure about this one.

mathiasbynens commented 3 years ago

macOS and iOS emoji input has recently improved to be more in line with the spec. I'm hoping Apple has solved (or will continue to solve) this problem so that we don't need to work around it in emoji-regex.

For your use case, you could check if the string ends with a variation selector, and remove it before further processing the string.
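
A minimal sketch of that workaround, assuming only a single trailing U+FE0F needs to be removed. The function name `stripTrailingVS16` is made up for illustration and is not part of emoji-regex.

```javascript
// VARIATION SELECTOR-16 (U+FE0F) requests emoji presentation.
const VS16 = '\uFE0F';

// Hypothetical helper: drop one trailing U+FE0F, if present,
// and otherwise return the string unchanged.
function stripTrailingVS16(str) {
  return str.endsWith(VS16) ? str.slice(0, -VS16.length) : str;
}

console.log(stripTrailingVS16('goal \u26BD\uFE0F') === 'goal \u26BD'); // true
console.log(stripTrailingVS16('no emoji here') === 'no emoji here');   // true
```

The stripped string is best used only for the "does it end with an emoji?" check. For characters whose default presentation is text, removing U+FE0F changes how they are meant to be displayed, so the original string should be kept for storage and rendering.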