mathiasbynens / emoji-regex

A regular expression to match all Emoji-only symbols as per the Unicode Standard.
https://mths.be/emoji-regex
MIT License
1.74k stars, 174 forks

Failing to match emoji keycap numbers 0-9, # and * #3

Closed rodrigopolo closed 8 years ago

rodrigopolo commented 8 years ago

I tested all the emoji available on iOS 9.1 / OS X 10.11.1, and the regex fails to detect the keycap emoji (the numbers 0-9, # and *):

\u0030\ufe0f\u20e3
\u0031\ufe0f\u20e3
\u0032\ufe0f\u20e3
\u0033\ufe0f\u20e3
\u0034\ufe0f\u20e3
\u0035\ufe0f\u20e3
\u0036\ufe0f\u20e3
\u0037\ufe0f\u20e3
\u0038\ufe0f\u20e3
\u0039\ufe0f\u20e3
\u0023\ufe0f\u20e3
\u002a\ufe0f\u20e3

As typed

0️⃣
1️⃣
2️⃣
3️⃣
4️⃣
5️⃣
6️⃣
7️⃣
8️⃣
9️⃣
#️⃣
*️⃣
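
The sequences listed above share one shape: a base character (0-9, # or *), the variation selector \uFE0F, then the combining enclosing keycap \u20E3. A minimal sketch of a pattern for that shape follows. It is not the pattern emoji-regex ships; the name findKeycaps is hypothetical, and treating \uFE0F as optional is an assumption made so that sequences typed without the selector also match.

```javascript
// Sketch only: matches a keycap base (0-9, #, *), an optional
// variation selector U+FE0F, and the combining keycap U+20E3.
const keycapPattern = /[0-9#*]\uFE0F?\u20E3/g;

// Hypothetical helper: returns every keycap sequence found in `text`.
function findKeycaps(text) {
  return text.match(keycapPattern) || [];
}

const keycapSample = 'Press 1\uFE0F\u20E3 or #\uFE0F\u20E3; plain 1 and # are left alone.';
console.log(findKeycaps(keycapSample).length);
```

Plain digits and symbols are not matched because the pattern requires the trailing \u20E3.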

beaugunderson commented 8 years ago

:+1: I noticed this as well. There are also ~70 emoji that can have a \uFE0F at the end; it's the emoji variation selector.

beaugunderson commented 8 years ago

Or maybe it's at the beginning... Hard to tell when there are adjacent emoji.

beaugunderson commented 8 years ago

Sorry, definitely supposed to be at the end: http://unicode.org/Public/UCD/latest/ucd/StandardizedVariants.html

rodrigopolo commented 8 years ago

What is "supposed" to happen is one thing; what happens "in practice" is another. Apple, Google and Microsoft implement emoji very differently. That's why, after a long process, I decided to build a regex from all the emoji typed on the vendors' devices (with duplicates removed), sorted by length so that the longest sequences come first for replacement. When a match is found, I strip the extra characters that don't do anything relevant, such as the variation selector and the character that glues emoji together. This way I can find any emoji and easily get its Unicode representation.
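
The approach described above could be sketched roughly as follows. This is an illustration under assumptions, not rodrigopolo's actual code: the sample list is a hypothetical stand-in for the full set collected from vendor devices, the function names are invented, and the "character that glues emoji together" is assumed to be the zero-width joiner \u200D.

```javascript
// Hypothetical sample; a real list would hold every sequence typed on vendor devices.
const sampleSequences = [
  '\u2764',                                   // heart without variation selector
  '\u{1F468}',                                // man
  '\u2764\uFE0F',                             // heart with variation selector
  '1\uFE0F\u20E3',                            // keycap 1
  '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}',  // family, joined with U+200D
  '\u{1F468}',                                // duplicate, removed below
];

// Escape regex syntax characters such as '*' (needed for the keycap asterisk).
function escapeForRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Deduplicate, sort longest first so longer sequences win, then join as alternatives.
function buildEmojiRegex(list) {
  const unique = [...new Set(list)];
  unique.sort((a, b) => b.length - a.length);
  return new RegExp(unique.map(escapeForRegex).join('|'), 'gu');
}

// Strip U+FE0F and U+200D (assumed to be the "extra" characters), then list code points as hex.
function toCodePoints(match) {
  return [...match.replace(/[\uFE0F\u200D]/g, '')]
    .map(ch => ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));
}

const emojiRegex = buildEmojiRegex(sampleSequences);
const found = 'a\u{1F468}\u200D\u{1F469}\u200D\u{1F467}b\u2764\uFE0F'.match(emojiRegex);
console.log(found.map(toCodePoints));
```

Sorting longest first matters because regex alternation takes the first alternative that matches: if the lone man came before the family sequence, the family would be split into separate matches.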

beaugunderson commented 8 years ago

Sorry, I was only replying to my previous comment about \uFE0F appearing at the beginning or end of ~70 other emoji (in the non-keycap emoji it's supposed to be at the end). You are correct that in the keycap emoji it's supposed to be, and in practice is, in the middle. :)

mathiasbynens commented 8 years ago

https://github.com/mathiasbynens/emoji-regex/issues/2#issuecomment-118503456

pettedemon commented 8 years ago

Hi, I have a PHP function to remove emoji, but how can I add these particular emoji to my function?

function remove_emoji($text){
  // Inside [...] a "|" is a literal pipe, not alternation, so the character classes list code points without it.
  return preg_replace('/([0-9#][\x{20E3}])|[\x{00ae}\x{00a9}\x{203C}\x{2047}\x{2048}\x{2049}\x{3030}\x{303D}\x{2139}\x{2122}\x{3297}\x{3299}][\x{FE00}-\x{FEFF}]?|[\x{2190}-\x{21FF}][\x{FE00}-\x{FEFF}]?|[\x{2300}-\x{23FF}][\x{FE00}-\x{FEFF}]?|[\x{2460}-\x{24FF}][\x{FE00}-\x{FEFF}]?|[\x{25A0}-\x{25FF}][\x{FE00}-\x{FEFF}]?|[\x{2600}-\x{27BF}][\x{FE00}-\x{FEFF}]?|[\x{2900}-\x{297F}][\x{FE00}-\x{FEFF}]?|[\x{2B00}-\x{2BF0}][\x{FE00}-\x{FEFF}]?|[\x{1F000}-\x{1F6FF}][\x{FE00}-\x{FEFF}]?/u', '', $text);
}

mathiasbynens commented 8 years ago

@pettedemon Try Stack Overflow.