mathiasbynens / emoji-regex

A regular expression to match all Emoji-only symbols as per the Unicode Standard.
https://mths.be/emoji-regex
MIT License

A common emoji (U+2764) cannot match #54

Closed peterdaiweb closed 3 years ago

peterdaiweb commented 5 years ago
[Image: emoji]

This heart is black, but in WeChat, when you type "xin" (which means "heart" in English), it's a red heart. Its Unicode code point is U+2764.

abrad45 commented 5 years ago

I don't believe this symbol is an emoji.

mindlink commented 5 years ago

According to Emojipedia, U+2764 is an emoji and part of Emoji 1.0: https://emojipedia.org/heavy-black-heart/

mislav commented 4 years ago

Let's see what the Unicode emoji data (emoji-test.txt) says about U+2764:

2764 FE0F                                  ; fully-qualified     # ❤️ E2.0 red heart
2764                                       ; unqualified         # ❤ E2.0 red heart

We can see that 2764 without “variation selector 16” (FE0F) is an unqualified emoji:

ED-17a. qualified emoji character — An emoji character in a string that (a) has default emoji presentation or (b) is the first character in an emoji modifier sequence or (c) is not a default emoji presentation character, but is the first character in an emoji presentation sequence.

ED-18. fully-qualified emoji — A qualified emoji character, or an emoji sequence in which each emoji character is qualified.

ED-18a. minimally-qualified emoji — An emoji sequence in which the first character is qualified but the sequence is not fully qualified.

ED-19. unqualified emoji — An emoji that is neither fully-qualified nor minimally qualified.

The way I interpret it, 2764 without FE0F should not render as a color emoji by default, and therefore it should not be matched by emoji-regex.
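
To make the ED-17a reading above concrete, here is a minimal sketch using JavaScript's Unicode property escapes. The function name isQualifiedEmojiCharacter is hypothetical, it only inspects the first one or two code points of a string, and it is not how emoji-regex itself is implemented.

```javascript
// Hypothetical helper, not part of emoji-regex: checks the three ED-17a
// conditions for the first character of a string.
function isQualifiedEmojiCharacter(str) {
  // Split by code point so astral characters stay in one piece.
  const [first, second] = Array.from(str);
  if (first === undefined || !/^\p{Emoji}$/u.test(first)) {
    return false;
  }
  // (a) The character has default emoji presentation.
  if (/^\p{Emoji_Presentation}$/u.test(first)) {
    return true;
  }
  // (b) It is the first character in an emoji modifier sequence.
  if (
    second !== undefined &&
    /^\p{Emoji_Modifier_Base}$/u.test(first) &&
    /^\p{Emoji_Modifier}$/u.test(second)
  ) {
    return true;
  }
  // (c) It is followed by variation selector 16 (U+FE0F),
  //     forming an emoji presentation sequence.
  return second === '\uFE0F';
}

const bareHeart = isQualifiedEmojiCharacter('\u2764');
const heartWithVS16 = isQualifiedEmojiCharacter('\u2764\uFE0F');
console.log(bareHeart, heartWithVS16);
```

Under these assumptions, U+2764 on its own fails all three conditions, while U+2764 followed by U+FE0F satisfies (c).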

mathiasbynens commented 3 years ago

@mislav is spot-on.

You can use emoji-regex/text.js to match this character, but the other regular expressions should not match it.

Generally, the philosophy behind emoji-regex is to avoid making decisions about which characters/sequences are emoji and which aren't, and instead let the Unicode Standard make those decisions.
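
The difference between a pattern that includes text-presentation characters and one that does not can be illustrated roughly as follows. Both regular expressions here are hypothetical stand-ins built from Unicode property escapes. They are not the patterns shipped by emoji-regex, and they ignore longer sequences such as ZWJ sequences and flags.

```javascript
// Hypothetical stand-in patterns, NOT the ones emoji-regex ships.

// Matches characters with default emoji presentation, or an emoji
// character explicitly followed by variation selector 16 (U+FE0F).
const emojiPresentationOnly = /\p{Emoji_Presentation}|\p{Emoji}\uFE0F/gu;

// Also matches pictographic characters in their textual representation,
// i.e. without a trailing U+FE0F.
const includingTextPresentation = /\p{Extended_Pictographic}\uFE0F?/gu;

const sample = 'I \u2764 JS, I \u2764\uFE0F JS \u{1F600}';

const strictMatches = sample.match(emojiPresentationOnly) || [];
const looseMatches = sample.match(includingTextPresentation) || [];

console.log(strictMatches.length); // bare U+2764 is skipped
console.log(looseMatches.length);  // bare U+2764 is included
```

In this sketch the stricter pattern skips the bare U+2764 and the looser one picks it up, which mirrors the behavior described for the text variant.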