mathiasbynens / emoji-regex

A regular expression to match all Emoji-only symbols as per the Unicode Standard.
https://mths.be/emoji-regex
MIT License

missing code points for some emojis #52

Closed: avner-hoffmann closed this issue 3 years ago

avner-hoffmann commented 5 years ago

Using the library, I'm trying to find out whether a text contains only emojis. This is how I'm doing it:

```js
import emojiRegex from 'emoji-regex/es2015/text'; // the regular (non-text) version can be used as well
```

Later, I go through all the matches and add up each match's length (to count the code points). At the end, I compare the original text's length with the accumulated length:

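```js
const regex = emojiRegex(); // returns a new RegExp with the `g` flag
let totalEmojiesLength = 0;
let match;
while ((match = regex.exec(this.data.body))) {
  const emoji = match[0];
  totalEmojiesLength += [...emoji].length; // counts code points
}

// Note: `.length` counts UTF-16 code units, not code points.
return this.data.body.length === totalEmojiesLength;
```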
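A self-contained sketch of the same loop, with the regex passed in so it can run without the library. It sums `match[0].length` rather than the spread length, so both sides of the comparison count UTF-16 code units. The name `isOnlyEmoji` and the `createRegex` parameter are hypothetical; in practice you would pass emoji-regex's default export. The usage lines assume a stand-in pattern, `/\p{Emoji_Presentation}/gu`, which is not the library's regex.

```js
// Sketch: does `text` consist only of regex matches?
// `createRegex` is a hypothetical parameter: any function returning a RegExp
// with the `g` flag (e.g. emoji-regex's default export).
function isOnlyEmoji(text, createRegex) {
  const regex = createRegex(); // fresh RegExp, so lastIndex starts at 0
  let matchedLength = 0;
  let match;
  while ((match = regex.exec(text)) !== null) {
    // Guard against zero-length matches, which would otherwise loop forever.
    if (match[0].length === 0) {
      regex.lastIndex++;
      continue;
    }
    // Count UTF-16 code units, the same unit as text.length.
    matchedLength += match[0].length;
  }
  return text.length > 0 && matchedLength === text.length;
}

// Stand-in pattern (an assumption, not emoji-regex itself).
const standIn = () => /\p{Emoji_Presentation}/gu;
console.log(isOnlyEmoji("\u{1F383}", standIn));    // true
console.log(isOnlyEmoji("\u{1F383}a", standIn));   // false
console.log(isOnlyEmoji("\u26C4\uFE0F", standIn)); // false: U+FE0F is left unmatched
```

With consistent units, 🎃 passes; ⛄️ still fails, because the trailing U+FE0F is not part of any match.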

However, for some emojis, e.g. 🎃, ⛄️ and a few others, only the first code point is returned, so the computed length is wrong.

It looks like a bug: for these emojis, the match doesn't include all of their code points.

A follow-up question: is there another way to test whether a text contains only emojis using the library? I didn't find one, and my solution isn't very efficient performance-wise.

Related to #35.

mathiasbynens commented 3 years ago

> However, for some emojis, e.g. 🎃, ⛄️ and a few others, only the first code point is returned, so the computed length is wrong.

🎃 is '\u{1F383}', which is already a single code point, so matching one code point is expected in this case.
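For 🎃, the length check in the original snippet fails even when the whole emoji is matched, because it mixes units: `String.prototype.length` counts UTF-16 code units, while spreading a string counts code points. A short illustration in plain JavaScript (no library needed):

```js
const pumpkin = "\u{1F383}"; // 🎃 JACK-O-LANTERN, outside the Basic Multilingual Plane

console.log(pumpkin.length);                      // 2 (UTF-16 code units: a surrogate pair)
console.log([...pumpkin].length);                 // 1 (code points)
console.log(pumpkin.codePointAt(0).toString(16)); // 1f383
```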

⛄️ is '\u26C4\uFE0F'.

U+26C4 is a fully qualified Emoji_Presentation code point already per the Unicode Standard, and so it doesn’t need the U+FE0F. This is similar to appending U+FE0F to any non-emoji character: it has no effect (or at least, it should have no effect according to the spec). Matching only U+26C4 is expected here as well.
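Listing the code points of the snowman sequence makes this visible: the match covers U+26C4, and the trailing U+FE0F is what the original length check trips over. The helper name `toCodePoints` is hypothetical, used only for illustration.

```js
// Hypothetical helper: list the code points of a string as "U+XXXX" labels.
function toCodePoints(str) {
  return [...str].map(
    (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

console.log(JSON.stringify(toCodePoints("\u26C4\uFE0F"))); // ["U+26C4","U+FE0F"]
console.log(JSON.stringify(toCodePoints("\u{1F383}")));    // ["U+1F383"]
```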

> A follow-up question: is there another way to test whether a text contains only emojis using the library? I didn't find one, and my solution isn't very efficient performance-wise.

See #64.
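One alternative to summing lengths, sketched under assumptions (it is not necessarily what #64 suggests): remove every match from the text in a single `replace()` call and check that nothing remains except stray U+FE0F variation selectors. Since, as noted above, U+FE0F after an Emoji_Presentation character should have no effect, this sketch chooses to ignore it. `hasOnlyEmoji` and `createRegex` are hypothetical names, and the stand-in pattern in the usage lines is not emoji-regex.

```js
// Sketch: true if `text` contains at least one match and nothing else,
// ignoring any U+FE0F (VARIATION SELECTOR-16) left behind after matching.
// `createRegex` is a hypothetical parameter returning a RegExp with the `g` flag.
function hasOnlyEmoji(text, createRegex) {
  const withoutEmoji = text.replace(createRegex(), "");
  if (withoutEmoji.length === text.length) return false; // nothing matched
  return withoutEmoji.replace(/\uFE0F/g, "") === "";
}

// Stand-in pattern (an assumption, not emoji-regex itself).
const standIn = () => /\p{Emoji_Presentation}/gu;
console.log(hasOnlyEmoji("\u26C4\uFE0F", standIn)); // true
console.log(hasOnlyEmoji("hi \u{1F383}", standIn)); // false
console.log(hasOnlyEmoji("", standIn));             // false
```

This avoids building an array per match; note that whitespace between emojis counts as non-emoji here, so strip it first if it should be allowed.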