mathiasbynens / emoji-regex

A regular expression to match all Emoji-only symbols as per the Unicode Standard.
https://mths.be/emoji-regex
MIT License

Some profession emojis don't match #27

Closed by duygualyzc 7 years ago

duygualyzc commented 7 years ago

Hi Mathias,

Profession emojis that include a gender don't match, for example https://emojipedia.org/female-construction-worker/ and https://emojipedia.org/female-health-worker/

Is this caused by the generated regex, the emoji version, or something else?

Thanks!

mathiasbynens commented 7 years ago

Can you post a code sample to reproduce the issue please? It seems to work for me:

const emojiRegex = require('emoji-regex');
const string = '\u{1F477}\u200D\u2640\uFE0F'; // '👷‍♀️'
console.log(string.match(emojiRegex())[0] === string);
// true

duygualyzc commented 7 years ago

I want to replace the matched emoji by wrapping it with an element. So my case is like this:

const text = '👷‍♀️'; // '\u{1F477}\u200D\u2640\uFE0F'
const replacedText = text.replace(emojiRegex(), (m) => {
  return '<span>' + m + '</span>';
});
console.log(replacedText);
// '<span>👷</span>♀'

It splits the ZWJ sequence and matches only 👷

mathiasbynens commented 7 years ago

Which version of emoji-regex are you using? Your example works as expected in the latest version.

duygualyzc commented 7 years ago

Using v6.5.1. But I guess the problem is our emoji list, which we got from gemoji. When I converted the emojis to Unicode, I realised that they don't include the gender part. I updated them with the correct ones, and the problem is solved.

Sorry for the hassle and thanks for your effort!

merih commented 7 years ago

Actually, the thing is that emoji.json in gemoji doesn't include the \ufe0f character at the end of such emojis. I don't know if that's an error in the data, but browsers and operating systems seem to render the emoji just fine without that character. emoji-regex, however, separates the base from the gender sequence.

Is it in the Unicode spec that emojis should render correctly without \ufe0f, or are browsers and operating systems just trying their best?
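
Because both forms can render identically, one way to tell them apart is to print the code points of each string. The following is a minimal sketch; the helper name codePoints and the two variable names are illustrative and do not come from emoji-regex or gemoji.

```javascript
// Illustrative helper (not part of any library): list the code points
// of a string as U+XXXX labels. Array.from iterates by code point, so
// astral symbols such as U+1F477 are not split into surrogate halves.
function codePoints(str) {
  return Array.from(str, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
  );
}

const fullyQualified = '\u{1F477}\u200D\u2640\uFE0F';
const withoutSelector = '\u{1F477}\u200D\u2640';

console.log(codePoints(fullyQualified).join(' '));
// U+1F477 U+200D U+2640 U+FE0F
console.log(codePoints(withoutSelector).join(' '));
// U+1F477 U+200D U+2640
```

Running this over entries from a data file should show quickly whether the trailing U+FE0F is present.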

mathiasbynens commented 7 years ago

emoji regex separates the base from gender sequence

Please provide a code example so I can reproduce this problem. The example in https://github.com/mathiasbynens/emoji-regex/issues/27#issuecomment-320355920 seems to work fine.

merih commented 7 years ago

const text = "\u{1F477}\u200D\u2640";
console.log(text);
//> 👷‍♀️

const replacedText = text.replace(emojiRegex(), m => `<span>${m}</span>`);
console.log(replacedText);
//> <span>👷</span>♀

mathiasbynens commented 7 years ago

That’s the same example as before? It works as expected for me in the latest version: https://github.com/mathiasbynens/emoji-regex/issues/27#issuecomment-320531995

merih commented 7 years ago

Not really: this text is missing the last code point, \ufe0f. Does it still work as expected?

mathiasbynens commented 7 years ago

Your input text doesn’t include \uFE0F, so why would you expect it in the output?

merih commented 7 years ago

Sorry if I wasn't clear enough. I wasn't expecting it to be in the output. The thing is, "\u{1F477}\u200D\u2640" renders as the correct gendered emoji when logged directly to the console. When passed through the regex, however, "\u2640" is not taken into account and only the male part is matched.

const text = "\u{1F477}\u200D\u2640";
console.log(text);
//> 👷‍♀️ 
// female construction worker, it's displayed properly without \uFE0F character

const replacedText = text.replace(emojiRegex(), m => `<span>${m}</span>`);
console.log(replacedText);
//> <span>👷</span>♀
// only the male construction worker is matched with emoji regex,
// and ♀ (\u2640) character is left out

// which is basically "<span>\u{1F477}\u200D</span>\u2640"
// but I would expect it to be "<span>\u{1F477}\u200D\u2640</span>"
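
One possible workaround, sketched here under stated assumptions, is to append the missing \uFE0F before matching. The helper addVariationSelector is hypothetical, and standInRegex is a deliberately simplified stand-in so the snippet runs on its own; it is not the pattern that emoji-regex generates.

```javascript
// Simplified stand-in for illustration only; NOT the emoji-regex pattern.
// It prefers the ZWJ sequence ending in U+FE0F and otherwise falls back
// to the base emoji alone.
const standInRegex = /\u{1F477}\u200D\u2640\uFE0F|\u{1F477}/gu;

// Hypothetical helper: append U+FE0F to a ZWJ-joined gender sign
// (U+2640 or U+2642) that is not already followed by it.
function addVariationSelector(str) {
  return str.replace(/\u200D([\u2640\u2642])(?!\uFE0F)/g, '\u200D$1\uFE0F');
}

const text = '\u{1F477}\u200D\u2640';
const wrap = (m) => `<span>${m}</span>`;

// Without normalization only the base emoji is wrapped.
const before = text.replace(standInRegex, wrap);
// '<span>\u{1F477}</span>\u200D\u2640'

// With normalization the whole sequence is wrapped.
const after = addVariationSelector(text).replace(standInRegex, wrap);
// '<span>\u{1F477}\u200D\u2640\uFE0F</span>'
```

With the real library, the same normalization would presumably run before text.replace(emojiRegex(), ...); that combination is not tested here. Note that this changes the input by adding a code point, which may or may not be acceptable for a given data set.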

highfeed commented 6 years ago

Yes! I have a problem with \uFE0F too: #29