emoji-regex/text match string contains number

mathiasbynens / emoji-regex

A regular expression to match all Emoji-only symbols as per the Unicode Standard.

https://mths.be/emoji-regex

MIT License


emoji-regex/text match string contains number #14

Closed roderickhsiao closed 7 years ago

roderickhsiao commented 7 years ago

When the string contains a number, emoji-regex/text matches it.

version 6.4.0


```javascript
const emojiRegex = require('emoji-regex/text');

const matchExpected = emojiRegex().exec('foo');
console.log(matchExpected);
// null

const matchUnExpected = emojiRegex().exec('foo123');
console.log(matchUnExpected);
// [ '1', index: 3, input: 'foo123' ]
```

mathiasbynens commented 7 years ago

That’s the expected behavior. E.g. 1 is a text emoji. See http://unicode.org/Public/emoji/5.0/emoji-data.txt:

0030..0039    ; Emoji                # 1.1 [10] (0️..9️)    digit zero..digit nine

Per spec, they’re only supposed to be rendered in emoji form when followed by a variation selector (see http://unicode.org/Public/emoji/5.0/emoji-sequences.txt). But since you’re using the text regex, you opt in to matching them anyway.
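To illustrate the distinction, here is a minimal sketch that does not use emoji-regex at all. It assumes an ES2018+ engine with Unicode property escapes, and `classify` is a hypothetical helper name. It shows that a digit carries the `Emoji` property but not `Emoji_Presentation`, which is why it only renders as an emoji when a sequence requests it.

```javascript
// Sketch only: relies on built-in Unicode property escapes (ES2018+),
// not on the emoji-regex package itself.
const hasEmojiProperty = /^\p{Emoji}$/u;
const hasEmojiPresentation = /^\p{Emoji_Presentation}$/u;

// Hypothetical helper: reports both properties for a single code point.
function classify(codePoint) {
  return {
    emoji: hasEmojiProperty.test(codePoint),
    emojiPresentation: hasEmojiPresentation.test(codePoint),
  };
}

console.log(classify('1'));         // { emoji: true, emojiPresentation: false }
console.log(classify('\u{1F600}')); // { emoji: true, emojiPresentation: true }
console.log(classify('a'));         // { emoji: false, emojiPresentation: false }
```

The digit is listed as `Emoji` in emoji-data.txt but defaults to text presentation, so a regex that opts in to text-style emoji will match it.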

roderickhsiao commented 7 years ago

Thanks @mathiasbynens, I guess we will need to handle it on our side then 👍

cheers

mathiasbynens commented 7 years ago

emoji-regex matches emoji according to the Unicode Standard. It sounds like you want to do something else — how do you determine what’s an emoji and what isn’t?

roderickhsiao commented 7 years ago

Yes, we are parsing a string and trying to extract the emoji. Currently, after parsing, we get numbers, which probably shouldn't be presented as emoji in our case.

roderickhsiao commented 7 years ago

We basically just split the sentence and check each character with emojiRegex().test(c) to find the emoji in the sentence.

mathiasbynens commented 7 years ago

You didn’t answer the question — for your use case, how do you decide what constitutes an emoji and what isn’t?

roderickhsiao commented 7 years ago

We flag parsed emoji that match the Unicode spec, and then check whether the browser actually renders an emoji (icon) for them.

roderickhsiao commented 7 years ago

I checked the spec; we probably want to exclude:

0023          ; Emoji                # 1.1  [1] (#️)       number sign
002A          ; Emoji                # 1.1  [1] (*️)       asterisk
0030..0039    ; Emoji                # 1.1 [10] (0️..9️)    digit zero..digit nine

But you are absolutely correct, those are considered valid emoji.
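One way to handle this on the caller's side might look like the sketch below. It is not part of emoji-regex: `extractEmoji` and `BARE_KEYCAP_BASE` are hypothetical names, and `standInRegex` is a simplified stand-in so the example runs without the package. In real use you would pass the global regex returned by `emojiRegex()` instead. The idea is to drop matches that consist solely of `#`, `*`, or a digit, while keeping longer sequences such as keycaps.

```javascript
// Hypothetical helper, not part of emoji-regex.
// Matches a result that is only '#', '*', or a single ASCII digit.
const BARE_KEYCAP_BASE = /^[#*0-9]$/;

// `regex` must be global (have the `g` flag) so that match() returns all hits.
function extractEmoji(text, regex) {
  const matches = text.match(regex) || [];
  return matches.filter((m) => !BARE_KEYCAP_BASE.test(m));
}

// Simplified stand-in for illustration only (assumption): keycap sequences
// first, then any single code point with the Emoji property.
const standInRegex = /[#*0-9]\uFE0F?\u20E3|\p{Emoji}/gu;

// 'foo123', a grinning face (U+1F600), and the keycap sequence for 1.
const input = 'foo123 \u{1F600} 1\uFE0F\u20E3';
console.log(extractEmoji(input, standInRegex));
// Logs two entries: the grinning face and the keycap sequence;
// the bare digits 1, 2, and 3 are filtered out.
```

Filtering after matching keeps the regex itself aligned with the Unicode data, and the exclusion list stays an explicit application-level decision.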