Closed cuper6 closed 3 years ago
For the second issue, it seems you can match that string using the text.js version of this library.
For the first issue, it seems like a true bug. Unless there is some technical reason why 0-9 should be included...
Re: matching digits 0-9: https://github.com/mathiasbynens/emoji-regex/issues/33#issuecomment-373674579
Re: the second issue you mention, can you share reproduction steps? This seems to work correctly:
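// Compare the first match against the full sequence from the report.
const emojiRegex = require('emoji-regex');
const emoji = '\u{1F473}\u200D\u2640\uFE0F';
emoji.match(emojiRegex())[0] === emoji;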
@nolanlawson What is the difference with the text version? When should each version be used? 🤔 It is not documented.
@henrikra From the README:
To match emoji in their textual representation as well (i.e. emoji that are not Emoji_Presentation symbols and that aren’t forced to render as emoji by a variation selector), require the other regex:
const emojiRegex = require('emoji-regex/text.js');
So can you give me an example?
@henrikra Digits like 0-9 as @cuper6 mentioned
Hmm, I really don't get it. Can you give an actual example: case A where you should use the index version, and case B where you should use the text version?
@henrikra The text flavor was added after people asked for it in https://github.com/mathiasbynens/emoji-regex/issues/13.
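The README wording quoted above hinges on the Unicode Emoji_Presentation property. As a rough illustration of that distinction, here is a minimal sketch that uses JavaScript's native Unicode property escapes rather than emoji-regex itself. It is an assumption that these two properties approximate the two flavors; the library's actual patterns are not reproduced here, and the names `samples` and `report` are made up for the example.

```javascript
// Assumption: native property escapes stand in for the two regex flavors.
// This is NOT how emoji-regex is implemented; it only shows the properties.
const emojiPresentation = /^\p{Emoji_Presentation}$/u; // emoji by default
const anyEmoji = /^\p{Emoji}$/u;                       // includes text-default symbols

// Hypothetical sample set: a digit, the Latin Cross, and a pictograph.
const samples = {
  digit: '1',
  latinCross: '\u271D',
  partyPopper: '\u{1F389}',
};

const report = {};
for (const [name, ch] of Object.entries(samples)) {
  report[name] = {
    emojiPresentation: emojiPresentation.test(ch),
    anyEmoji: anyEmoji.test(ch),
  };
}

console.log(report);
```

Digits and U+271D carry the Emoji property but not Emoji_Presentation, which is why they only show up as emoji when a text-presentation match is requested or a variation selector follows them.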
Just tested: this lib does not match the 🛠 emoji.
I'm also seeing issues with emoji like 🎛 and 🕸
To find characters that emoji-regex (from current master 6727974) doesn't match, I've downloaded http://unicode.org/Public/emoji/12.1/emoji-test.txt, then filtered it down to exclude "unqualified" and "component" entries:
grep -E '^[^#]' emoji-test.txt | grep -Ev '; (unqualified|component)'
I piped that into this script:
const emojiRegex = require('emoji-regex')()
let total = 0
let unmatched = 0
require('readline').createInterface({
  input: process.stdin
}).on('line', line => {
  const [_, description] = line.split(/#[^E]*/, 2)
  const [sequence] = line.split(/\s*;/, 2)
  const emoji = sequence.split(' ').map(c => String.fromCodePoint(parseInt(c, 16))).join('')
  total++
  if (!emojiRegex.exec(emoji)) {
    console.warn('unmatched: %s (%s)', description, sequence)
    unmatched++
  }
}).on('close', function() {
  console.log('%d/%d did not match', unmatched, total)
  if (unmatched > 0) process.exit(1)
})
The result is here. Its summary is:
1789/3767 did not match
Since the inputs were all fully qualified or partially qualified emoji, I had expected all of them to match. That 1789 failed to match is a bit worrisome, or an indicator that my assumptions are incorrect.
An example of a fully qualified emoji that didn't match: 🧐 “face with monocle” (1F9D0). Am I using emoji-regex wrong?
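For anyone reproducing this, the parsing step of the script above can be isolated and checked on its own. The following is only a sketch: the function name `parseEmojiTestLine` is hypothetical, and it assumes each data line has the shape "code points ; status # comment", which is how emoji-test.txt lays out its entries.

```javascript
// Hypothetical helper: turn one emoji-test.txt data line into its parts.
// Assumes the layout "<hex code points> ; <status> # <comment>".
function parseEmojiTestLine(line) {
  // Skip blank lines and pure comment lines (group headers, etc.).
  if (line.trim() === '' || line.startsWith('#')) return null;

  // Everything before the first '#' holds the data fields.
  const [fields] = line.split('#', 1);
  const [sequence, status] = fields.split(';').map(s => s.trim());

  // Convert the space-separated hex code points into an actual string.
  const emoji = sequence
    .split(/\s+/)
    .map(cp => String.fromCodePoint(parseInt(cp, 16)))
    .join('');

  return { sequence, status, emoji };
}

const parsed = parseEmojiTestLine(
  '1F473 200D 2640 FE0F ; fully-qualified # woman wearing turban'
);
console.log(parsed.status, parsed.emoji.length);
```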
Sorry, disregard my above comment. I now see that I've been using exec() wrong; it should have been in a while loop like in the README:
-if (!emojiRegex.exec(emoji)) {
+let matched = false
+while (match = emojiRegex.exec(emoji)) matched = true
+if (!matched) {
They all match now! 🎉
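A likely explanation for why roughly half of the inputs failed: a regex with the g flag remembers where its last match ended (lastIndex), so reusing one instance across different strings makes every other exec() start past the end of the new string. The sketch below reproduces that behavior with a simple stand-in pattern; it is an assumption that this range is representative, and emoji-regex itself is not used here.

```javascript
// Stand-in global regex (NOT the emoji-regex pattern): one code point
// in a broad pictograph range, with the stateful g flag.
const re = /[\u{1F300}-\u{1FAFF}]/gu;

const first = re.exec('\u{1F9D0}');   // matches; lastIndex moves to 2
const indexAfterFirst = re.lastIndex;

// Same regex, new string: the search starts at index 2, which is already
// the end of this 2-unit string, so it fails and lastIndex resets to 0.
const second = re.exec('\u{1F389}');
const indexAfterSecond = re.lastIndex;

const third = re.exec('\u{1F389}');   // starts from 0 again, so it matches

// One way to test many strings with a shared global regex:
// reset lastIndex before each check.
function matchesFromStart(regex, str) {
  regex.lastIndex = 0;
  return regex.test(str);
}

console.log(Boolean(first), Boolean(second), Boolean(third));
```

Looping until exec() returns null, as in the diff above, has the same effect because a failed match resets lastIndex to 0 before the next input is tested.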
Closing this issue since the /text.js question has been answered, and there's nothing actionable left. Feel free to re-open if I missed anything.
Hello! I found two issues with the v7.0.1 regular expression.
1) The regexp matches digits like 0, 1, ..., 9 but does not match some emoji codes like \u271d (Latin Cross emoji). It seems the problem is here: (?:[#*0-9\xA9\xAE\u203C\
2) The regexp does not match some long emoji constructions like \uD83D\uDC73\u200D\u2640\uFE0F (👳♀️)
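Note that the surrogate-pair escapes in the report and the code point escape used in the reply describe the same string. A small sketch to confirm that and to show the sequence's structure (the variable names are illustrative only):

```javascript
// The sequence as written in the report (UTF-16 surrogate escapes)...
const surrogateForm = '\uD83D\uDC73\u200D\u2640\uFE0F';
// ...and as written in the reply (code point escape for the first symbol).
const codePointForm = '\u{1F473}\u200D\u2640\uFE0F';

// Iterating a string walks it by code point, not by UTF-16 code unit.
const codePoints = Array.from(surrogateForm, ch =>
  ch.codePointAt(0).toString(16).toUpperCase()
);

console.log(surrogateForm === codePointForm, codePoints.join(' '));
```

The four code points are the base emoji (1F473), a zero width joiner (200D), the female sign (2640), and variation selector 16 (FE0F).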