Closed. merih closed this 5 years ago.
That's not a standard emoji sequence AFAICT: U+1F575 U+FE0F is not listed in emoji-zwj-sequences.txt. The U+FE0F is not necessary.
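A quick way to check what the picker actually produced is to list the string's code points. The sketch below uses a hypothetical helper, codePointLabels (not part of emoji-regex), and shows that the detective string is just U+1F575 followed by U+FE0F, with no U+200D ZERO WIDTH JOINER, so it is not a ZWJ sequence at all.

```javascript
// Hypothetical helper: list the code points of a string as U+XXXX labels.
function codePointLabels(str) {
  // Array.from iterates by code point, so astral characters such as
  // U+1F575 are not split into surrogate halves.
  return Array.from(str, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
  );
}

const detective = '\u{1F575}\uFE0F';
console.log(codePointLabels(detective)); // [ 'U+1F575', 'U+FE0F' ]

// No U+200D ZERO WIDTH JOINER, so this cannot be a ZWJ sequence.
console.log(detective.includes('\u200D')); // false
```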
Apple appears to have a very loose idea of conformance to the standard set of code points. While working on other fixes for emoji-regex, I created a list of all the emoji available on my Mac (macOS 10.12.6) via the emoji picker. (Handy hint: don't do that if you value your time and patience.)
There were 49 Emoji_Presentation or Emoji_Modifier_Base characters that have U+FE0F appended to them by the macOS picker, with no real consistency about which ones do or don't get the variation selector added. Plus there were another 100 or so text-presentation characters that macOS displays in emoji presentation without appending U+FE0F.
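One way to audit a picker dump like this, sketched under the assumption that the runtime supports Unicode property escapes (ES2018, e.g. Node.js 10 or later), is to flag every code point that is followed by U+FE0F even though it already has the Emoji_Presentation property, where the selector arguably adds nothing. redundantVS16 is a hypothetical name used only for illustration.

```javascript
// Hypothetical helper: return the code points (as hex) that are followed by
// U+FE0F even though they already default to emoji presentation
// (Unicode property Emoji_Presentation).
function redundantVS16(str) {
  const chars = Array.from(str); // iterate by code point, not UTF-16 unit
  const redundant = [];
  for (let i = 0; i < chars.length - 1; i++) {
    if (chars[i + 1] === '\uFE0F' && /^\p{Emoji_Presentation}$/u.test(chars[i])) {
      redundant.push(chars[i].codePointAt(0).toString(16).toUpperCase());
    }
  }
  return redundant;
}

// U+1F600 GRINNING FACE has Emoji_Presentation; U+1F575 does not.
console.log(redundantVS16('\u{1F600}\uFE0F\u{1F575}\uFE0F')); // [ '1F600' ]
```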
Are there any real downsides to adding this control character to the regex, besides bloating it just to work around a possible macOS bug?
Excerpt from http://unicode.org/Public/emoji/5.0/emoji-test.txt:
1F575 FE0F ; fully-qualified # 🕵️ detective
1F575 ; non-fully-qualified # 🕵 detective
So the sequence in question is rather conformant.
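The reason the selector matters here can be checked with Unicode property escapes: U+1F575 has the Emoji property but not Emoji_Presentation, so it defaults to text presentation and the form with U+FE0F is the fully-qualified one. A minimal sketch, with isTextDefaultEmoji as a hypothetical helper name (requires ES2018 property escapes):

```javascript
// Hypothetical helper: true for a single code point that has the Emoji
// property but not Emoji_Presentation, i.e. it renders as text by default.
// Note that ASCII digits, '#' and '*' also satisfy this test.
function isTextDefaultEmoji(ch) {
  return /^\p{Emoji}$/u.test(ch) && !/^\p{Emoji_Presentation}$/u.test(ch);
}

console.log(isTextDefaultEmoji('\u{1F575}')); // true  (U+1F575, CLDR name "detective")
console.log(isTextDefaultEmoji('\u{1F600}')); // false (GRINNING FACE is Emoji_Presentation)
```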
Thanks for the pointer, @artyom!
Per http://unicode.org/reports/tr51/#Emoji_Implementation_Notes, emoji ZWJ sequences “may have an emoji presentation selector”.
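To illustrate what allowing an optional selector on each element of a ZWJ sequence looks like, here is a deliberately simplified pattern. It is an assumption for illustration only, not the pattern emoji-regex generates, and \p{Emoji} is much broader than real emoji sequences.

```javascript
// Simplified for illustration (NOT emoji-regex's generated pattern): one
// Emoji code point with an optional U+FE0F, followed by any number of
// ZWJ-joined elements of the same shape. \p{Emoji} also matches ASCII
// digits, '#' and '*', so this over-matches on ordinary text.
const simpleZwjSequence = /\p{Emoji}\uFE0F?(?:\u200D\p{Emoji}\uFE0F?)*/gu;

// U+1F441 EYE, U+FE0F, U+200D ZWJ, U+1F5E8 LEFT SPEECH BUBBLE, U+FE0F
const eyeInSpeechBubble = '\u{1F441}\uFE0F\u200D\u{1F5E8}\uFE0F';
const matches = eyeInSpeechBubble.match(simpleZwjSequence);

console.log(matches.length);                   // 1
console.log(matches[0] === eyeInSpeechBubble); // true: the selectors were consumed too
```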
Hacky solution: just add \uFE0F? to the regex (or [\uFE0E\uFE0F]? for the text regex). However, some of the sequences already end with a presentation or variation selector and are therefore already qualified; for those, an extra U+FE0F shouldn't be matched as well. A proper fix will take some more time.
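The concern about already-qualified sequences can be made concrete. The sketch below compares the naive approach of appending \uFE0F? with a variant that uses a negative lookbehind, so it only consumes a U+FE0F when the matched text does not already end in one. Both helper names and the two-alternative toy pattern are hypothetical, and lookbehind requires ES2018 support.

```javascript
// Hypothetical helper: the naive "append \uFE0F?" approach.
function naiveOptionalVS16(source) {
  return new RegExp(`(?:${source})\\uFE0F?`, 'gu');
}

// Hypothetical helper: consume one trailing U+FE0F only if the text matched
// so far does not already end with U+FE0F.
function withOptionalVS16(source) {
  return new RegExp(`(?:${source})(?:(?<!\\uFE0F)\\uFE0F)?`, 'gu');
}

// Toy alternatives: a bare text-default base, and one already ending in U+FE0F.
const source = '\\u{1F575}|\\u{1F3F3}\\uFE0F';
// An already-qualified sequence followed by a stray U+FE0F.
const doubled = '\u{1F3F3}\uFE0F\uFE0F';

console.log(doubled.match(naiveOptionalVS16(source))[0].length); // 4: swallows the stray U+FE0F
console.log(doubled.match(withOptionalVS16(source))[0].length);  // 3: stops after the qualified sequence
console.log('\u{1F575}\uFE0F'.match(withOptionalVS16(source))[0].length); // 3: bare base still gets its selector
```

This only illustrates the guard idea; it is not how emoji-regex itself resolved the issue.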
For my own project's use I ended up going with that same hacky solution. I figured it wasn't right to submit a PR back to this project for it, so I just left it on a custom branch of my fork.
Are there any plans to integrate this into the project? It seems the consensus is that this is a legitimate use case. Sorry if I'm off base here.
@fredvollmer https://github.com/mathiasbynens/emoji-regex/issues/28#issuecomment-323044429 answers your question. I'd welcome a patch :)
@mathiasbynens, how can this be solved? I ran into this problem too.
Hi @mathiasbynens, is it possible to add rules for emoji that don't fall within the standard sequences? Examples:
๐ฟ ๐ ๐ ๐ท ๐ธ ๐ โ ๐ฃ ๐ถ โ๏ธ โ๏ธ โ๏ธ โ๐ผ โก๏ธ โญ๏ธ ๐ช ๐ค ๐ฅ ๐ฆ ๐ง โ ๐ฉ ๐จ ๐ฌ ๐จ ๐ถ ๐ฝ โธ โท ๐ ๐ต ๐ ๐ ๐ ๐ ๐ฉ ๐ฐ ๐ฅ ๐ณ ๐บ ๐ โฑ ๐ ๐ ๐ โฐ ๐ ๐ ๐ ๐ ๐ ๐ โฉ ๐ค ๐ฃ ๐ ๐ ๐ฅ ๐จ ๐ฑ ๐ฒ ๐น ๐ ๐ฝ ๐ ๐ ๐ ๐ โฑ โฒ ๐ฏ ๐ ๐ข // โ ๐ โ โ ๐ก ๐ก ๐ณ ๐ก ๐ ๐ ๐ ๐ ๐ผ ๐ ๐ท ๐ ๐ ๐ ๐ณ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ โธ โฏ โน โบ โญ โฎ ๐โ๐จ ๐ฏ ๐ฐ โด ๐ซ ๐ โ๏ธ โ
๏ธ โ๏ธ โ๏ธ โฝ๏ธ โพ โณ๏ธ โต๏ธ โฝ๏ธ โ๏ธ โฒ๏ธ โบ๏ธ โช๏ธ โ๏ธ โ๏ธ โ๏ธ โ๏ธ โ๏ธ โ๏ธ โ๏ธ โ๏ธ โ๏ธ โ๏ธ โ๏ธ โ๏ธ โ๏ธ โ๏ธ ๐๏ธ โญ๏ธ โ๏ธ โ๏ธ ๐ฏ๏ธ โฟ๏ธ โช๏ธ โซ๏ธ โฌ๏ธ โฌ๏ธ โพ๏ธ โฝ๏ธ
Try again using the latest release!
const emojiRegex = require('emoji-regex');
const string = '\u{1F575}\uFE0F'; // '🕵️'
console.log(
  string.match(emojiRegex())
);
// → [ '🕵️' ]
Closing as fixed. Feel free to reopen or file a new bug in case I missed anything.
The male detective emoji 🕵️ ("\u{1f575}\ufe0f") is not fully consumed when matched with emoji-regex: \ufe0f is left behind. The emoji was typed using the Control+Cmd+Space shortcut on a Mac.
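The reported symptom can be reproduced without emoji-regex itself. In this sketch, a hypothetical pattern that knows only U+1F575 stands in for the older regex, and a hypothetical unmatchedRemainder helper removes every match and returns what is left; a non-empty result means part of the emoji was not consumed.

```javascript
// Hypothetical helper: strip every match of a global regex and return
// whatever the regex failed to consume.
function unmatchedRemainder(str, globalRegex) {
  return str.replace(globalRegex, '');
}

// The detective emoji as typed with the macOS emoji picker.
const typedOnMac = '\u{1F575}\uFE0F';

// Stand-in for a pattern that matches the base code point only.
const baseOnly = /\u{1F575}/gu;
// The same pattern, also accepting an optional trailing U+FE0F.
const withSelector = /\u{1F575}\uFE0F?/gu;

console.log(unmatchedRemainder(typedOnMac, baseOnly) === '\uFE0F'); // true: selector left behind
console.log(unmatchedRemainder(typedOnMac, withSelector) === '');   // true: fully consumed
```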