mathiasbynens / emoji-regex

A regular expression to match all Emoji-only symbols as per the Unicode Standard.
https://mths.be/emoji-regex
MIT License

Doesn't match ☤ (Caduceus, U+2624) #67

Closed josephrocca closed 3 years ago

josephrocca commented 4 years ago

I just tested version 8, and it doesn't match ☤ (Caduceus, U+2624).

Demo: https://jsbin.com/hazotolexu/edit?html,output

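(async function () {
  // Stub a CommonJS-style `module` object so the fetched script can assign module.exports.
  window.module = {};
  // Fetch the library source from unpkg and evaluate it (fine for a throwaway demo only).
  eval(await fetch("https://unpkg.com/emoji-regex/index.js").then(r => r.text()));
  const emojiRegex = window.module.exports();
  alert(emojiRegex.test("☤")); // reported result: false
})();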

The character has been around since 1993, but note that Emojipedia says:

This Unicode character has no emoji version, meaning this is intended to display only as a black and white glyph on most platforms. It has not been Recommended For General Interchange (RGI) — as an emoji — by Unicode.

So perhaps you don't want to include it in this lib. Thought I'd mention it just in case. Thanks for the great lib in any case! Really saved the day for me :)

nolanlawson commented 4 years ago

It seems that 🕫 (bullhorn) also fits this description. It's not matched by emoji-regex, and Emojipedia has the same blurb about "no emoji version."

josephrocca commented 3 years ago

In case it's helpful to others, I ended up making a module that fixes this and also incorporates @gilmoreorless's variation selector fix: https://github.com/josephrocca/emoji-and-symbol-regex

ChurchTao commented 3 years ago

@josephrocca Thank you for https://github.com/josephrocca/emoji-and-symbol-regex, but the regex is not very readable. Try my version, which is based on https://www.unicode.org/Public/emoji/13.0/emoji-test.txt: https://github.com/ChurchTao/emoji-js

mathiasbynens commented 3 years ago

Thanks for reporting this, @josephrocca! As you mention, ☤ is a text character per the Unicode Standard, and it renders as such on most platforms, so I've decided not to include it in this library for now. Generally, I want to avoid making decisions about which characters/sequences are emoji and which aren't, and instead let the Unicode Standard make those decisions.
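
For anyone who wants to check how the Unicode Standard itself classifies a character such as ☤, here is a minimal sketch using JavaScript's Unicode property escapes. It is not part of emoji-regex and does not reflect how the library builds its pattern. The helper name `describeSymbol` is made up for illustration, and the sketch assumes an engine that supports `\p{…}` with the `u` flag (ES2018 or later). The results depend on the Unicode data bundled with the engine.

```javascript
// Hypothetical helper (not an emoji-regex API): report which Unicode
// emoji-related binary properties apply to the first code point of a string.
function describeSymbol(symbol) {
  const hex = symbol.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
  return {
    symbol,
    codePoint: "U+" + hex,
    // Emoji: the code point is listed as an emoji character.
    emoji: /^\p{Emoji}/u.test(symbol),
    // Emoji_Presentation: it renders as emoji by default, without a variation selector.
    emojiPresentation: /^\p{Emoji_Presentation}/u.test(symbol),
    // Extended_Pictographic: a broader set that also covers text-style pictographs.
    extendedPictographic: /^\p{Extended_Pictographic}/u.test(symbol),
  };
}

const caduceus = describeSymbol("\u2624");    // ☤ CADUCEUS
const grinning = describeSymbol("\u{1F600}"); // 😀 GRINNING FACE

console.log(caduceus);
console.log(grinning);
```

With current Unicode data, ☤ should report `emoji: false` but `extendedPictographic: true`, which is consistent with it being treated as a text character here. A project that also wants such symbols could match on `\p{Extended_Pictographic}` in addition to emoji-regex, at the cost of matching many glyphs that are not RGI emoji.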