mathiasbynens / emoji-regex

A regular expression to match all Emoji-only symbols as per the Unicode Standard.
https://mths.be/emoji-regex
MIT License

Missing Unicode 9.0 "Rolling on the floor laughing" and more #10

Closed: jhuckaby closed this issue 7 years ago

jhuckaby commented 7 years ago

This otherwise wonderful and highly useful regular expression seems to be missing a number of the new emoji included in Unicode 9.0 / Emoji 3.0. The missing emoji include the popular Rolling on the Floor Laughing (:rofl:) and Nauseated Face (:nauseated_face:), among others.

[Screenshot: 2017-01-15 21:19:30]

Any chance we can get these new ones added to the regex?

Thanks for an awesome library!

- Joe

jhuckaby commented 7 years ago

Here's the link to the official list of all the new Emoji added to Unicode 9.0, from the unicode.org website:

http://www.unicode.org/emoji/charts/emoji-versions.html#v9.0_2016

🤣 🤤 🤠 🤡 🤥 🤢 🤧 🤶 🤴 🤵 🤰 🤦 🤦‍♂️ 🤦‍♀️ 🤷 🤷‍♂️ 🤷‍♀️ 🕺 🤺 🤸 🤸‍♂️ 🤸‍♀️ 🤼 🤼‍♂️ 🤼‍♀️ 🤽 🤽‍♂️ 🤽‍♀️ 🤾 🤾‍♂️ 🤾‍♀️ 🤹 🤹‍♂️ 🤹‍♀️ 🤳 🤞 🤙 🤛 🤜 🤚 🤝 🖤 🦍 🦊 🦌 🦏 🦇 🦅 🦆 🦉 🦎 🦈 🦐 🦑 🦋 🥀 🥝 🥑 🥔 🥕 🥒 🥜 🥐 🥖 🥞 🥓 🥙 🥚 🥘 🥗 🥛 🥂 🥃 🥄 🛴 🛵 🛑 🛶 🥇 🥈 🥉 🥊 🥋 🥅 🥁 🛒

I believe these are the exact items missing from the regex.
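The following could be used to check which of the listed emoji a given pattern fails to match. This is a minimal sketch that assumes Node.js 10 or later, for the ES2018 \p{...} Unicode property escapes. The name oldPattern is a hypothetical stand-in for a regex built from pre-9.0 data (it covers only the Emoticons block). findUnmatched is an illustrative helper and is not part of emoji-regex.

```javascript
// Hypothetical "old" pattern: only the Emoticons block U+1F600..U+1F64F,
// standing in for a regex generated from pre-Unicode-9.0 data.
const oldPattern = /[\u{1F600}-\u{1F64F}]/u;

// ES2018+ engines can test the Emoji_Presentation property directly.
const propertyPattern = /\p{Emoji_Presentation}/u;

// A few of the Unicode 9.0 additions listed in this issue.
const unicode9Sample = [
  "\u{1F923}", // rolling on the floor laughing
  "\u{1F922}", // nauseated face
  "\u{1F98A}", // fox
  "\u{1F951}", // avocado
];

// Return the candidates that the pattern does not match as a whole string.
function findUnmatched(pattern, candidates) {
  const anchored = new RegExp(`^(?:${pattern.source})$`, pattern.flags);
  return candidates.filter((s) => !anchored.test(s));
}

console.log(findUnmatched(oldPattern, unicode9Sample).length);      // 4
console.log(findUnmatched(propertyPattern, unicode9Sample).length); // 0
```

Anchoring with ^(?:...)$ ensures that a candidate only counts as matched when the whole emoji is consumed, not just a prefix of it.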

msklvsk commented 7 years ago

…and also ❄️

kesha-antonov commented 7 years ago

🎈❤️

These are missing too. Is there a full list of emoji somewhere, so I can update the regex?
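Both ❄️ and ❤️ as posted above are a base character followed by U+FE0F (VARIATION SELECTOR-16). The sketch below shows how ES2018 property escapes treat such pairs, assuming Node.js 10 or later. It does not claim to explain why emoji-regex missed these particular symbols. The pattern emojiWithOptionalVs16 is a hypothetical simplification for illustration.

```javascript
// Format each code point of a string as U+XXXX.
function codePoints(str) {
  return Array.from(str, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"));
}

const snowflake = "\u2744\uFE0F"; // ❄️ as posted in the thread
const heart = "\u2764\uFE0F";     // ❤️ as posted in the thread
const balloon = "\u{1F388}";      // 🎈

// Emoji_Presentation is true only for code points that display as emoji by
// default; U+2744 and U+2764 default to text style.
const presentation = /^\p{Emoji_Presentation}$/u;

// Hypothetical simplification: any Emoji code point, optionally followed by
// VS16. \p{Emoji} also covers ASCII digits, '#' and '*', so this is too broad
// to use as-is.
const emojiWithOptionalVs16 = /^\p{Emoji}\uFE0F?$/u;

for (const s of [snowflake, heart, balloon]) {
  const base = String.fromCodePoint(s.codePointAt(0)); // first code point only
  console.log(codePoints(s).join(" "), presentation.test(base), emojiWithOptionalVs16.test(s));
}
// U+2744 U+FE0F false true
// U+2764 U+FE0F false true
// U+1F388 true true
```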

nizaroni commented 7 years ago

I suspect the problem really lies with the babel-plugin-transform-unicode-property-regex project, which is the Babel plugin that translates the Unicode property escapes in the regex to the actual code points of the emoji.

I took a look, but I don't really understand enough about how that works to incorporate the new emoji.

patrickkettner commented 7 years ago

The regex is generated from the Unicode property escapes via babel-plugin-transform-unicode-property-regex, which is powered by regexpu-core, which gets its data from regenerate-unicode-properties. The various emoji files there need to be updated to include the new ranges for the Unicode 9 emoji.
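The pipeline described above ends with explicit code point ranges in place of property escapes. The following is a much-simplified illustration of that final step: merging a set of code points into contiguous ranges and emitting a character class for a regex with the u flag. toRanges and toCharClass are hypothetical helpers invented for this sketch. They are not the API of regenerate, regexpu-core, or the Babel plugin.

```javascript
// Hypothetical helper: merge a set of code points into contiguous [lo, hi] ranges.
function toRanges(codePoints) {
  const sorted = [...new Set(codePoints)].sort((a, b) => a - b);
  const ranges = [];
  for (const cp of sorted) {
    const last = ranges[ranges.length - 1];
    if (last && cp === last[1] + 1) {
      last[1] = cp; // extend the current range
    } else {
      ranges.push([cp, cp]); // start a new range
    }
  }
  return ranges;
}

// Hypothetical helper: emit a character class using \u{...} escapes (needs the u flag).
function toCharClass(ranges) {
  const hex = (cp) => `\\u{${cp.toString(16).toUpperCase()}}`;
  const parts = ranges.map(([lo, hi]) => (lo === hi ? hex(lo) : `${hex(lo)}-${hex(hi)}`));
  return `[${parts.join("")}]`;
}

// Example input: some Unicode 9.0 smileys (U+1F926 deliberately left out) plus U+1F57A.
const data = [0x1f920, 0x1f921, 0x1f922, 0x1f923, 0x1f924, 0x1f925, 0x1f927, 0x1f57a];
const source = toCharClass(toRanges(data));
console.log(source); // [\u{1F57A}\u{1F920}-\u{1F925}\u{1F927}]

const re = new RegExp(source, "u");
console.log(re.test("\u{1F923}")); // true
console.log(re.test("\u{1F926}")); // false: U+1F926 was not in the input set
```

If the underlying data file lacks a code point, the generated class lacks it too. This is why the fix has to land in the upstream data before the build script here is re-run.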

patrickkettner commented 7 years ago

I've PRed the upstream data; if and when that merges, we just need to run the build script here.

mxstbr commented 7 years ago

Amazing, can't wait! Thanks @patrickkettner 🙏

patrickkettner commented 7 years ago

@mathiasbynens I assume src/index.js needs to be updated, since the change in Unicode-tr51 created the new Emoji_Component file, but I'm not sure where in that file the values should go. Happy to create a PR if you could enlighten me.
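Several entries in the Unicode 9.0 list above are sequences rather than single code points. These sequences tie a base emoji to joiner and selector code points, and those joiner and selector code points are covered by the Emoji_Component property. Below is a minimal sketch, assuming an ES2018+ engine such as Node.js 10 or later. The genderedSequence pattern is a hypothetical simplification for illustration and does not reflect what emoji-regex actually generates.

```javascript
// Format each code point of a string as uppercase hex.
const hexPoints = (s) =>
  Array.from(s, (ch) => ch.codePointAt(0).toString(16).toUpperCase());

// 🤦‍♂️ "man facepalming", one of the entries listed in this issue.
const manFacepalming = "\u{1F926}\u200D\u2642\uFE0F";
console.log(hexPoints(manFacepalming)); // [ '1F926', '200D', '2642', 'FE0F' ]

// ZERO WIDTH JOINER and VARIATION SELECTOR-16 have the Emoji_Component property.
const isComponent = (cp) => /^\p{Emoji_Component}$/u.test(String.fromCodePoint(cp));
console.log(isComponent(0x200d), isComponent(0xfe0f), isComponent(0x1f926)); // true true false

// A single-code-point pattern stops after the base emoji...
const single = /\p{Emoji_Presentation}/u;
// ...while this hypothetical sequence pattern also consumes an optional
// ZWJ + gender sign (U+2640 or U+2642) + optional VS16 tail.
const genderedSequence = /\p{Emoji_Presentation}(?:\u200D[\u2640\u2642]\uFE0F?)?/u;

console.log(single.exec(manFacepalming)[0] === "\u{1F926}");              // true
console.log(genderedSequence.exec(manFacepalming)[0] === manFacepalming); // true
```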

nizaroni commented 7 years ago

Should index.js also have been updated on commit af72086?

patrickkettner commented 7 years ago

@khalifenizar Yeah, see #12 for a fix. You can require('emoji-regex/dist') in the meantime.

mxstbr commented 7 years ago

Amazing work, thanks so much for the quick update!