mathiasbynens / emoji-regex

A regular expression to match all Emoji-only symbols as per the Unicode Standard.
https://mths.be/emoji-regex
MIT License

Some Emoji no longer match after 6.1.3 #13

Closed roderickhsiao closed 7 years ago

roderickhsiao commented 7 years ago

Thanks for providing the library. We noticed that some emoji no longer match the regex after the latest published version.

Not sure if it is because the Unicode spec changed? http://www.unicode.org/reports/tr51/

Test case

const emojiRegex = require('emoji-regex');

const emojis = '๐Ÿ˜,๐Ÿ˜‚,๐Ÿ˜ƒ,๐Ÿ˜„,๐Ÿ˜…,๐Ÿ˜†,๐Ÿ˜‰,๐Ÿ˜Š,๐Ÿ˜‹,๐Ÿ˜Œ,๐Ÿ˜,๐Ÿ˜,๐Ÿ˜’,๐Ÿ˜“,๐Ÿ˜”,๐Ÿ˜–,๐Ÿ˜˜,๐Ÿ˜š,๐Ÿ˜œ,๐Ÿ˜,๐Ÿ˜ž,๐Ÿ˜ ,๐Ÿ˜ก,๐Ÿ˜ข,๐Ÿ˜ฃ,๐Ÿ˜ค,๐Ÿ˜ฅ,๐Ÿ˜จ,๐Ÿ˜ฉ,๐Ÿ˜ช,๐Ÿ˜ซ,๐Ÿ˜ญ,๐Ÿ˜ฐ,๐Ÿ˜ฑ,๐Ÿ˜ฒ,๐Ÿ˜ณ,๐Ÿ˜ต,๐Ÿ˜ท,๐Ÿ˜ธ,๐Ÿ˜น,๐Ÿ˜บ,๐Ÿ˜ป,๐Ÿ˜ผ,๐Ÿ˜ฝ,๐Ÿ˜พ,๐Ÿ˜ฟ,๐Ÿ™€,๐Ÿ™…,๐Ÿ™†,๐Ÿ™‡,๐Ÿ™ˆ,๐Ÿ™‰,๐Ÿ™Š,๐Ÿ™‹,๐Ÿ™Œ,๐Ÿ™,๐Ÿ™Ž,๐Ÿ™,โœ‚,โœ…,โœˆ,โœ‰,โœŠ,โœ‹,โœŒ,โœ,โœ’,โœ”,โœ–,โœจ,โœณ,โœด,โ„,โ‡,โŒ,โŽ,โ“,โ”,โ•,โ—,โค,โž•,โž–,โž—,โžก,โžฐ,๐Ÿš€,๐Ÿšƒ,๐Ÿš„,๐Ÿš…,๐Ÿš‡,๐Ÿš‰,๐ŸšŒ,๐Ÿš,๐Ÿš‘,๐Ÿš’,๐Ÿš“,๐Ÿš•,๐Ÿš—,๐Ÿš™,๐Ÿšš,๐Ÿšข,๐Ÿšค,๐Ÿšฅ,๐Ÿšง,๐Ÿšจ,๐Ÿšฉ,๐Ÿšช,๐Ÿšซ,๐Ÿšฌ,๐Ÿšญ,๐Ÿšฒ,๐Ÿšถ,๐Ÿšน,๐Ÿšบ,๐Ÿšป,๐Ÿšผ,๐Ÿšฝ,๐Ÿšพ,๐Ÿ›€,โ“‚,๐Ÿ…ฐ,๐Ÿ…ฑ,๐Ÿ…พ,๐Ÿ…ฟ,๐Ÿ†Ž,๐Ÿ†‘,๐Ÿ†’,๐Ÿ†“,๐Ÿ†”,๐Ÿ†•,๐Ÿ†–,๐Ÿ†—,๐Ÿ†˜,๐Ÿ†™,๐Ÿ†š,๐Ÿ‡ฉ๐Ÿ‡ช,๐Ÿ‡ฌ๐Ÿ‡ง,๐Ÿ‡จ๐Ÿ‡ณ,๐Ÿ‡ฏ๐Ÿ‡ต,๐Ÿ‡ฐ๐Ÿ‡ท,๐Ÿ‡ซ๐Ÿ‡ท,๐Ÿ‡ช๐Ÿ‡ธ,๐Ÿ‡ฎ๐Ÿ‡น,๐Ÿ‡บ๐Ÿ‡ธ,๐Ÿ‡ท๐Ÿ‡บ,๐Ÿˆ,๐Ÿˆ‚,๐Ÿˆš,๐Ÿˆฏ,๐Ÿˆฒ,๐Ÿˆณ,๐Ÿˆด,๐Ÿˆต,๐Ÿˆถ,๐Ÿˆท,๐Ÿˆธ,๐Ÿˆน,๐Ÿˆบ,๐Ÿ‰,๐Ÿ‰‘,ยฉ,ยฎ,โ€ผ,โ‰,8โƒฃ,9โƒฃ,7โƒฃ,6โƒฃ,1โƒฃ,0โƒฃ,2โƒฃ,3โƒฃ,5โƒฃ,4โƒฃ,#โƒฃ,โ„ข,โ„น,โ†”,โ†•,โ†–,โ†—,โ†˜,โ†™,โ†ฉ,โ†ช,โŒš,โŒ›,โฉ,โช,โซ,โฌ,โฐ,โณ,โ–ช,โ–ซ,โ–ถ,โ—€,โ—ป,โ—ผ,โ—ฝ,โ—พ,โ˜€,โ˜,โ˜Ž,โ˜‘,โ˜”,โ˜•,โ˜,โ˜บ,โ™ˆ,โ™‰,โ™Š,โ™‹,โ™Œ,โ™,โ™Ž,โ™,โ™,โ™‘,โ™’,โ™“,โ™ ,โ™ฃ,โ™ฅ,โ™ฆ,โ™จ,โ™ป,โ™ฟ,โš“,โš ,โšก,โšช,โšซ,โšฝ,โšพ,โ›„,โ›…,โ›Ž,โ›”,โ›ช,โ›ฒ,โ›ณ,โ›ต,โ›บ,โ›ฝ,โคด,โคต,โฌ…,โฌ†,โฌ‡,โฌ›,โฌœ,โญ,โญ•,ใ€ฐ,ใ€ฝ,ใŠ—,ใŠ™,๐Ÿ€„,๐Ÿƒ,๐ŸŒ€,๐ŸŒ,๐ŸŒ‚,๐ŸŒƒ,๐ŸŒ„,๐ŸŒ…,๐ŸŒ†,๐ŸŒ‡,๐ŸŒˆ,๐ŸŒ‰,๐ŸŒŠ,๐ŸŒ‹,๐ŸŒŒ,๐ŸŒ,๐ŸŒ‘,๐ŸŒ“,๐ŸŒ”,๐ŸŒ•,๐ŸŒ™,๐ŸŒ›,๐ŸŒŸ,๐ŸŒ ,๐ŸŒฐ,๐ŸŒฑ,๐ŸŒด,๐ŸŒต,๐ŸŒท,๐ŸŒธ,๐ŸŒน,๐ŸŒบ,๐ŸŒป,๐ŸŒผ,๐ŸŒฝ,๐ŸŒพ,๐ŸŒฟ,๐Ÿ€,๐Ÿ,๐Ÿ‚,๐Ÿƒ,๐Ÿ„,๐Ÿ…,๐Ÿ†,๐Ÿ‡,๐Ÿˆ,๐Ÿ‰,๐ŸŠ,๐ŸŒ,๐Ÿ,๐ŸŽ,๐Ÿ,๐Ÿ‘,๐Ÿ’,๐Ÿ“,๐Ÿ”,๐Ÿ•,๐Ÿ–,๐Ÿ—,๐Ÿ˜,๐Ÿ™,๐Ÿš,๐Ÿ›,๐Ÿœ,๐Ÿ,๐Ÿž,๐ŸŸ,๐Ÿ ,๐Ÿก,๐Ÿข,๐Ÿฃ,๐Ÿค,๐Ÿฅ,๐Ÿฆ,๐Ÿง,๐Ÿจ,๐Ÿฉ,๐Ÿช,๐Ÿซ,๐Ÿฌ,๐Ÿญ,๐Ÿฎ,๐Ÿฏ,๐Ÿฐ,๐Ÿฑ,๐Ÿฒ,๐Ÿณ,๐Ÿด,๐Ÿต,๐Ÿถ,๐Ÿท,๐Ÿธ,๐Ÿน,๐Ÿบ,๐Ÿป,๐ŸŽ€,๐ŸŽ,๐ŸŽ‚,๐ŸŽƒ,๐ŸŽ„,๐ŸŽ…,๐ŸŽ†,๐ŸŽ‡,๐ŸŽˆ,๐ŸŽ‰,๐ŸŽŠ,๐ŸŽ‹,๐ŸŽŒ,๐ŸŽ,๐ŸŽŽ,๐ŸŽ,๐ŸŽ,๐ŸŽ‘,๐ŸŽ’,๐ŸŽ“,๐ŸŽ 
,๐ŸŽก,๐ŸŽข,๐ŸŽฃ,๐ŸŽค,๐ŸŽฅ,๐ŸŽฆ,๐ŸŽง,๐ŸŽจ,๐ŸŽฉ,๐ŸŽช,๐ŸŽซ,๐ŸŽฌ,๐ŸŽญ,๐ŸŽฎ,๐ŸŽฏ,๐ŸŽฐ,๐ŸŽฑ,๐ŸŽฒ,๐ŸŽณ,๐ŸŽด,๐ŸŽต,๐ŸŽถ,๐ŸŽท,๐ŸŽธ,๐ŸŽน,๐ŸŽบ,๐ŸŽป,๐ŸŽผ,๐ŸŽฝ,๐ŸŽพ,๐ŸŽฟ,๐Ÿ€,๐Ÿ,๐Ÿ‚,๐Ÿƒ,๐Ÿ„,๐Ÿ†,๐Ÿˆ,๐ŸŠ,๐Ÿ ,๐Ÿก,๐Ÿข,๐Ÿฃ,๐Ÿฅ,๐Ÿฆ,๐Ÿง,๐Ÿจ,๐Ÿฉ,๐Ÿช,๐Ÿซ,๐Ÿฌ,๐Ÿญ,๐Ÿฎ,๐Ÿฏ,๐Ÿฐ,๐ŸŒ,๐Ÿ,๐ŸŽ,๐Ÿ‘,๐Ÿ’,๐Ÿ”,๐Ÿ—,๐Ÿ˜,๐Ÿ™,๐Ÿš,๐Ÿ›,๐Ÿœ,๐Ÿ,๐Ÿž,๐ŸŸ,๐Ÿ ,๐Ÿก,๐Ÿข,๐Ÿฃ,๐Ÿค,๐Ÿฅ,๐Ÿฆ,๐Ÿง,๐Ÿจ,๐Ÿฉ,๐Ÿซ,๐Ÿฌ,๐Ÿญ,๐Ÿฎ,๐Ÿฏ,๐Ÿฐ,๐Ÿฑ,๐Ÿฒ,๐Ÿณ,๐Ÿด,๐Ÿต,๐Ÿถ,๐Ÿท,๐Ÿธ,๐Ÿน,๐Ÿบ,๐Ÿป,๐Ÿผ,๐Ÿฝ,๐Ÿพ,๐Ÿ‘€,๐Ÿ‘‚,๐Ÿ‘ƒ,๐Ÿ‘„,๐Ÿ‘…,๐Ÿ‘†,๐Ÿ‘‡,๐Ÿ‘ˆ,๐Ÿ‘‰,๐Ÿ‘Š,๐Ÿ‘‹,๐Ÿ‘Œ,๐Ÿ‘,๐Ÿ‘Ž,๐Ÿ‘,๐Ÿ‘,๐Ÿ‘‘,๐Ÿ‘’,๐Ÿ‘“,๐Ÿ‘”,๐Ÿ‘•,๐Ÿ‘–,๐Ÿ‘—,๐Ÿ‘˜,๐Ÿ‘™,๐Ÿ‘š,๐Ÿ‘›,๐Ÿ‘œ,๐Ÿ‘,๐Ÿ‘ž,๐Ÿ‘Ÿ,๐Ÿ‘ ,๐Ÿ‘ก,๐Ÿ‘ข,๐Ÿ‘ฃ,๐Ÿ‘ค,๐Ÿ‘ฆ,๐Ÿ‘ง,๐Ÿ‘จ,๐Ÿ‘ฉ,๐Ÿ‘ช,๐Ÿ‘ซ,๐Ÿ‘ฎ,๐Ÿ‘ฏ,๐Ÿ‘ฐ,๐Ÿ‘ฑ,๐Ÿ‘ฒ,๐Ÿ‘ณ,๐Ÿ‘ด,๐Ÿ‘ต,๐Ÿ‘ถ,๐Ÿ‘ท,๐Ÿ‘ธ,๐Ÿ‘น,๐Ÿ‘บ,๐Ÿ‘ป,๐Ÿ‘ผ,๐Ÿ‘ฝ,๐Ÿ‘พ,๐Ÿ‘ฟ,๐Ÿ’€,๐Ÿ’,๐Ÿ’‚,๐Ÿ’ƒ,๐Ÿ’„,๐Ÿ’…,๐Ÿ’†,๐Ÿ’‡,๐Ÿ’ˆ,๐Ÿ’‰,๐Ÿ’Š,๐Ÿ’‹,๐Ÿ’Œ,๐Ÿ’,๐Ÿ’Ž,๐Ÿ’,๐Ÿ’,๐Ÿ’‘,๐Ÿ’’,๐Ÿ’“,๐Ÿ’”,๐Ÿ’•,๐Ÿ’–,๐Ÿ’—,๐Ÿ’˜,๐Ÿ’™,๐Ÿ’š,๐Ÿ’›,๐Ÿ’œ,๐Ÿ’,๐Ÿ’ž,๐Ÿ’Ÿ,๐Ÿ’ ,๐Ÿ’ก,๐Ÿ’ข,๐Ÿ’ฃ,๐Ÿ’ค,๐Ÿ’ฅ,๐Ÿ’ฆ,๐Ÿ’ง,๐Ÿ’จ,๐Ÿ’ฉ,๐Ÿ’ช,๐Ÿ’ซ,๐Ÿ’ฌ,๐Ÿ’ฎ,๐Ÿ’ฏ,๐Ÿ’ฐ,๐Ÿ’ฑ,๐Ÿ’ฒ,๐Ÿ’ณ,๐Ÿ’ด,๐Ÿ’ต,๐Ÿ’ธ,๐Ÿ’น,๐Ÿ’บ,๐Ÿ’ป,๐Ÿ’ผ,๐Ÿ’ฝ,๐Ÿ’พ,๐Ÿ’ฟ,๐Ÿ“€,๐Ÿ“,๐Ÿ“‚,๐Ÿ“ƒ,๐Ÿ“„,๐Ÿ“…,๐Ÿ“†,๐Ÿ“‡,๐Ÿ“ˆ,๐Ÿ“‰,๐Ÿ“Š,๐Ÿ“‹,๐Ÿ“Œ,๐Ÿ“,๐Ÿ“Ž,๐Ÿ“,๐Ÿ“,๐Ÿ“‘,๐Ÿ“’,๐Ÿ““,๐Ÿ“”,๐Ÿ“•,๐Ÿ“–,๐Ÿ“—,๐Ÿ“˜,๐Ÿ“™,๐Ÿ“š,๐Ÿ“›,๐Ÿ“œ,๐Ÿ“,๐Ÿ“ž,๐Ÿ“Ÿ,๐Ÿ“ ,๐Ÿ“ก,๐Ÿ“ข,๐Ÿ“ฃ,๐Ÿ“ค,๐Ÿ“ฅ,๐Ÿ“ฆ,๐Ÿ“ง,๐Ÿ“จ,๐Ÿ“ฉ,๐Ÿ“ช,๐Ÿ“ซ,๐Ÿ“ฎ,๐Ÿ“ฐ,๐Ÿ“ฑ,๐Ÿ“ฒ,๐Ÿ“ณ,๐Ÿ“ด,๐Ÿ“ถ,๐Ÿ“ท,๐Ÿ“น,๐Ÿ“บ,๐Ÿ“ป,๐Ÿ“ผ,๐Ÿ”ƒ,๐Ÿ”Š,๐Ÿ”‹,๐Ÿ”Œ,๐Ÿ”,๐Ÿ”Ž,๐Ÿ”,๐Ÿ”,๐Ÿ”‘,๐Ÿ”’,๐Ÿ”“,๐Ÿ””,๐Ÿ”–,๐Ÿ”—,๐Ÿ”˜,๐Ÿ”™,๐Ÿ”š,๐Ÿ”›,๐Ÿ”œ,๐Ÿ”,๐Ÿ”ž,๐Ÿ”Ÿ,๐Ÿ” ,๐Ÿ”ก,๐Ÿ”ข,๐Ÿ”ฃ,๐Ÿ”ค,๐Ÿ”ฅ,๐Ÿ”ฆ,๐Ÿ”ง,๐Ÿ”จ,๐Ÿ”ฉ,๐Ÿ”ช,๐Ÿ”ซ,๐Ÿ”ฎ,๐Ÿ”ฏ,๐Ÿ”ฐ,๐Ÿ”ฑ,๐Ÿ”ฒ,๐Ÿ”ณ,๐Ÿ”ด,๐Ÿ”ต,๐Ÿ”ถ,๐Ÿ”ท,๐Ÿ”ธ,๐Ÿ”น,๐Ÿ”บ,๐Ÿ”ป,๐Ÿ”ผ,๐Ÿ”ฝ,๐Ÿ•,๐Ÿ•‘,๐Ÿ•’,๐Ÿ•“,๐Ÿ•”,๐Ÿ••,๐Ÿ•–,๐Ÿ•—,๐Ÿ•˜,๐Ÿ•™,๐Ÿ•š,๐Ÿ•›,๐Ÿ—ป,๐Ÿ—ผ,๐Ÿ—ฝ,๐Ÿ—พ,๐Ÿ—ฟ,๐Ÿ˜€,๐Ÿ˜‡,๐Ÿ˜ˆ,๐Ÿ˜Ž,๐Ÿ˜,๐Ÿ˜‘,๐Ÿ˜•,๐Ÿ˜—,๐Ÿ˜™,๐Ÿ˜›,๐Ÿ˜Ÿ,๐Ÿ˜ฆ,๐Ÿ˜ง,๐Ÿ˜ฌ,๐Ÿ˜ฎ,๐Ÿ˜ฏ,๐Ÿ˜ด,๐Ÿ˜ถ,๐Ÿš,๐Ÿš‚,๐Ÿš†,๐Ÿšˆ,๐ŸšŠ,๐Ÿš,๐ŸšŽ,๐Ÿš,๐Ÿš”,๐Ÿš–,๐Ÿš˜,๐Ÿš›,๐Ÿšœ,๐Ÿš,๐Ÿšž,๐ŸšŸ,๐Ÿš 
,๐Ÿšก,๐Ÿšฃ,๐Ÿšฆ,๐Ÿšฎ,๐Ÿšฏ,๐Ÿšฐ,๐Ÿšฑ,๐Ÿšณ,๐Ÿšด,๐Ÿšต,๐Ÿšท,๐Ÿšธ,๐Ÿšฟ,๐Ÿ›,๐Ÿ›‚,๐Ÿ›ƒ,๐Ÿ›„,๐Ÿ›…,๐ŸŒ,๐ŸŒŽ,๐ŸŒ,๐ŸŒ’,๐ŸŒ–,๐ŸŒ—,๐ŸŒ˜,๐ŸŒš,๐ŸŒœ,๐ŸŒ,๐ŸŒž,๐ŸŒฒ,๐ŸŒณ,๐Ÿ‹,๐Ÿ,๐Ÿผ,๐Ÿ‡,๐Ÿ‰,๐Ÿค,๐Ÿ€,๐Ÿ,๐Ÿ‚,๐Ÿƒ,๐Ÿ„,๐Ÿ…,๐Ÿ†,๐Ÿ‡,๐Ÿˆ,๐Ÿ‰,๐ŸŠ,๐Ÿ‹,๐Ÿ,๐Ÿ,๐Ÿ“,๐Ÿ•,๐Ÿ–,๐Ÿช,๐Ÿ‘ฅ,๐Ÿ‘ฌ,๐Ÿ‘ญ,๐Ÿ’ญ,๐Ÿ’ถ,๐Ÿ’ท,๐Ÿ“ฌ,๐Ÿ“ญ,๐Ÿ“ฏ,๐Ÿ“ต,๐Ÿ”€,๐Ÿ”,๐Ÿ”‚,๐Ÿ”„,๐Ÿ”…,๐Ÿ”†,๐Ÿ”‡,๐Ÿ”‰,๐Ÿ”•,๐Ÿ”ฌ,๐Ÿ”ญ,๐Ÿ•œ,๐Ÿ•,๐Ÿ•ž,๐Ÿ•Ÿ,๐Ÿ• ,๐Ÿ•ก,๐Ÿ•ข,๐Ÿ•ฃ,๐Ÿ•ค,๐Ÿ•ฅ,๐Ÿ•ฆ,๐Ÿ•ง'.split(',');

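const exception = [];
emojis.forEach((emoji) => {
  // Create a fresh regex per emoji so no lastIndex state carries over between calls.
  const match = emojiRegex().exec(emoji);
  // Collect every emoji the regex fails to match.
  if (!match) { exception.push(emoji); }
});
console.log('Exception length', exception.length);
console.log(JSON.stringify(exception));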

6.1.0

Exception length 0

[]

6.1.3

Exception length 72

["โœ‚","โœˆ","โœ‰","โœ","โœ’","โœ”","โœ–","โœณ","โœด","โ„","โ‡","โค","โžก","โ“‚","๐Ÿ…ฐ","๐Ÿ…ฑ","๐Ÿ…พ","๐Ÿ…ฟ","๐Ÿˆ‚","๐Ÿˆท","ยฉ","ยฎ","โ€ผ","โ‰","8โƒฃ","9โƒฃ","7โƒฃ","6โƒฃ","1โƒฃ","0โƒฃ","2โƒฃ","3โƒฃ","5โƒฃ","4โƒฃ","#โƒฃ","โ„ข","โ„น","โ†”","โ†•","โ†–","โ†—","โ†˜","โ†™","โ†ฉ","โ†ช","โ–ช","โ–ซ","โ–ถ","โ—€","โ—ป","โ—ผ","โ˜€","โ˜","โ˜Ž","โ˜‘","โ˜บ","โ™ ","โ™ฃ","โ™ฅ","โ™ฆ","โ™จ","โ™ป","โš ","โคด","โคต","โฌ…","โฌ†","โฌ‡","ใ€ฐ","ใ€ฝ","ใŠ—","ใŠ™"]

Using https://github.com/Kikobeats/emojis-list as spec

6.1.0

Exception length 118

["๐Ÿ‡ฆ","๐Ÿ‡ง","๐Ÿ‡จ","๐Ÿ‡ฉ","๐Ÿ‡ช","๐Ÿ‡ซ","๐Ÿ‡ฌ","๐Ÿ‡ญ","๐Ÿ‡ฎ","๐Ÿ‡ฏ","๐Ÿ‡ฐ","๐Ÿ‡ฑ","๐Ÿ‡ฒ","๐Ÿ‡ณ","๐Ÿ‡ด","๐Ÿ‡ต","๐Ÿ‡ถ","๐Ÿ‡ท","๐Ÿ‡ธ","๐Ÿ‡น","๐Ÿ‡บ๐Ÿ‡ณ","๐Ÿ‡บ","๐Ÿ‡ป","๐Ÿ‡ผ","๐Ÿ‡ฝ","๐Ÿ‡พ","๐Ÿ‡ฟ","๐Ÿ•บ","๐Ÿ–ค","๐Ÿ—จ","๐Ÿ›‘","๐Ÿ›’","๐Ÿ›ด","๐Ÿ›ต","๐Ÿ›ถ","๐Ÿค™","๐Ÿคš","๐Ÿค›","๐Ÿคœ","๐Ÿค","๐Ÿคž","๐Ÿค ","๐Ÿคก","๐Ÿคข","๐Ÿคฃ","๐Ÿคค","๐Ÿคฅ","๐Ÿคฆโ€โ™€๏ธ","๐Ÿคฆโ€โ™‚๏ธ","๐Ÿคฆ","๐Ÿคง","๐Ÿคฐ","๐Ÿคณ","๐Ÿคด","๐Ÿคต","๐Ÿคถ","๐Ÿคทโ€โ™€๏ธ","๐Ÿคทโ€โ™‚๏ธ","๐Ÿคท","๐Ÿคธโ€โ™€๏ธ","๐Ÿคธโ€โ™‚๏ธ","๐Ÿคธ","๐Ÿคนโ€โ™€๏ธ","๐Ÿคนโ€โ™‚๏ธ","๐Ÿคน","๐Ÿคบ","๐Ÿคผโ€โ™€๏ธ","๐Ÿคผโ€โ™‚๏ธ","๐Ÿคผ","๐Ÿคฝโ€โ™€๏ธ","๐Ÿคฝโ€โ™‚๏ธ","๐Ÿคฝ","๐Ÿคพโ€โ™€๏ธ","๐Ÿคพโ€โ™‚๏ธ","๐Ÿคพ","๐Ÿฅ€","๐Ÿฅ","๐Ÿฅ‚","๐Ÿฅƒ","๐Ÿฅ„","๐Ÿฅ…","๐Ÿฅ‡","๐Ÿฅˆ","๐Ÿฅ‰","๐ŸฅŠ","๐Ÿฅ‹","๐Ÿฅ","๐Ÿฅ‘","๐Ÿฅ’","๐Ÿฅ“","๐Ÿฅ”","๐Ÿฅ•","๐Ÿฅ–","๐Ÿฅ—","๐Ÿฅ˜","๐Ÿฅ™","๐Ÿฅš","๐Ÿฅ›","๐Ÿฅœ","๐Ÿฅ","๐Ÿฅž","๐Ÿฆ…","๐Ÿฆ†","๐Ÿฆ‡","๐Ÿฆˆ","๐Ÿฆ‰","๐ŸฆŠ","๐Ÿฆ‹","๐ŸฆŒ","๐Ÿฆ","๐ŸฆŽ","๐Ÿฆ","๐Ÿฆ","๐Ÿฆ‘","โ™€","โ™‚","โš•","๎”Š"]

6.1.3

Exception length 209

["๐Ÿ…ฐ","๐Ÿ…ฑ","๐Ÿ…พ","๐Ÿ…ฟ","๐Ÿˆ‚","๐Ÿˆท","๐ŸŒก","๐ŸŒค","๐ŸŒฅ","๐ŸŒฆ","๐ŸŒง","๐ŸŒจ","๐ŸŒฉ","๐ŸŒช","๐ŸŒซ","๐ŸŒฌ","๐ŸŒถ","๐Ÿฝ","๐ŸŽ–","๐ŸŽ—","๐ŸŽ™","๐ŸŽš","๐ŸŽ›","๐ŸŽž","๐ŸŽŸ","๐Ÿ","๐ŸŽ","๐Ÿ”","๐Ÿ•","๐Ÿ–","๐Ÿ—","๐Ÿ˜","๐Ÿ™","๐Ÿš","๐Ÿ›","๐Ÿœ","๐Ÿ","๐Ÿž","๐ŸŸ","๐Ÿณ","๐Ÿต","๐Ÿท","๐Ÿฟ","๐Ÿ‘โ€๐Ÿ—จ","๐Ÿ‘","๐Ÿ“ฝ","๐Ÿ•‰","๐Ÿ•Š","๐Ÿ•ฏ","๐Ÿ•ฐ","๐Ÿ•ณ","๐Ÿ•ถ","๐Ÿ•ท","๐Ÿ•ธ","๐Ÿ•น","๐Ÿ–‡","๐Ÿ–Š","๐Ÿ–‹","๐Ÿ–Œ","๐Ÿ–","๐Ÿ–ฅ","๐Ÿ–จ","๐Ÿ–ฑ","๐Ÿ–ฒ","๐Ÿ–ผ","๐Ÿ—‚","๐Ÿ—ƒ","๐Ÿ—„","๐Ÿ—‘","๐Ÿ—’","๐Ÿ—“","๐Ÿ—œ","๐Ÿ—","๐Ÿ—ž","๐Ÿ—ก","๐Ÿ—ฃ","๐Ÿ—จ","๐Ÿ—ฏ","๐Ÿ—ณ","๐Ÿ—บ","๐Ÿ›‹","๐Ÿ›","๐Ÿ›Ž","๐Ÿ›","๐Ÿ› ","๐Ÿ›ก","๐Ÿ›ข","๐Ÿ›ฃ","๐Ÿ›ค","๐Ÿ›ฅ","๐Ÿ›ฉ","๐Ÿ›ฐ","๐Ÿ›ณ","โ€ผ","โ‰","โ„ข","โ„น","โ†”","โ†•","โ†–","โ†—","โ†˜","โ†™","โ†ฉ","โ†ช","#โƒฃ","โŒจ","โ","โญ","โฎ","โฏ","โฑ","โฒ","โธ","โน","โบ","โ“‚","โ–ช","โ–ซ","โ–ถ","โ—€","โ—ป","โ—ผ","โ˜€","โ˜","โ˜‚","โ˜ƒ","โ˜„","โ˜Ž","โ˜‘","โ˜˜","โ˜ ","โ˜ข","โ˜ฃ","โ˜ฆ","โ˜ช","โ˜ฎ","โ˜ฏ","โ˜ธ","โ˜น","โ˜บ","โ™€","โ™‚","โ™ ","โ™ฃ","โ™ฅ","โ™ฆ","โ™จ","โ™ป","โš’","โš”","โš•","โš–","โš—","โš™","โš›","โšœ","โš ","โšฐ","โšฑ","โ›ˆ","โ›","โ›‘","โ›“","โ›ฉ","โ›ฐ","โ›ฑ","โ›ด","โ›ท","โ›ธ","โœ‚","โœˆ","โœ‰","โœ","โœ’","โœ”","โœ–","โœ","โœก","โœณ","โœด","โ„","โ‡","โฃ","โค","โžก","โคด","โคต","*โƒฃ","โฌ…","โฌ†","โฌ‡","0โƒฃ","ใ€ฐ","ใ€ฝ","1โƒฃ","2โƒฃ","ใŠ—","ใŠ™","3โƒฃ","4โƒฃ","5โƒฃ","6โƒฃ","7โƒฃ","8โƒฃ","9โƒฃ","ยฉ","ยฎ","๎”Š"]
mathiasbynens commented 7 years ago

Support for sequences has not been implemented yet since the rewrite (37d8faac5a725e37bb5e9ff9531dd9b241c63ae0).

paulirish commented 7 years ago

Here's the test string that @notwaldorf and I are using to test an emoji-extraction script:

👩🏿😎🙈🏳️‍🌈👨‍👨‍👧‍👦

eager for sequence support. :)
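To illustrate why sequences need dedicated support, here is a minimal sketch in plain JavaScript. It does not use emoji-regex; it relies on the engine's built-in Unicode property escapes, and the `singleSymbol` pattern is only an assumed stand-in for a regex that knows individual symbols but not sequences. A family emoji is several code points joined by U+200D ZERO WIDTH JOINER, so such a pattern breaks it into its members:

```javascript
// A family emoji written with escapes so the individual code points stay visible.
const family = '\u{1F468}\u200D\u{1F468}\u200D\u{1F467}\u200D\u{1F466}';

// Spreading a string iterates by code point, not by UTF-16 code unit.
const codePoints = [...family].map(
  (ch) => 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
);

console.log(codePoints.join(' '));
// U+1F468 U+200D U+1F468 U+200D U+1F467 U+200D U+1F466
console.log(family.length); // 11 (UTF-16 code units)

// Assumed stand-in for a single-symbol regex: it has no notion of sequences,
// so it reports each family member separately and skips the joiners.
const singleSymbol = /\p{Emoji_Presentation}/gu;
const parts = family.match(singleSymbol);
console.log(parts.length); // 4
```

An extraction script built on a single-symbol pattern would therefore report four emoji where the user sees one.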

mathiasbynens commented 7 years ago

I've updated unicode-tr51 so it now exports a list of all emoji sequences as an array of strings: https://github.com/mathiasbynens/unicode-tr51/blob/v8.1.1/sequences.js

Now we can pass that to regexgen to generate a compact regex for the sequences, and combine that with the existing regex. I've written the code to do this already, but ran into (what I think is) a regexgen bug: https://github.com/devongovett/regexgen/issues/10 Once that's fixed, emoji-regex gets sequence support!
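The idea of combining a sequence regex with a single-symbol regex can be sketched roughly as follows. This is not the actual build code and it does not call regexgen: it swaps in a naive longest-first alternation, which is far less compact than what regexgen would produce. The `sequences` sample list, `escapeForRegex`, and `buildSequenceRegex` are hypothetical names introduced only for this illustration.

```javascript
// Hypothetical stand-in for the exported sequence list: a few sample sequences.
const sequences = [
  '\u{1F468}\u200D\u{1F468}\u200D\u{1F467}\u200D\u{1F466}', // family (ZWJ sequence)
  '\u{1F3F3}\uFE0F\u200D\u{1F308}', // rainbow flag (ZWJ sequence)
  '\u{1F469}\u{1F3FF}', // woman + skin tone modifier
  '*\uFE0F\u20E3', // keycap asterisk
];

// Escape regex syntax characters; needed here because of the leading '*'.
const escapeForRegex = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Naive replacement for regexgen: a plain alternation, longest sequence first
// so that a full sequence always wins over one of its prefixes.
function buildSequenceRegex(list, singleSymbolSource) {
  const alternatives = [...list]
    .sort((a, b) => b.length - a.length)
    .map(escapeForRegex);
  return new RegExp(`${alternatives.join('|')}|${singleSymbolSource}`, 'gu');
}

// Assumed single-symbol fallback pattern, appended after all sequences.
const regex = buildSequenceRegex(sequences, '\\p{Emoji_Presentation}');

const text = 'a' + sequences[0] + 'b\u{1F60E}c' + sequences[1];
const matches = text.match(regex);
console.log(matches.length); // 3: the family, the sunglasses face, the rainbow flag
```

Putting the sequence alternatives before the single-symbol fallback matters: with the order reversed, the family sequence would be consumed one member at a time.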

paulirish commented 7 years ago

@mathiasbynens you are such a pro. :)

mathiasbynens commented 7 years ago

😊

roderickhsiao commented 7 years ago

Definitely a bravo for the lightning fix 💯

roderickhsiao commented 7 years ago

@mathiasbynens thanks for the quick fix.

I tried 6.2.0 with the same test cases and am still getting the same results. Possibly a build/publish issue?

mathiasbynens commented 7 years ago

Ah, I see. The problematic test cases, e.g. U+1F321 THERMOMETER 🌡️, are not Emoji_Presentation characters. This means that they get rendered as text by default (according to the Unicode data files), and only get rendered as an emoji when followed by U+FE0F VARIATION SELECTOR-16. (more info)

From http://unicode.org/Public/emoji/5.0/emoji-test.txt:

1F321 FE0F                                 ; fully-qualified     # 🌡️ thermometer
1F321                                      ; non-fully-qualified # 🌡 thermometer

(On my system U+1F321 gets rendered as an emoji even without the variation selector, but of course that's something that differs anyway depending on the fonts you're using etc. This package is purely based on the Unicode data.)

I don't really see how to move forward here. Should emoji-regex expose a secondary regex that matches text-emoji such as 🌡️ as well?
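The distinction between the two properties can be checked with a small sketch in plain JavaScript. It uses the engine's own Unicode property escapes rather than emoji-regex, and `textEmojiWithSelector` is a hypothetical pattern written for this example, not the secondary regex discussed in this thread.

```javascript
const thermometer = '\u{1F321}'; // text presentation by default
const withSelector = '\u{1F321}\uFE0F'; // emoji presentation requested via U+FE0F
const grinning = '\u{1F600}'; // an Emoji_Presentation character

const isEmoji = (s) => /^\p{Emoji}/u.test(s);
const hasEmojiPresentation = (s) => /^\p{Emoji_Presentation}/u.test(s);

// Hypothetical "text-emoji" pattern: an Emoji character followed by U+FE0F.
const textEmojiWithSelector = /^\p{Emoji}\uFE0F/u;

console.log(isEmoji(thermometer)); // true
console.log(hasEmojiPresentation(thermometer)); // false
console.log(hasEmojiPresentation(grinning)); // true
console.log(textEmojiWithSelector.test(withSelector)); // true
console.log(textEmojiWithSelector.test(thermometer)); // false
```

Note that the Emoji property on its own is very broad (it also covers ASCII digits, # and *), which is one reason a regex restricted to Emoji_Presentation plus explicit sequences gives fewer false positives.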

devongovett commented 7 years ago

As an alternative, if you want to exactly match a particular font, there is apple-color-emoji, which generates a regex from the actual Mac emoji font. Perhaps the same could be done for other fonts.

roderickhsiao commented 7 years ago

Good find @mathiasbynens

Perhaps a config option like emojiRegex({config}) that returns a regex including the non-fully-qualified list? But then we would bloat the file size for people who are not using it. Or, as you mentioned, a secondary regex export?

mathiasbynens commented 7 years ago

@roderickhsiao Please try v6.3.0 + the last paragraph in https://github.com/mathiasbynens/emoji-regex#installation and let me know if that solves your use case!

roderickhsiao commented 7 years ago

It works perfectly. In the README we should probably use import emojiRegex from 'emoji-regex/dist/text'; as we are not exposing dist directly :)

Thanks for the quick fix!

mathiasbynens commented 7 years ago

@roderickhsiao Thanks! Fixed in v6.4.0 by removing the dist folder.

roderickhsiao commented 7 years ago

Works like a charm, thanks!

devongovett commented 7 years ago

regexgen fix for the bug mentioned above is here: https://github.com/devongovett/regexgen/pull/14. Released in v1.2.3. 🎉