mathiasbynens / emoji-regex

A regular expression to match all Emoji-only symbols as per the Unicode Standard.
https://mths.be/emoji-regex
MIT License

Regex does not match many objects from Apple emoji palette, but does match the same emojis from Android #68

Closed: kenpowers-signal closed this issue 2 years ago

kenpowers-signal commented 4 years ago

Several emoji that can be inserted from the iOS / macOS emoji pickers are not recognized by the regex this library provides, while the same emoji inserted from the Android emoji picker are recognized.

const emojiRegex = require("emoji-regex");
const regex = emojiRegex();

const ios = ['⏱', '⏲', '🕰', '⌛️', '⏳', '🎛'];
const android = ['⏱️', '⏲️', '🕰️', '⌛', '⏳', '🎛️'];

console.log({ ios: ios.map(e => regex.test(e)), android: android.map(e => regex.test(e)) });

RunKit output:

Object {android: [true, false, true, false, true, false], ios: [false, false, false, true, false, false]}
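Independently of the FE0F question, `emojiRegex()` returns a regular expression with the global (`g`) flag. `RegExp.prototype.test()` on a global regex resumes searching from `lastIndex`, so reusing one instance across several short strings can produce false negatives in an alternating pattern, which may account for some of the `false` values in the RunKit output above. The sketch below reproduces the effect without the library. `/\p{Emoji_Presentation}/gu` is only a stand-in (an assumption, not the pattern emoji-regex generates); `⌛` and `⏳` are used because both default to emoji presentation and should always match.

```javascript
// Stand-in for emojiRegex(): an ASSUMED pattern used only for illustration.
// What matters here is the `g` flag, which emojiRegex() also sets.
const makeRegex = () => /\p{Emoji_Presentation}/gu;

// Both characters default to emoji presentation, so each should match.
const samples = ['\u231B', '\u23F3', '\u231B', '\u23F3'];

// One shared global regex: a successful test() moves lastIndex to the end of
// the match, so the next one-code-unit string is searched from index 1 and
// fails, which in turn resets lastIndex to 0.
const shared = makeRegex();
const withSharedState = samples.map((s) => shared.test(s));
// withSharedState -> [true, false, true, false]

// Resetting lastIndex before each call makes every test start at index 0.
const reused = makeRegex();
const withReset = samples.map((s) => {
  reused.lastIndex = 0;
  return reused.test(s);
});
// withReset -> [true, true, true, true]

console.log({ withSharedState, withReset });
```

In the original snippet, creating a fresh regex per element (`e => emojiRegex().test(e)`) should have the same effect as resetting `lastIndex`, since each call evaluates a new regex literal.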

I haven't dug into the Unicode data to see what's happening just yet.

gilmoreorless commented 4 years ago

I've noticed previously that Apple conforms only loosely to the Unicode spec around emoji characters, specifically regarding which ones do and don't need the special U+FE0F presentation selector (VARIATION SELECTOR-16) appended. See https://github.com/mathiasbynens/emoji-regex/issues/28 for a discussion about this problem. Perhaps Android has the same non-conformance problem, but in a different way.

Regarding your specific example, the underlying code points definitely differ between the platforms, even though the emoji look the same on my machine.

const listCodePoints = (arr) =>
  arr.map(
    (e) => [...e].map(
      (cp) => `U+${cp.codePointAt(0).toString(16).toUpperCase()}`
    ).join(' ')
  );

console.log(listCodePoints(ios));
// [ "U+23F1", "U+23F2", "U+1F570", "U+231B U+FE0F", "U+23F3", "U+1F39B" ]

console.log(listCodePoints(android));
// [ "U+23F1 U+FE0F", "U+23F2 U+FE0F", "U+1F570 U+FE0F", "U+231B", "U+23F3", "U+1F39B U+FE0F" ]
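The two lists above differ only in where U+FE0F appears. As a rough diagnostic, assuming each string is a single emoji code point optionally followed by U+FE0F, the strings can be classified by whether the base character defaults to emoji presentation (the `Emoji_Presentation` property) and whether U+FE0F follows it. `qualification` is a hypothetical helper; "unqualified" and "fully-qualified" follow the status names in Unicode's emoji-test.txt, while "overqualified" is an informal label for a redundant U+FE0F.

```javascript
// Hypothetical helper: classifies a string made of ONE emoji code point,
// optionally followed by U+FE0F. Anything else is reported as 'other'.
const EMOJI = /^\p{Emoji}$/u;
const EMOJI_PRESENTATION = /^\p{Emoji_Presentation}$/u;
const VS16 = '\uFE0F';

function qualification(str) {
  const [base, ...rest] = [...str]; // split by code point, not code unit
  const hasVS16 = rest.length === 1 && rest[0] === VS16;
  if (base === undefined || !EMOJI.test(base) || (rest.length > 0 && !hasVS16)) {
    return 'other';
  }
  if (EMOJI_PRESENTATION.test(base)) {
    // Emoji presentation by default: a trailing U+FE0F is redundant.
    return hasVS16 ? 'overqualified' : 'fully-qualified';
  }
  // Text presentation by default: U+FE0F is required for emoji presentation.
  return hasVS16 ? 'fully-qualified' : 'unqualified';
}

// Same strings as in the issue, written with escapes so the selectors are visible.
const ios = ['\u23F1', '\u23F2', '\u{1F570}', '\u231B\uFE0F', '\u23F3', '\u{1F39B}'];
const android = ['\u23F1\uFE0F', '\u23F2\uFE0F', '\u{1F570}\uFE0F', '\u231B', '\u23F3', '\u{1F39B}\uFE0F'];

// iOS: four unqualified, one overqualified (U+231B U+FE0F), one fully-qualified.
console.log(ios.map(qualification));
// Android: every entry is fully-qualified.
console.log(android.map(qualification));
```

Under this reading, the Android picker emits fully-qualified sequences throughout, while the iOS picker omits U+FE0F after text-default characters and adds a redundant one after U+231B.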
scottnonnenberg-signal commented 4 years ago

@gilmoreorless Does it make sense to make emoji-regex a little looser to allow for this?

gilmoreorless commented 4 years ago

@scottnonnenberg-signal It does make sense as a potential variation. I pondered a "loose" variant in https://github.com/mathiasbynens/emoji-regex/issues/33#issuecomment-374176872, but that was about a slightly different problem. The simple answer is that no one has done the work to add one yet; @mathiasbynens pointed out that it's not a straightforward task in https://github.com/mathiasbynens/emoji-regex/issues/28#issuecomment-323044429.
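One naive way to approximate "loose" matching without changing the regex is to normalize input first: append U+FE0F after any emoji code point that defaults to text presentation and is not already followed by a variation selector. This is an assumption for illustration, not something emoji-regex provides, and `addPresentationSelectors` is a hypothetical name. It skips ASCII `#`, `*` and digits (which carry the `Emoji` property) and bases followed by a skin-tone modifier, but it has not been checked against every sequence in emoji-test.txt, and it would also turn a plain © or ™ in running text into its emoji form, which hints at why a proper loose variant is not straightforward.

```javascript
// Hypothetical pre-normalization step (NOT part of emoji-regex).
//   (?![#*0-9])                skip ASCII characters with the Emoji property
//   (?!\p{Emoji_Presentation}) only text-default emoji need the selector
//   (\p{Emoji})                the base code point, captured as $1
//   (?![\uFE0E\uFE0F\u{1F3FB}-\u{1F3FF}])
//                              leave explicit selectors and skin-tone
//                              modifier sequences untouched
const TEXT_DEFAULT_EMOJI =
  /(?![#*0-9])(?!\p{Emoji_Presentation})(\p{Emoji})(?![\uFE0E\uFE0F\u{1F3FB}-\u{1F3FF}])/gu;

function addPresentationSelectors(text) {
  // replace() with a global regex resets lastIndex itself, so reuse is safe.
  return text.replace(TEXT_DEFAULT_EMOJI, '$1\uFE0F');
}

// iOS strings from the issue, written with escapes.
const ios = ['\u23F1', '\u23F2', '\u{1F570}', '\u231B\uFE0F', '\u23F3', '\u{1F39B}'];
const normalized = ios.map(addPresentationSelectors);

// Only the missing selectors are added; the redundant U+FE0F after U+231B
// and the bare U+23F3 are left as they were.
console.log(normalized.map((s) => [...s].map((c) => c.codePointAt(0).toString(16))));
```

The normalized strings would then be passed to the unchanged regex; the trade-off is that the check now also rewrites text-default symbols the user may have intended as plain text.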

mathiasbynens commented 3 years ago

The best long-term solution is for Apple to respect the Unicode Standard and not deviate from it. In recent macOS updates it seems like emoji input has improved in terms of spec compliance, so I'm hopeful.

@kenpowers-signal Do you have an up-to-date iOS device handy? Could you try inputting those emoji again on the latest iOS? I wonder if the variation selectors are still missing.

jakob11git commented 3 years ago

iOS 14.2.1: ⏱⏲🕰⌛️⏳🎛
macOS 11.0.1 emoji picker: ⏱⏲🕰⌛️⏳🎛
macOS 11.0.1 Japanese IME (without the control knobs, because I don't know how to input them): ⏱⏲🕰⌛︎⏳

Apparently all the same as on earlier iOS versions.

mathiasbynens commented 2 years ago

v10.0.0 now leverages emoji-test-regex-pattern which has a dedicated list of emojis that Apple's iOS emoji picker enters in overqualified form: https://github.com/mathiasbynens/emoji-test-regex-pattern/blob/89818e015d94a8d31c7fe30444f9ac7030908f14/script/get-sequences.js#L1-L48 Please try v10.0.0 and see it it helps.