mathiasbynens / emoji-test-regex-pattern

A regular expression pattern for Java/JavaScript to match all emoji in the emoji-test.txt file provided by UTS#51.
MIT License

Add `javascript-v.txt` #15

Open mathiasbynens opened 2 years ago

mathiasbynens commented 2 years ago

Once both https://github.com/tc39/proposal-regexp-set-notation AND #7 happen, we should export javascript-v.txt, a JavaScript-compatible regular expression pattern matching all the emoji in index.txt, for use in regular expressions with the v flag.

This pattern could be more concise than the u pattern, but would still require programmatically generated character class ranges.

The final pattern would look something like this:

\p{Emoji_Test}|other|emoji|here

mathiasbynens commented 2 years ago

> Once both https://github.com/tc39/proposal-regexp-set-notation AND #7 happen

Before that, once only https://github.com/tc39/proposal-regexp-set-notation happens, we can already provide a more compact output flavor: we could then rely on \p{RGI_Emoji} being supported in browsers, and generate a pattern only for the sequences that RGI_Emoji does not cover. The pattern would look something like this:

\p{RGI_Emoji}|other|emoji|here
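
As a rough illustration of the shape above, here is a minimal sketch that joins `\p{RGI_Emoji}` with a few leftover sequences and tries the result with the `v` flag. The helper names (`supportsVFlag`, `escapeLiteral`, `buildPattern`) and the two leftover sequences are assumptions made up for this example; they are not this repository's build script or its real output.

```javascript
// Returns true if this engine accepts the v flag and \p{RGI_Emoji}.
function supportsVFlag() {
  try {
    new RegExp('\\p{RGI_Emoji}', 'v');
    return true;
  } catch {
    return false;
  }
}

// Escape regex syntax characters so a literal sequence (e.g. a keycap
// starting with "*") can sit safely in an alternation.
const escapeLiteral = (s) => s.replace(/[\\^$.*+?()[\]{}|\/]/g, '\\$&');

// Hypothetical helper: \p{RGI_Emoji} first, then the leftover sequences,
// with longer leftovers ahead of shorter ones.
function buildPattern(extraSequences) {
  const extras = [...extraSequences]
    .sort((a, b) => b.length - a.length)
    .map(escapeLiteral);
  return ['\\p{RGI_Emoji}', ...extras].join('|');
}

// Placeholder leftovers (unqualified forms), not actual build output.
const pattern = buildPattern(['\u263A', '*\u20E3']);

if (supportsVFlag()) {
  // Anchored so that the order of alternatives does not affect the result.
  const re = new RegExp(`^(?:${pattern})$`, 'v');
  console.log(re.test('\u{1F600}')); // an RGI emoji
  console.log(re.test('*\u20E3')); // covered by an explicit alternative
} else {
  console.log('The v flag is not supported by this engine.');
}
```

The alternation order here simply follows the pattern shown in the comment. For unanchored matching, alternatives are tried left to right, so a real generator might need to place longer sequences ahead of anything that matches their prefix.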

We could even make use of \q{other|emoji|here}, which is only valid inside a character class (as in [\q{other|emoji|here}]), if that ends up saving bytes compared to the current character class-based pattern.
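
To make the byte comparison concrete, the following sketch builds the same leftover sequences both as a plain alternation and as a `\q{...}` string class, then prints the two lengths. The helper names (`toEscapes`, `matchesWhole`) and the sample sequences are assumptions for illustration only. Encoding everything as `\u{...}` escapes is a simplification chosen so the same text is valid both inside and outside a character class; a real generator would likely emit something shorter.

```javascript
// Hypothetical helper: encode each code point as a \u{...} escape, which
// the v flag accepts both outside and inside a character class.
const toEscapes = (s) =>
  Array.from(s, (ch) => `\\u{${ch.codePointAt(0).toString(16).toUpperCase()}}`).join('');

// Placeholder sequences, not actual build output.
const extras = ['\u263A', '\u{1F44B}\u{1F3FB}'];

// The same alternatives, expressed two ways.
const asAlternation = extras.map(toEscapes).join('|');
const asStringClass = `[\\q{${extras.map(toEscapes).join('|')}}]`;

// Compare the sizes of the two pattern sources.
console.log(asAlternation.length, asStringClass.length);

// Returns true/false for a whole-string match, or null when the engine
// does not support the v flag.
function matchesWhole(source, input) {
  try {
    return new RegExp(`^(?:${source})$`, 'v').test(input);
  } catch {
    return null;
  }
}

console.log(matchesWhole(asStringClass, '\u{1F44B}\u{1F3FB}'));
```

On its own, the `\q{...}` form carries the extra `[\q{` and `}]` wrapper, so any saving would have to come from merging it with other class contents such as ranges or `\p{RGI_Emoji}` in a single character class.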

Similarly to the above, we’d need to target the oldest Emoji version supported by any browser that supports the v flag.

mathiasbynens commented 2 years ago

javascript-v is now a build target. The only way to simplify this further is to standardize a new Unicode property of strings that exposes the emoji-test.txt data, as described in the top post. I’ll leave the issue open to track that.