mathiasbynens / emoji-test-regex-pattern

A regular expression pattern for Java/JavaScript to match all emoji in the emoji-test.txt file provided by UTS#51.
MIT License

Add `javascript-v` output #21

Closed: mathiasbynens closed this 2 years ago

mathiasbynens commented 2 years ago

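This generates a new pattern optimized for use with the upcoming `v` flag in JavaScript regular expressions. The pattern has the following format:

`\p{RGI_Emoji}|other|emoji|here`

where `other|emoji|here` stands for the remaining emoji sequences from emoji-test.txt that `\p{RGI_Emoji}` does not cover and that therefore still have to be listed explicitly.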
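To make the format concrete, here is a minimal sketch, assuming the extra strings are already known. `escapeForRegExp`, `buildVPattern`, and `supportsVFlag` are hypothetical names, not this project's generator. The sketch escapes each extra string, joins everything into a single alternation after `\p{RGI_Emoji}`, and only compiles the result when the engine accepts the `v` flag.

```javascript
// Hypothetical helper (not the project's actual generator): assembles a pattern
// of the form \p{RGI_Emoji}|other|emoji|here from a list of extra emoji strings.

// Backslash-escape regex syntax characters (and `/`) so each extra string is
// matched literally; keycap sequences such as "*\u20E3" need this.
function escapeForRegExp(string) {
  return string.replace(/[\\^$.*+?()[\]{}|\/]/g, '\\$&');
}

// Put \p{RGI_Emoji} first, followed by the escaped extra strings.
function buildVPattern(extraSequences) {
  return ['\\p{RGI_Emoji}', ...extraSequences.map(escapeForRegExp)].join('|');
}

// Feature-detect the `v` flag: engines without it throw a SyntaxError.
function supportsVFlag() {
  try {
    new RegExp('', 'v');
    return true;
  } catch {
    return false;
  }
}

// '*\u20E3' is the unqualified keycap asterisk, which is not part of RGI_Emoji.
const pattern = buildVPattern(['*\u20E3']);
if (supportsVFlag()) {
  const emojiRegex = new RegExp(pattern, 'gv');
  console.log('a*\u20E3b\u{1F600}'.match(emojiRegex));
} else {
  console.log('This engine does not support the v flag.');
}
```

Putting `\p{RGI_Emoji}` first simply mirrors the format above; the sample string is illustrative only.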

Since the RGI_Emoji sequences no longer have to be embedded in the pattern directly, the overall pattern becomes much more compact:

| File | Size (bytes) |
| --- | --- |
| `dist/latest/javascript-u.txt` | 12,909 |
| `dist/latest/javascript.txt` | 11,631 |
| `dist/latest/javascript-v.txt` | 3,146 |

In other words, leveraging the `v` flag and `\p{RGI_Emoji}` reduces the pattern size from 11,631 bytes (`javascript.txt`) to 3,146 bytes (`javascript-v.txt`), saving 8,485 bytes, a 73% reduction.
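The size figures can be double-checked with a few lines of arithmetic. `savings` is a hypothetical helper; the byte counts are the ones listed for `dist/latest` in this PR.

```javascript
// Byte counts from the dist/latest listing in this PR.
const sizes = {
  'javascript-u.txt': 12909,
  'javascript.txt': 11631,
  'javascript-v.txt': 3146,
};

// Bytes saved by `after` relative to `before`, plus the rounded percentage.
function savings(before, after) {
  const saved = before - after;
  return { saved, percent: Math.round((saved / before) * 100) };
}

console.log(savings(sizes['javascript.txt'], sizes['javascript-v.txt']));
// { saved: 8485, percent: 73 }
console.log(savings(sizes['javascript-u.txt'], sizes['javascript-v.txt']));
// { saved: 9763, percent: 76 }
```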

To be on the safe side, we take the oldest Emoji version supported by any browser that supports the `v` flag, and use the set of RGI_Emoji strings that corresponds to that version. Since there are no `v` flag implementations yet, this version is currently assumed to be Emoji 14. As a result, as newer Emoji versions are released, the generated pattern may come to contain hardcoded strings that are technically redundant for those versions. This is intentional: it keeps the pattern behaving equivalently in older browsers where `\p{RGI_Emoji}` resolves to an older Emoji version.
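The choice of hardcoded strings amounts to a set difference between the target version's emoji-test.txt and the baseline RGI_Emoji set. The sketch below assumes both are available as arrays of strings; `hardcodedExtras` and the sample data are hypothetical, hand-picked only to show the effect described above: SHAKING FACE (U+1FAE8, new in Emoji 15.0) ends up hardcoded even though an engine whose `\p{RGI_Emoji}` follows Emoji 15 would already match it.

```javascript
// Hypothetical sketch: decide which strings must be hardcoded after \p{RGI_Emoji}.
// emojiTestStrings: every sequence in the target version's emoji-test.txt.
// baselineRgiEmoji: RGI_Emoji strings of the assumed baseline (oldest) Emoji version.
function hardcodedExtras(emojiTestStrings, baselineRgiEmoji) {
  const baseline = new Set(baselineRgiEmoji);
  return emojiTestStrings
    // Anything the baseline \p{RGI_Emoji} is not guaranteed to match is listed explicitly.
    .filter((sequence) => !baseline.has(sequence))
    // This sketch's own choice: longer sequences (counted in code points) first.
    .sort((a, b) => [...b].length - [...a].length);
}

// Tiny hand-picked samples, not the real data files:
// U+1F600 GRINNING FACE (in both), U+1FAE8 SHAKING FACE (new in Emoji 15.0),
// and the unqualified keycap asterisk "*\u20E3" (never RGI).
const emojiTest15Sample = ['\u{1F600}', '\u{1FAE8}', '*\u20E3'];
const rgiEmoji14Sample = ['\u{1F600}'];
console.log(hardcodedExtras(emojiTest15Sample, rgiEmoji14Sample));
```

Ordering longer sequences first is a choice made for this sketch, not something the PR specifies.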

More info on the RegExp `v` flag proposal: https://github.com/tc39/proposal-regexp-set-notation

Issue: #15

mathiasbynens commented 2 years ago

`8e73efecdf88113a22a6914e5d33663e51269d07` lowered the lowest-common-denominator (LCD) version to Emoji 13 (i.e., the oldest version supported by this project), since that is required to ensure the older patterns behave correctly.
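One way to see why the baseline cannot be newer than the oldest supported version, assuming "the older patterns" means the patterns generated for older Emoji versions such as 13.0: any string in the baseline RGI_Emoji set that is missing from the target version's emoji-test.txt would still be matched by `\p{RGI_Emoji}`. The hypothetical `overmatchedStrings` check below illustrates this with hand-picked samples (MELTING FACE, U+1FAE0, was added in Emoji 14.0).

```javascript
// Hypothetical check: which baseline RGI_Emoji strings are missing from the target
// version's emoji-test.txt? \p{RGI_Emoji} would still match them, so the pattern
// generated for that target version would over-match.
function overmatchedStrings(baselineRgiEmoji, targetEmojiTestStrings) {
  const target = new Set(targetEmojiTestStrings);
  return baselineRgiEmoji.filter((sequence) => !target.has(sequence));
}

// Tiny hand-picked samples, not the real data files.
const emojiTest13Sample = ['\u{1F600}', '*\u20E3'];
const rgiEmoji14Sample = ['\u{1F600}', '\u{1FAE0}'];
const rgiEmoji13Sample = ['\u{1F600}'];

// Baseline Emoji 14 against an Emoji 13 pattern: MELTING FACE would over-match.
console.log(overmatchedStrings(rgiEmoji14Sample, emojiTest13Sample).length); // 1
// Baseline Emoji 13 (the oldest supported version) leaves nothing to over-match.
console.log(overmatchedStrings(rgiEmoji13Sample, emojiTest13Sample).length); // 0
```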