mathiasbynens closed this issue 3 years ago
I’ve managed to reduce this down to the following test case showing an issue with regexgen: https://github.com/devongovett/regexgen/issues/31#issuecomment-800233366
Filed it here: https://github.com/devongovett/regexgen/issues/31
As a workaround, we can try sorting the patterns on our end before passing things to regexgen.
Hrmm, that's weird. I remember adding sort-by-sequence-length to emoji-regex specifically to work around regexgen internals, because it seemed to handle longer strings better when they came first: https://github.com/mathiasbynens/emoji-regex/pull/21
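For reference, the longest-first ordering described above could be sketched roughly like this. This is only an illustration, not the actual emoji-regex code: `sortLongestFirst` is a hypothetical helper name, and counting length in code points (rather than UTF-16 code units) is an assumption.

```javascript
// Hypothetical helper (not from emoji-regex): order strings longest-first
// before handing them to a pattern generator.
function sortLongestFirst(strings) {
  // Assumption: measure length in code points so astral symbols count once.
  const codePointLength = (s) => Array.from(s).length;
  // Copy the input so the caller's array is left untouched.
  return [...strings].sort((a, b) => {
    const byLength = codePointLength(b) - codePointLength(a);
    if (byLength !== 0) return byLength;
    // Tie-break by code unit order to keep the result deterministic.
    return a < b ? -1 : a > b ? 1 : 0;
  });
}
```

The tie-break is there only so that repeated runs produce the same ordering, which makes it easier to compare generated patterns.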
Now it looks like that's not true any more. (I think the real issue here is that the full emoji spec is a stress test well outside what regexgen was designed for. 😄)
Turns out that .sort()ing before passing to regexgen doesn't actually fix the issue; it just moves it around (other strings are now no longer matched). Something is wrong with regexgen.
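A quick way to spot this kind of "moved around" breakage is to check that every input string is still matched in full by the generated pattern. Here is a minimal sketch, assuming the generator hands back a standard `RegExp`; `findUnmatched` is a hypothetical helper name and is not part of regexgen.

```javascript
// Hypothetical helper: return the inputs that `pattern` fails to match in full.
function findUnmatched(pattern, strings) {
  // Drop the g/y flags so lastIndex state cannot leak between test() calls.
  const flags = pattern.flags.replace(/[gy]/g, '');
  // Anchor a copy of the pattern so only whole-string matches count.
  const anchored = new RegExp(`^(?:${pattern.source})$`, flags);
  return strings.filter((s) => !anchored.test(s));
}

// Usage with a hand-written pattern standing in for a generated one.
const inputs = ['ab', 'abc', 'b'];
console.log(findUnmatched(/abc?/, inputs)); // → ['b']
```

An empty result means every input round-trips; anything else is a string the pattern has dropped.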
> I think the real issue here is that the full emoji spec is a stress test well outside what regexgen was designed for. 😄
You’d think so! I was assuming it must be some weird bug with astral symbols or something, but as it turns out, it reproduces with ASCII-only input strings as well, so it’s “just” a bug in the minimizer: https://github.com/devongovett/regexgen/issues/31#issuecomment-800233366
We could explore using an alternate dependency, such as grex, a Rust library by @pemistahl (https://github.com/pemistahl/grex). It started as a regexgen port, but perhaps this issue was fixed along the way? I'll investigate this week.
Update: grex has the same issue: https://github.com/pemistahl/grex/issues/31
I’ll close this issue as we’re working around it, which is good enough for now.