gskinner / regexr

RegExr is a HTML/JS based tool for creating, testing, and learning about Regular Expressions.
http://regexr.com/
GNU General Public License v3.0

unexpected error on `/\p{Script=Han}/gu` #485

Open mesqueeb opened 1 year ago

mesqueeb commented 1 year ago
image

even though it matches fine: image

verhovsky commented 9 months ago

The issue is that JavaScript and PCRE use different syntax for these \p values, and RegExr parses them as PCRE even when you tell it the flavor is JavaScript.

PCRE allows

\pL
\p{L}
\p{Han}

Whereas JavaScript allows

\p{L}
\p{Script=Han}
\p{Letter}

(note it doesn't allow \pL)
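The JavaScript side of this list can be checked in any engine that supports Unicode property escapes (for example a recent Node.js or browser). The snippet below is a small sketch, and `compilesInUnicodeMode` is a helper name made up for this illustration. It tries to compile each pattern with the `u` flag and records whether the engine accepts it.

```javascript
// Hypothetical helper (not part of RegExr): returns true if `source`
// compiles as a JavaScript regex with the `u` (Unicode) flag.
function compilesInUnicodeMode(source) {
  try {
    new RegExp(source, "u");
    return true;
  } catch (err) {
    // Invalid property names and the brace-less \pL form throw a SyntaxError.
    return false;
  }
}

const results = {
  "\\p{L}": compilesInUnicodeMode("\\p{L}"),                   // true
  "\\p{Letter}": compilesInUnicodeMode("\\p{Letter}"),         // true
  "\\p{Script=Han}": compilesInUnicodeMode("\\p{Script=Han}"), // true
  "\\p{Han}": compilesInUnicodeMode("\\p{Han}"),               // false: Han is a Script
  "\\pL": compilesInUnicodeMode("\\pL"),                       // false: braces are required
};

console.log(results);
```

Note that without the `u` flag JavaScript does not treat `\p` as a property escape at all, so the check is only meaningful in Unicode mode.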

In PCRE, a bare name is treated as a Script, Script_Extensions, or General Category value, and it's an error if it isn't a valid value for any of the three. In JavaScript, a bare name is treated only as a General Category, so \p{Han} is an error because Han is a Script, not a General Category. That's why JavaScript has the extra Script=Han syntax, which PCRE doesn't have. So the parser needs to know whether it's parsing JavaScript or PCRE.
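One way to picture a flavor-aware check is the sketch below. It is not RegExr's actual lexer: `classifyProperty`, the `"js"`/`"pcre"` flavor strings, and the two tiny value sets are all assumptions made for illustration, and the rules simply follow the description in this comment (JavaScript binary properties such as Alphabetic are left out for brevity).

```javascript
// Illustrative subsets only; real engines accept many more values.
const GENERAL_CATEGORIES = new Set(["L", "Letter", "Lu", "Uppercase_Letter", "N", "Number"]);
const SCRIPTS = new Set(["Han", "Latin", "Greek", "Cyrillic"]);

// Hypothetical helper: classify the text between \p{ and } for a flavor.
// `flavor` is assumed to be either "js" or "pcre".
function classifyProperty(body, flavor) {
  const eq = body.indexOf("=");

  if (eq === -1) {
    // Bare name: a General Category in both flavors.
    if (GENERAL_CATEGORIES.has(body)) {
      return { ok: true, kind: "General_Category", value: body };
    }
    // Only PCRE also accepts a bare script name such as Han.
    if (flavor === "pcre" && SCRIPTS.has(body)) {
      return { ok: true, kind: "Script", value: body };
    }
    return { ok: false, error: `"${body}" is not a valid bare property in ${flavor}` };
  }

  // key=value form: treated here as JavaScript-only, per the comment above.
  if (flavor !== "js") {
    return { ok: false, error: `key=value syntax is not accepted for ${flavor} in this sketch` };
  }
  const key = body.slice(0, eq);
  const value = body.slice(eq + 1);
  if ((key === "Script" || key === "sc") && SCRIPTS.has(value)) {
    return { ok: true, kind: "Script", value };
  }
  if ((key === "General_Category" || key === "gc") && GENERAL_CATEGORIES.has(value)) {
    return { ok: true, kind: "General_Category", value };
  }
  return { ok: false, error: `"${body}" is not a valid property expression` };
}

console.log(classifyProperty("Han", "pcre").ok);      // true
console.log(classifyProperty("Han", "js").ok);        // false
console.log(classifyProperty("Script=Han", "js").ok); // true
```

The point of the sketch is only that the same input gets a different verdict depending on the flavor argument, so a single PCRE-style code path will flag valid JavaScript patterns as errors.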

The error happens here:

https://github.com/gskinner/regexr/blob/1e382719041f8b1e5290472e14e2c98a6c05b61a/dev/src/ExpressionLexer.js#L621-L642