gskinner / regexr

RegExr is an HTML/JS-based tool for creating, testing, and learning about Regular Expressions.
http://regexr.com/
GNU General Public License v3.0

Extended Unicode escape doesn't seem to accept hexadecimal digits. #476

Open AshtonSnapp opened 1 year ago

AshtonSnapp commented 1 year ago

Hello. I am trying to enter the following regex into RegExr to debug a potential problem with it: a lexer is picking up a trailing parenthesis after the closing quotation mark (said parenthesis being the end delimiter of the function arguments).

The regex I am trying to debug is as follows: /"([\u{0}-\u{10FFFF}]|(\\"))*"/gu (although in my code it is written as the raw string literal r#""([\u{0}-\u{10FFFE}\u{10FFFF}]|(\\"))*""#, because a bug in the lexer library I'm using causes \u{0}-\u{10FFFF} to match any byte, hence the slight weirdness there).

However, when I type in the \u{10FFFF} escape, RegExr fails to identify the escape sequence and marks it as invalid. I am using the JavaScript (Browser) regex engine for this, because Unicode. This appears to be a bug, as the sidebar reference indicates that any number of hexadecimal digits may be used within the braces. Using lowercase f's does not work either.
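For comparison, here is a minimal sketch of how a JavaScript engine itself treats the escape when the `u` flag is set. It only exercises the built-in `RegExp` constructor and `String.fromCodePoint`; the variable names are made up for illustration, and it says nothing about how RegExr's own parser works internally.

```javascript
// Build the patterns with the `u` flag, as in the issue.
// Both uppercase and lowercase hex digits are used inside the braces.
const upperEscape = new RegExp("\\u{10FFFF}", "u");
const lowerEscape = new RegExp("\\u{10ffff}", "u");

// U+10FFFF is the highest Unicode code point.
const maxCodePoint = String.fromCodePoint(0x10FFFF);

const upperMatches = upperEscape.test(maxCodePoint);
const lowerMatches = lowerEscape.test(maxCodePoint);

console.log(upperMatches, lowerMatches); // → true true
```

If both patterns compile and match, the engine accepts the escape, which suggests that the "invalid" marking comes from the tool's highlighting rather than from the regex engine.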

valadaptive commented 6 months ago

I'm also experiencing this--note that while the parser rejects the character escape, the regex itself works properly:

[image]
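The same observation can be reproduced outside RegExr with a short sketch like the one below. The sample input `print("hi")` is a hypothetical stand-in for the kind of source text described in the issue, not something taken from the original report.

```javascript
// The regex from the issue, written as a JavaScript literal with the g and u flags.
const stringLiteral = /"([\u{0}-\u{10FFFF}]|(\\"))*"/gu;

// Hypothetical input: a call whose argument list ends right after the string.
const sampleInput = 'print("hi")';

// With the g flag, String.prototype.match returns every full match.
const found = sampleInput.match(stringLiteral);

console.log(found); // → [ '"hi"' ]
```

In this sketch the match stops at the closing quotation mark and does not include the trailing parenthesis, which is consistent with the regex working in the browser engine even though the parser flags the escape.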