Should we support Unicode escapes in group names, RegExp style?

tc39 / proposal-regexp-named-groups

Named capture groups for JavaScript RegExps

https://tc39.github.io/proposal-regexp-named-groups/

222 stars 21 forks source link

Should we support Unicode escapes in group names, RegExp style? #23

Closed littledan closed 7 years ago

littledan commented 7 years ago

E.g., see these tests in V8: https://cs.chromium.org/chromium/src/v8/test/mjsunit/harmony/regexp-named-captures.js?q=regexp-named+package:%5Echromium$&l=86 . These should be supported by the standard, as it's analogous to identifiers and properties.

schuay commented 7 years ago

Is it worth complicating the spec and implementations to support a feature that, most likely, no-one will use?

Is there even a valid use-case for something like /(?<\u{03C0}>a)/?
How would this interact with unicode mode?
Would they also be valid in replacer string capture references? V8 doesn't do this currently and it would complicate things significantly.

+1 from me for leaving the spec as-is and removing existing support from V8.

mathiasbynens commented 7 years ago

I agree with @schuay. The benefits of matching Identifier do not outweigh the added complexity. Let’s keep it simple for now — we can reconsider allowing this in the future if there is a strong need for it.

schuay commented 7 years ago

Dan pointed out that escape sequences in replacer strings would already be handled by regular string literal syntax.

littledan commented 7 years ago

Actually, I think this already falls out of the current semantics of the specification. The syntax is based on IdentiferName, and taking the StringValue of that production. Both of these will explicitly allow Unicode escapes.

An issue with the current spec, though, is that we use UnicodeEscapeSequence rather than RegExpUnicodeEscapeSequence. For example, \u{1234} is allowed only with Unicode mode turned on, and disallowed with it turned off.

It seems surprising and a little overly complicated to not follow the RegExp syntax for Unicode escapes when in a RegExp. So I think we should split out the grammar for identifiers here, unless we want to fully remove the feature. If we do this splitting out, we need to make sure to maintain the errors for bad identifiers.