tc39 / proposal-regex-escaping

Proposal for investigating RegExp escaping for the ECMAScript standard
http://tc39.es/proposal-regex-escaping/
Creative Commons Zero v1.0 Universal

Control Character Escapes #36

Closed benjamingr closed 1 year ago

benjamingr commented 9 years ago

Checking interest in escaping the whole A-Za-z range at the start of escaped strings in order to support ControlCharacter escapes:

> new RegExp('\\cJ').test('\n') // true
> new RegExp("\\c" + RegExp.escape('J')); // matches "\n" but not the string "\cJ"

Are we interested in escaping these? Personally, I never even knew these were a thing, let alone that they might appear in scenarios where .escape would be used. I definitely see the appeal for safety, though.
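To make the concern concrete, here is a minimal sketch. `naiveEscape` and `escapeLeading` are hypothetical helpers written for illustration, not the proposal's algorithm. The second one hex-escapes a leading ASCII letter or digit, which is one possible way to stop the escaped string from completing a preceding `\c`.

```javascript
// Hypothetical helper: escapes only regex syntax characters, leaving letters alone.
function naiveEscape(s) {
  return s.replace(/[\\^$.*+?()[\]{}|]/g, "\\$&");
}

// Hypothetical helper: additionally hex-escapes a leading ASCII letter or digit,
// so the result cannot complete an escape that the caller left unfinished.
function escapeLeading(s) {
  return naiveEscape(s).replace(
    /^[0-9A-Za-z]/,
    (ch) => "\\x" + ch.charCodeAt(0).toString(16).padStart(2, "0")
  );
}

// Pattern source is \cJ, the control escape for line feed.
const leaky = new RegExp("\\c" + naiveEscape("J"));
// Pattern source is \c\x4a, so "J" is no longer read as a control letter.
const guarded = new RegExp("\\c" + escapeLeading("J"));

console.log(leaky.test("\n"));     // true
console.log(guarded.test("\n"));   // false
console.log(guarded.test("\\cJ")); // true
```

Without the `u` flag, a `\c` that is not followed by a letter falls back to matching a literal backslash followed by `c`, which is why the guarded pattern matches the three-character string `\cJ`.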

Summoning @mathiasbynens @anba who are knowledgeable on the topic, @bergus and @nikic who led hex escapes and @allenwb @cscott and @domenic for the spec's PoV on the subject.

mathiasbynens commented 9 years ago

I agree with @anba: if you decide to prevent RegExp('\\u004' + RegExp.escape('A')) from expanding to RegExp('\\u004A'), then we should be consistent and apply the same logic to control character escapes.
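The hex-escape case mentioned above can be sketched the same way. Hex-escaping the leading character as `\x41` is an assumed strategy used only for illustration; the thread does not settle on it.

```javascript
// Plain concatenation: the caller's partial "\u004" is completed by the input "A".
const expanded = new RegExp("\\u004" + "A");
console.log(expanded.source);    // \u004A
console.log(expanded.test("J")); // true, since U+004A is "J"

// Assumed escaping strategy: hex-escape the leading character ("A" becomes "\x41").
const contained = new RegExp("\\u004" + "\\x41");
console.log(contained.test("J"));     // false
console.log(contained.test("u004A")); // true

// With the u flag, the incomplete "\u004" is rejected outright.
let rejected = false;
try {
  new RegExp("\\u004" + "\\x41", "u");
} catch (e) {
  rejected = e instanceof SyntaxError;
}
console.log(rejected); // true
```

Without the `u` flag, the incomplete `\u` degrades to a literal `u`, so the contained pattern matches the text `u004A` rather than the code point U+004A.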

bergus commented 9 years ago

I agree: if we do one, then we should do the other. However, I'm leaning towards the position that we should do neither. Can we get some data on whether this is used anywhere? A \c not followed by [A-Za-z], a \u not followed by {…} or 4 hexadecimal digits, a \x not followed by 2 hexadecimal digits? In any expression? I suspect this "feature" is not used anywhere. We might even go so far as to propose to TC39 that this usage be forbidden in all expressions, not just those with the /u flag.

benjamingr commented 9 years ago

@bergus any idea how to collect the data? A shallow search did not suggest that \\c inside the RegExp constructor, or \c on its own, is common, but good data would go a long way here.
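One rough way to gather such data would be to run a checker over regex source strings extracted from a code corpus. The sketch below is a heuristic only: `findIncompleteEscapes` is a hypothetical helper, it assumes the pattern source has already been extracted (it does not parse JavaScript files), and it ignores the special handling of `\c` inside character classes.

```javascript
// Hypothetical helper: list incomplete \c, \u and \x escapes in a regex source string.
function findIncompleteEscapes(source) {
  const hits = [];
  // Each match consumes a backslash plus the next character, so an escaped
  // backslash is never mistaken for the start of \c, \u or \x.
  const escape = /\\([\s\S])/g;
  let m;
  while ((m = escape.exec(source)) !== null) {
    const kind = m[1];
    const rest = source.slice(m.index + 2);
    const incomplete =
      (kind === "c" && !/^[A-Za-z]/.test(rest)) ||
      (kind === "u" && !/^(?:\{[0-9A-Fa-f]+\}|[0-9A-Fa-f]{4})/.test(rest)) ||
      (kind === "x" && !/^[0-9A-Fa-f]{2}/.test(rest));
    if (incomplete) {
      hits.push({ index: m.index, text: source.slice(m.index, m.index + 2) });
    }
  }
  return hits;
}

console.log(findIncompleteEscapes("\\cJ").length);   // 0, complete control escape
console.log(findIncompleteEscapes("\\c1").length);   // 1, \c not followed by a letter
console.log(findIncompleteEscapes("a\\u004").length); // 1, \u with only 3 hex digits
console.log(findIncompleteEscapes("\\\\c").length);  // 0, escaped backslash then "c"
```

Counting the hits across a large set of patterns would give a first estimate of how often these incomplete escapes occur in practice.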

ljharb commented 1 year ago

The spec has been updated with the escaping semantics that advanced to stage 2 today.