Closed Alhadis closed 3 years ago
I don't have any interest in expanding the problematic design of RegExp by adding another Symbol lookup, and I suspect that is an opinion shared by many on the committee.
Separately, it wouldn't make any sense to me to have an instance method that doesn't actually care about the instance except to look something up on the constructor.
A static method - whether a template tag or a `.escape` function - allows for the same customizability with `RegExpSubclass.escape` or `RegExpSubclass.tag`, or similar. Code wishing to support the exceedingly rare design pattern of regex subclasses can call `regex.constructor.escape` as needed.
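For illustration only, the static-method approach described above might look roughly like this. Every name here (`baseEscape`, `OnigLikeExpr`, `escapeFor`) is hypothetical, and `baseEscape` is a simplified stand-in, not the algorithm the proposal specifies for `RegExp.escape`.

```js
// Simplified stand-in for an escaping routine (an assumption, NOT the proposal's algorithm).
function baseEscape(input){
	return String(input).replace(/[\\^$.*+?()[\]{}|]/g, "\\$&");
}

// Hypothetical subclass for an engine where `&&` is special inside character classes.
class OnigLikeExpr extends RegExp {
	static escape(input){
		return baseEscape(input).replaceAll("&&", "\\&&");
	}
}

// Generic code dispatches through the instance's constructor,
// so each subclass can supply its own static `escape`.
function escapeFor(regex, input){
	return regex.constructor.escape(input);
}

const escaped = escapeFor(new OnigLikeExpr("x"), "a&&b.c");
console.log(escaped);
```

The dispatch helper never inspects the instance beyond reading its constructor, which is the point made above: the customization lives on the class, not on the object.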
> I don't have any interest in expanding the problematic design of RegExp by adding another Symbol lookup, and I suspect that is an opinion shared by many on the committee.
Forget about Symbol lookups then, what about returning a string with escaped metacharacters? Moreover, bindings to a third-party library typically take strings as arguments, and their syntax is rarely compatible with standard regular expressions (think of TextMate grammars, which are commonly powered by Oniguruma).
> exceedingly rare design pattern of regex subclasses
Needing to escape a subset or superset of "special" regex characters isn't "exceedingly rare". Subclasses were only used as an example.
The very least you can do is add an optional parameter to specify characters to exclude from escaping.
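To make that suggestion concrete, such a parameter might behave like the sketch below. `escapeExcept` and its `exclude` argument are hypothetical names, and the metacharacter list is a simplified assumption rather than anything the proposal defines.

```js
// Hypothetical helper: escapes common metacharacters, skipping any listed in `exclude`.
// The character list below is an assumption made for illustration.
function escapeExcept(input, exclude = ""){
	const skip = new Set(exclude);
	return String(input).replace(
		/[\\^$.*+?()[\]{}|]/g,
		ch => skip.has(ch) ? ch : "\\" + ch
	);
}

const everything = escapeExcept("a.b*c");      // every listed metacharacter is escaped
const keepStars  = escapeExcept("a.b*c", "*"); // `*` is left untouched
console.log(everything, keepStars);
```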
Are there existing userland patterns in JS you could point to where there's been a need to customize the escaped character list?
What are you referring to by "userland", exactly?
The crux of the issue is there's no way to safely return a string that's escaped consistently with `RegExp.escape`. Unescaping certain sequences can always come afterwards, I suppose.
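A rough sketch of that "escape first, unescape afterwards" fallback, under stated assumptions: `baseEscape` is a simplified stand-in for the real escaping routine, and `unescapeChars` is a hypothetical helper name.

```js
// Simplified stand-in for an escaping routine (an assumption, NOT the proposal's algorithm).
function baseEscape(input){
	return String(input).replace(/[\\^$.*+?()[\]{}|]/g, "\\$&");
}

// Hypothetical helper: removes the backslash in front of each character listed in `chars`.
// Matching `\` plus the following character pairwise keeps escaped backslashes intact.
function unescapeChars(escaped, chars){
	const targets = new Set(chars);
	return escaped.replace(/\\(.)/gs, (match, ch) => targets.has(ch) ? ch : match);
}

const strict  = baseEscape("a.b*c");
const relaxed = unescapeChars(strict, "*");
console.log(strict, relaxed);
```

This only works when the escaped form is a backslash followed by the literal character; escapes written in another form (hex sequences, for instance) would need separate handling.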
I mean, outside the language - typically, a package on npm in common usage.
I wouldn't know. It's been years since I've used NPM (or any other package manager) for anything other than globally-installing a command-line tool, so whatever flavour-of-the-month is doing its rounds in the ecosystem at the moment is completely unknown to me.
Then in the absence of any demonstrated need for this pattern, and given the reasons I've outlined above, I'll close this for now.
(Sorry if this is written like an essay; I developed tunnel-vision halfway through writing it…)
Overview
Authors may need to escape a string for piecewise construction.
`RegExp.escape(…).source` is insufficient, because input may not necessarily be a complete, syntactically valid regular expression. Ergo, I suggest providing an instance method that returns an escaped string following the same logic as `RegExp.escape`:
Rationale
The reason I suggest adding an instance method (as opposed to another class method) is so authors can fine-tune how/where characters are escaped (possibly influenced by a well-known `@@escape` symbol, à la `@@replace`).
The definition of `RegExp.prototype.escape` is more-or-less along the lines of:
Motivation
Subclasses of `RegExp` may have different expectations about what characters need escaping (and where). A realistic example is a third-party regular expression library imported as a set of functions, which are wrapped inside a subclass for more idiomatic (object-oriented) use.
Some actual code might make this clearer…
Example 1: Oniguruma
Oniguruma uses `&&[…]` to denote an [intersection range][RE] within a character class, meaning that `[a-z&&[aeiou]]` has two different interpretations depending on the engine that's parsing it.
~~~js
class OnigurumaExpr extends RegExp {
	escape(input){
		input = RegExp.prototype.escape(input);
		return input.replaceAll("&&", "\\&&");
	}
}

/** Return true if input contains an alphabetic character. */
function hasAlphaChars(input, additionalLetters = ""){
	// Call the overridden instance method; a bare `OnigurumaExpr.escape`
	// would resolve to the inherited static method and leave `&&` unescaped.
	return new OnigurumaExpr(`[A-Z${
		OnigurumaExpr.prototype.escape(additionalLetters)
	}a-z]+`).test(input);
}

hasAlphaChars("Café", "éñøüğȟ");   // Harmless
hasAlphaChars("Café", "&&[^a-z]"); // Problematic
~~~
Example 2: Basic POSIX regular expressions (BREs)
In legacy POSIX syntax, `\(…\)` and `\{…\}` have *opposite* meanings to `(…)` and `{…}`, respectively.
~~~js
class BRE extends RegExp {
	[Symbol.escape](input){
		return input.replace(/\\[({})\\1-9]/g, "\\$&");
	}
}

BRE.prototype.escape("\\(A\\)-(Z)+?") === String.raw `\\(A\\)-(Z)+?`;
~~~