Closed Alhadis closed 1 year ago
The RegExp.escape
solution would produce a string, not a regular expression. Only the RegExp.tag
solution would produce an entire regex.
Can you elaborate on the use cases for needing a string?
Can you elaborate on the use cases for needing a string?
I listed two use-cases in the OP (underlined below):
It should be possible to escape metacharacters without instantiating a RegExp. Writing RegExp.escape(…).source isn't sufficient, because it assumes the result is a well-formed ECMAScript regular expression. Users wanting to construct regexes piecemeal, or those forced to use string values to accommodate foreign regex syntaxes (e.g., Oniguruma/TextMate-compatible grammars), are left with two partial workarounds:
Are you pressing for actual, working code samples exemplifying the aforementioned use cases?
The proposal currently has a lot of pushback against the form that returns a string, so it's particularly valuable to have justifications to prefer the string form.
Why would you want to construct a regex piecemeal? (i don't think RegExp would need to concern itself whatsoever with nonstandard grammars).
Nowhere did I imply that the proposed semantics ought to change, only that
It should be possible to escape metacharacters without instantiating a
RegExp
A solution could take the form of
RegExp.escape(input, asString = false)
RegExp.escapeString()
Right, but "it should" is a claim, and I'm asking for justification of that claim. Why should it be possible?
Why should it be possible?
Imagine constructing several similar (but distinct) regular expressions that share common subpatterns. Some of these patterns are generated from user-supplied input, but also contain actual RegExp syntax. E.g.,
// User-supplied arguments
let titleArgument = "foo";
let searchQuery = "Something that appears inside a search result";
// Program logic
const title = `(?<title>${ RegExp.escape(titleArgument) })`;
const desc = `(?<description>.*?${ RegExp.escape(searchQuery) }.*?)`;
const dash = `\\s*[-–—⸻]\\s*|\\s+`;
const shortSearchResult = new RegExp(`^\\s*${title}${dash}${desc}$`, "gm");
const longSearchResult = new RegExp(`^\\s*${title}\n${desc}\n{2,}`, "g");
…
If this still doesn't strike you as ample justification, I invite you to rewrite the above using code you feel is more appropriate.
@benjamingr Any feedback?
Hmm. With the tag approach, you'd do the escaping once for each regex, like this:
const shortSearchResult = RegExp.tag`/^\\s*(?<title>${titleArgument})\\s*[-–—⸻]\\s*|\\s+(?<description>.*?${searchQuery}.*?)$/gm`;
const longSearchResult = RegExp.tag`/^\\s*(?<title>${titleArgument})\n(?<description>.*?${searchQuery}.*?)\n{2,}/g`;
I'm not sure how strong an argument it is that some of the pieces of two different regexes wouldn't have to be repeated.
@Alhadis This issue has been open for less than a day. I want to think about this :]
(Thank you for opening an issue and bringing up a concern!)
Intuitively: what you are describing is the "current" RegExp.escape
proposal (it returns a string). I am personally a fan of it and think it's the simplest API as well as the one every other programming language I checked picked.
That said, I am honestly very open to alternatives - the fact other languages picked this API doesn't mean it's the best one - so I want to think about this more.
@ljharb the issue is composing these, notice how you ended up inlining title
and desc
and dash
? I think that's the point @Alhadis is trying to make. In order for the code to be similar in terms of composition - .source
would have to be used - which would make RegExp.tag
potentially unsafe. Intuitively this is actually a pretty important point.
Also to clarify @Alhadis - while I work on this proposal the opinion that counts here in terms of what ends up being done is the committee's. I just wanted to make that clear.
I intend to keep working on this and am open to suggestions/ideas but am currently strongly in favour of RegExp.escape
(and not tag) but I don't call the shots here.
The person championing this proposal from TC39 is @ljharb (and in the past Domenic who helped a bunch). You can read more on how features get added to JavaScript here.
I agree, but I think the hard part would be persuasively arguing that it's appropriate to build regular expressions via composition.
persuasively arguing that it's appropriate to build regular expressions via composition.
@ljharb Maybe for you, but I doubt you work with regular expressions as frequently and heavily as I do. Trust me on this.
That said, I am honestly very open to alternatives
@benjamingr What about RegExp.from()
? If passed a string, it could return a RegExp
that matches the argument verbatim. I remember seeing .from()
mentioned in another proposal somewhere, so it might be out-of-scope for this proposal.
@Alhadis it's not just about my trust; it's that all of the TC39 delegates have to be convinced of this too. Either way, a persuasive argument is one that can be conveyed to folks that aren't as familiar with the subject - if it requires the familiarity to understand the argument, it's a much harder thing to convince folks of the necessity of.
Alright, but you mentioned here that
I'm not sure how strong an argument it is that some of the pieces of two different regexes wouldn't have to be repeated.
which, yes, is a pretty weak argument for the contrived example I slapped together. 2 patterns isn't the sort of problem I had in mind, though.
Try to imagine 5 or 10 patterns, possibly returned from different class methods, which nobody in their right-mind wants to keep in-sync manually with copy+pasta (D.R.Y. and all that…)
Moreover, the term "escape" has a pretty lucid meaning among coders, who probably don't expect a function of that name to also instantiate a RegExp
as well. It'd be a bit like encodeURI()
returning a URL
object instead of an encoded string: it's doing more than what it advertises, which strikes me as a poor design decision.
Sure, which is why the tag function form wouldn't be named "escape". RegExp.escape
would only ever produce a string.
IMHO, that's an even worse approach. You're naming a method after its input, not its function. Even with an awareness of template-tagging, I'd read RegExp.tag
and think of something related to HTML tags (or IETF language tags, or... really anything that describes what (and not how) the method is used.
It also makes no sense to mandate template literals over other types of strings. Sure, it might be more expressive, but that's no reason to make it so specific.
Personally, I'd call it this:
RegExp.for(characters);
Which can be read as:
RegExp.for(/* matching these */ characters);
My second choice is RegExp.from()
, but only because it's consistent with other class methods whose names begin with from…
(Object.fromEntries
, Array.from
, et cetera). There's also also the vague connection that each of these methods takes a different object type, transforms or interprets it in some fashion, and returns instance of itself. Though I admit this is a pretty weak rationale—hence I'm leaning strongly in favour of RegExp.for()
.
I would prefer that we focus on the compelling use case for composing strings as regular expressions rather than bikeshedding the name of RegExp.build/tag
or RegExp.escape
.
I think proposals for alternatives are welcome but should probably go in a separate thread.
I'm a bit confused, maybe someone could clarify. As I understand the proposal, RegExp.escape()
does return a string already, right? It's a static method, so you're not instantiating a RegExp to do this.
Is there a change to the proposal where the result of RegExp.escape()
is a RegExp object?
The readme is pretty self explanatory. It is a string https://github.com/tc39/proposal-regex-escaping#regexpescape-function
RegExp.escape
, which returns a string, advanced to stage 2 today.
It should be possible to escape metacharacters without instantiating a
RegExp
. WritingRegExp.escape(…).source
isn't sufficient, because it assumes the result is a well-formed ECMAScript regular expression. Users wanting to construct regexes piecemeal, or those forced to use string values to accommodate foreign regex syntaxes (e.g., Oniguruma/TextMate-compatible grammars), are left with two partial workarounds:[...string].map(char => RegExp.escape(char).source).join("")
Nobody wants to resort to either of those things, but there doesn't appear to be a better way (that won't throw a
SyntaxError
, that is).(I originally brought this up in #50, but probably put too much emphasis on the need to escape individual characters, as opposed to having more control over the escaped result. I've rewritten this to be clearer)
/cc @ljharb