Closed littledan closed 1 year ago
That was the expressed preference 5 years ago, and in that time, no delegate nor anybody in userland seems to have developed a template tag, as a proposal or a library, and userland seems quite convinced that the existing solution does already solve the user-facing problem.
Do you have an example of a library that attempts to solve this? If there are none, I think that does serve as evidence that it would not, in fact, add any value. If there are some, it'd be great to learn about them.
Well, I think we should encourage people who want to do work in TC39 to develop this. I think the only evidence we have right now is that it hasn't been popularized yet; we're allowed to go beyond what popular libraries do.
Indeed, but the problem that needs solving - and has needed solving for much longer than the 5 years since this proposal was rejected - has been consistently and completely solved by the low-level feature contained in this proposal. No need has manifested for anything more.
Yes, I agree that this problem is extremely important to solve.
History of the tag vs escape quarrel: Present the proposal again?, Results of TC39 presentation, specifically comments such as this, this, this, this, this, this
For what it's worth from me: We currently have a working solution. One in the hand vs two in the bush.
I would prefer if, rather than pursuing a low-level RegExp.escape feature, we would work on a high-level RegExp templating feature.
Talking to users, we have found that almost everyone wants a RegExp.escape
for the simple cases and no one asked for RegExp.tag
, users I've spoken with found .tag
very confusing as well.
It would be great if the proposal's README summarized these explanations. I was unaware of some of this context. I would be especially interested in hearing about developers who are confused by RegExp.tag
--this is news to me.
To be clear, I'm not opposed to RegExp.escape
. I was trying to motivate people to work on a higher-level construct. You've given good context for explaining why that's not a good idea. It's more important to solve this problem one way or another, than in the way that I aesthetically prefer.
@littledan this proposal is 5 years old. The agenda item is not a proposal, it’s a discussion topic, during which context will be provided. Once this proposal is being actually proposed, then it will contain sufficient context in its documents.
I suggest you add an FAQ topic to explain the context here as well, to help people get prepared for the meeting and avoid repeated questions along the lines I made.
I've made a PR here: https://github.com/benjamingr/RegExp.escape/pull/46 just to start a FAQ, contributions welcome.
Do you have an example of a library that attempts to solve this? If there are none, I think that does serve as evidence that it would not, in fact, add any value. If there are some, it'd be great to learn about them.
I know of at least XRegExp.build.
Wow 9 years ago https://github.com/slevithan/xregexp/commit/47b338da4a03eba60c4b6db22a1adde9d14900bf - also CC @LeaVerou
Edit: also relevant: https://github.com/slevithan/xregexp/issues/103
Edit2: Note that XRegExp.tag
does not actually perform context sensitive escaping - it just calls .escape
on every substitution.
It would be really interesting (but likely impossible) to know how much of xregexp usage is .build
.
I think that having both a RegExp.escape
method that tries to handle as many escaping scenarios as possible and a RegExp.literal
method providing a simple interface for interpolation would be the best route.
RegExp.escape
could possibly be done by having a second argument specifying the regex up until the string being escaped, and if that argument is null
or missing it would escape for all possible contexts.
@ljharb just a note - xregexp.build
doesn't actually do what the suggested thing here does.
@leo60228 yeah and to be clear an .escape
proposal doesn't stop any future .tag/.build
efforts at all.
This well-intentioned but entirely wrong idea derailed this proposal 5 years ago and we ended up without a solution at all. Please let's not do this again.
@phpnode the way the stage process works is that the committee considers alternatives. I think the best path forward is to make a good faith effort to consider alternatives.
The more I talk to users and language committees of other languages the more I lean towards .escape
and not a tagged-template - but I am one (relatively inexperienced in language design) developer. By the way note that the two APIs are not mutually exclusive and nothing prevents .tag
to land at some point in the future anyway.
I am frustrated by how things went 5 years ago, given the amount of work I put into this proposal back then. I think a better approach back then would have been a longer and more patient dialog with TC39 to understand how to push this* forward.
Worst case (for people who need RegExp.escape
as the API) - if the committee decides against .escape
then WHATWG members I spoke with were very optimistic about our ability to add this to the web at which point I will also add it to Node.js. It would be unfortunate to get another cross-platform web API like AbortController
rather than something that's part of the language - so I feel strongly that we should explore TC39 standardisation first and do the hard work of getting buy-in from delegates.
*this - an API for escaping regular expressions, not necessarily RegExp.escape
or a tagged template.
Do you have an example of a library that attempts to solve this? If there are none, I think that does serve as evidence that it would not, in fact, add any value. If there are some, it'd be great to learn about them.
Another example: https://runkit.com/tolmasky/templated-regular-expression-example
const { queryString } = templateRegExp(
{
key: /[^=#]+/,
value: /[^&#]+/,
parameter: /${key}=${value}/,
queryString: /\?(${parameter})(?:&(${parameter}))*/
});
queryString.test("?a=b&c=d")
I'm not sure how that's a demonstration of escaping? That appears to be just a way to create a RegExp from objects (I also have one from ±6 years ago I don't maintain here: https://github.com/benjamingr/js-restructure :)).
Is this it? https://www.npmjs.com/package/templated-regular-expression
Yes, that's the package. And it's not intended as a demonstration of escaping, but rather another response to the "Do you have an example of a library that attempts to solve [high-level RegExp templating]?" question above regarding this specific issue, which @littledan raised to "solve the user-facing problem more directly, avoiding the need to concatenate RegExps".
Letting users compose RegExps would be awesome. The RegExp syntax are optimized for small searchers, but the regular formalism makes it perfect for lexers, and that use case is very poorly served by literals. Likewise, backtracking makes sense for small searchers, but you can easily run into ReDOS when writing non-trivial parsers.
For previous art, there's alsoThere's also https://github.com/trusktr/regexr by @trusktr.
I have a personnal preference for JS-based combinators, as seen in compose-regexp. The compose-regexp
functions accepts either string or RegExps as arguments, and strings are escaped. For completeness, it would require a charset builder with set operations.
The ${...}
syntax for splicing in values doesn't cut it in RegExp context IMO. as both $
and {}
already have meanings in RegExp context. Also, how would one interpret splicing values in character class context? That use case would be better served by a dedicated JS API IMO.
Alternatively one could imagine a syntax extension in RegExps for splicing in full patterns (the point of the regular formalism is composition, after all, that is at the core of the definition). Something like
const p1 = RegExp.quantifier("{4}", /\w/)
const p2 = RegExp.intersect(/\p{sc=Latin}/u, /\p{Uppercaser}/u)
const combined = /^(?:(?>p1)|(?>p2))$/
// compare with
const tCombined = RegExp.from`^$(?:{p1}|${p2})$`
// or with combinators
const cCombined = sequence(/^/, either(p1,p2), /$/)
Such a syntax would prevent splicing in character class and {}
quantifier context at parse time. These use cases would be better served by a dedicated JS API as shown in the example above.
Edit, a bit rusty :-)
My opinion: A template tag would be great to have but deserves its own proposal and is orthogonal to an operation for escaping.
This is my template tag library: https://github.com/rauschma/re-template-tag
A template tag would nicely complement the upcoming extended /x
mode. Quoting the proposal: “While the x-mode flag can be used in a RegularExpressionLiteral, it does not permit the use of LineTerminator in RegularExpressonLiteral. For multi-line regular expressions you would need to use the RegExp constructor.”
I’m wondering if RegExp
could play double duty as a template tag (it would have to check whether its first argument is an Array or not):
const RE_DATE = RegExp`/^${RE_YEAR}-${RE_MONTH}-${RE_DAY}$/u`;
another template tag library: https://github.com/mikesamuel/regexp-make-js
Syntax aside, when composing, one has to take what's spliced in into account, or risk getting nonsensical results. Here's an example:
const ab = /a|b/
const c = /c/
import {re} from 're-template-tag';
console.log(re`${ab}${c}`) // /a|bc/
import {sequence} from "compose-regexp"
console.log(sequence(ab, c)) // /(?:a|b)c/
I will object to any proposal that creates a new vulnerability to injection attacks. AFAICT, the means the only acceptable proposal is one based on tagged template literals.
@mikesamuel and I wrote and championed the tagged template literal proposal. We got them added to JS primarily to support writing context aware SAFE escapers of embedded languages. This is exactly the situation we're talking about here. RegExp is precisely such an embedded language, and safe escaping cannot be done in this language without context sensitivity.
At https://github.com/tc39/proposal-regex-escaping/issues/45#issuecomment-1379085585 , @cben links to
https://github.com/mikesamuel/regexp-make-js
This is the one I want to call everyone's attention to. It was written by @mikesamuel with substantial input from me, explicitly for the purpose of doing SAFE context aware escaping. It also accepts RegExp instances as substitution values, expressing nested matchers, not just nested data. This enables powerful composition of RegExps.
I would support a proposal based on this implementation, that preserves both its safety and its compositionality. I would also support a proposal that rejects in more odd contexts, rather than bothering to do context aware escaping for those odd contexts.
@erights could you give an example of an injection attack enabled by a non-ttl solution ?
E.g. how would you craft an attack on compose-regexp?
Edit: looking at the issues over at https://github.com/mikesamuel/regexp-make-js, I see a problem with sequence(/a{/, 1, /}/
, which currently returns /a{1}/
rather than /a{(?:)1}/
. It can be patched easily (edit2: done as of v0.6.31... | edit3: as of v0.7.0 lone quantifier brackets in input are rejected as syntax errors).
Edit4: @erights beside quantifier brackets and back references, do you have other "weird" scenarios in mind?
This proposal advanced to stage 2 with RegExp.escape
semantics. https://github.com/tc39/proposals/commit/cc2b37dcdbe378539d54b6901577ec5c2888fd3d
I would prefer if, rather than pursuing a low-level
RegExp.escape
feature, we would work on a high-level RegExp templating feature. This version would solve the user-facing problem more directly, avoiding the need to concatenate RegExps in the result, and can help engines avoid re-parsing (by just parsing the shared part once). If any part of RegExp syntax ends up requiring context-dependent escaping, a template constructor could resolve that in a way that context-independentRegExp.escape
cannot.Such a feature could look like this, to find the value of the variable
x
followed by the value ofy
with arbitrary whitespace in between:Here, flags are provided after a
/
within the RegExp (which is of course optional).