Closed gibson042 closed 4 months ago
The explainer argues that the output of RegExp.escape is not put "in a place where it would obviously mean something else". See also #17 and #29.
cc @erights and @bakkot for thoughts?
I'm inclined to say that the context after \c is similar to the context after \x, in that if you see either \x${v} or \c${v} you should know that the output of v is at least potentially going to be used as part of the escape sequence. There is no other reason to write those strings.
It's not like \1${v}, where the \1 is a coherent thing to write on its own, such that you'd be surprised for the ${v} to become part of the escape sequence.
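To make the \1${v} case concrete, here is a minimal sketch in JavaScript. The helper name backrefThen is hypothetical, and the pattern is built by naive interpolation with no escaping at all, only to show how an interpolated digit can change what \1 means in a non-unicode-mode RegExp.

```javascript
// Hypothetical helper: interpolate v right after a \1 backreference, unescaped.
function backrefThen(v) {
  return new RegExp("(a)\\1" + v);
}

// Source is (a)\1b: \1 is a backreference to group 1, followed by a literal "b".
const plainSuffix = backrefThen("b");
const plainMatches = plainSuffix.test("aab");

// Source is (a)\10: there is only one group, so in legacy (non-unicode) mode
// \10 is read as an octal escape for U+0008 rather than "\1 followed by 0".
const digitSuffix = backrefThen("0");
const digitMatchesAa0 = digitSuffix.test("aa0");
const digitMatchesBackspace = digitSuffix.test("a\b");

console.log(plainMatches, digitMatchesAa0, digitMatchesBackspace);
```

Here the author of (a)\1 wrote something complete on its own, and the interpolated digit silently turned it into a different escape.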
That matches my intuition here.
> I'm inclined to say that the context after \c is similar to the context after \x, in that if you see either \x${v} or \c${v} you should know that the output of v is at least potentially going to be used as part of the escape sequence. There is no other reason to write those strings.
>
> It's not like \1${v}, where the \1 is a coherent thing to write on its own, such that you'd be surprised for the ${v} to become part of the escape sequence.
\x and \c are in fact coherent on their own, or whenever they are not followed by content that completes an escape, although I agree that the intent of either is generally (and pretty much exclusively) to express such an escape. But I would nonetheless be surprised if whether \x${v} or \c${v} is an escape depends upon the value of v (and surprised in the same way as with \x41 vs. \x4x and \cJ vs. \c=, plus one level of indirection).
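The dependence on the value of v can be checked directly. The sketch below is only an illustration: the helper name afterPrefix is hypothetical, the patterns are non-unicode-mode, and v is interpolated without any escaping.

```javascript
// Hypothetical helper: anchor the pattern and interpolate v right after prefix.
function afterPrefix(prefix, v) {
  return new RegExp("^" + prefix + v + "$");
}

// \x41 is a hex escape for "A"; \x4x is not a valid hex escape, so in legacy
// mode \x falls back to a literal "x" and the pattern matches "x4x".
const hexComplete = afterPrefix("\\x", "41").test("A");
const hexIncomplete = afterPrefix("\\x", "4x").test("x4x");

// \cJ is the control escape for U+000A (line feed); \c= is not a control
// escape, so in legacy mode it matches a backslash, then "c", then "=".
const ctrlComplete = afterPrefix("\\c", "J").test("\n");
const ctrlIncomplete = afterPrefix("\\c", "=").test("\\c=");

console.log(hexComplete, hexIncomplete, ctrlComplete, ctrlIncomplete);
```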
The question is whether that surprise is a problem we should solve. I don't think it is. There is no reasonable expectation for what behavior you're going to get if you write \c${RegExp.escape(v)}.
I think it's fine to say that the output of RegExp.escape is not safe to use in a place where it would obviously mean something else, and that "immediately after \x" and "immediately after \c" are such places.
Sounds like we're in agreement here, so I'll close this, but will reopen if that's inaccurate.
Reopening pending confirmation from @gibson042 and @erights.
To clarify for those watching this thread: template tags would enable safe, context-sensitive escaping. I originally objected to RegExp.escape because it does not. I knew that there was an unsolvable even-vs-odd backslash problem, and thought there were others. After @bakkot clarified that the only such hazard was even-vs-odd backslash, I felt this was safe enough, because the exceptional unsafe case was narrow, easy to state, easy to remember, and easy to check by eyeball. To get there, as far as I am concerned, we gave up on the goal that the output of RegExp.escape be readable.
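For readers unfamiliar with the even-vs-odd backslash hazard, a minimal sketch follows. The helper escapeSyntax is a hypothetical stand-in that backslash-escapes syntax characters only; it is not the proposal's algorithm, and compiles is likewise an illustrative helper.

```javascript
// Hypothetical stand-in for an escaping function: prefix each regular
// expression syntax character with a backslash.
function escapeSyntax(s) {
  return s.replace(/[\\^$.*+?()[\]{}|]/g, "\\$&");
}

// Returns true if the source text compiles as a RegExp, false otherwise.
function compiles(source) {
  try {
    new RegExp(source);
    return true;
  } catch (e) {
    return false;
  }
}

const escaped = escapeSyntax("("); // the two characters \ and (

// Even number of literal backslashes before the interpolation: they pair up
// as one escaped backslash, and the escaped "(" keeps its own backslash.
const evenSource = "\\\\" + escaped; // source text: \\\(
const evenOk = compiles(evenSource) && new RegExp(evenSource).test("\\(");

// Odd number: the lone literal backslash pairs with the backslash that was
// meant to escape "(", leaving an unescaped "(" and an unterminated group.
const oddSource = "\\" + escaped; // source text: \\(
const oddOk = compiles(oddSource);

console.log(evenOk, oddOk);
```

No choice of escaped output can fix the odd case, because the preceding backslash always consumes whatever character comes first.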
Later on in the current TC39 meeting, we'll discuss Array.isTemplateObject. Whatever my opinion on this specifically is (which is complicated), I strongly support the motivation for the proposal: to enable programmers (authors and reviewers) to reason clearly about the distinction between the literal parts that were authored as part of the program and the attacker-controlled, less-trusted data handled by that more-trusted program.
Therefore, I feel strongly that the first character needs to be adequately escaped to restore that simple safety property. My concern does not hinge on whether one would write \c${v} on purpose. It might be written by accident, in which case the RegExp may be buggy, in the sense that it does not mean what its author thought it meant. But even that case should not compromise this safety property. Reviewers who don't otherwise care about what the RegExp means should still be able, by eyeball and without an unrealistic memory burden, to reason simply about the impact of attacker-controlled less-trusted data.
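As a rough illustration of what escaping the first character buys, consider the sketch below. The function escapeSketch is hypothetical: it backslash-escapes syntax characters and rewrites a leading ASCII letter or digit as a \xHH hex escape. It is an assumption made for this example, not a statement of what RegExp.escape does.

```javascript
// Hypothetical sketch: escape syntax characters, then rewrite a leading
// ASCII alphanumeric as a two-digit hex escape so that it cannot complete
// an escape sequence started by the preceding literal text.
function escapeSketch(s) {
  const body = s.replace(/[\\^$.*+?()[\]{}|]/g, "\\$&");
  return body.replace(/^[0-9A-Za-z]/, (ch) =>
    "\\x" + ch.charCodeAt(0).toString(16).padStart(2, "0")
  );
}

// Unescaped: \c followed by "J" becomes the control escape for line feed.
const rawIsControl = new RegExp("^\\c" + "J" + "$").test("\n");

// With the leading character escaped the source is \c\x4a, which in legacy
// mode matches a backslash, "c", and "J" rather than a line feed.
const safeSource = "^\\c" + escapeSketch("J") + "$";
const safeIsControl = new RegExp(safeSource).test("\n");
const safeIsLiteral = new RegExp(safeSource).test("\\cJ");

// Same idea after \x: the source \x\x341 matches "x41", not "A".
const hexSource = "^\\x" + escapeSketch("41") + "$";
const hexIsEscape = new RegExp(hexSource).test("A");
const hexIsLiteral = new RegExp(hexSource).test("x41");

console.log(rawIsControl, safeIsControl, safeIsLiteral, hexIsEscape, hexIsLiteral);
```

The resulting patterns may still not mean what their author intended, but the interpolated data can no longer change how the literal \c or \x is interpreted.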
As noted in https://github.com/tc39/proposal-regex-escaping/issues/58#issuecomment-1827185269 , an unescaped non-digit leading character could still be interpreted as part of an escape sequence spanning concatenated RegExp.escape output.