Trying to get in touch regarding a security issue

JamieSlome commented 3 years ago

Hey there!

I'd like to report a security issue but cannot find contact instructions on your repository.

If not a hassle, might you kindly add a SECURITY.md file with an email, or another contact method? GitHub recommends this best practice to ensure security issues are responsibly disclosed, and it would serve as a simple instruction for security researchers in the future.

Thank you for your consideration, and I look forward to hearing from you!

(cc @huntr-helper)

slevithan commented 3 years ago

Thanks for reaching out, @JamieSlome.

This is the first time getting a request like this, You can send an email to the author email in package.json: steves_list@hotmail.com.

slevithan commented 3 years ago

Turns out this report was invalid. The security researchers reported the issue behind a private link, which I'm happy to share.

I'm including my reply below for posterity.

This is invalid/false because XRegExp adds the /y (sticky) flag to all regex syntax tokens whenever /y is supported natively (ES6 browsers). This causes the issue you described to go away entirely, by only testing the first position in the subject string rather than every position. See: https://github.com/slevithan/xregexp/blob/d1302bd7b9b2eb94cc9fefd61d40ed244f5c04a4/src/xregexp.js#L730

Even for pre-ES6 browsers, XRegExp applies all regex tokens via XRegExp.exec with the 'sticky' option enabled, which triggers its own handling of sticky mode even when native flag /y is unavailable. See: https://github.com/slevithan/xregexp/blob/d1302bd7b9b2eb94cc9fefd61d40ed244f5c04a4/src/xregexp.js#L470

In other words, your example code here does not use the regex in question the way XRegExp does.

In the future, it would be helpful to actually use XRegExp to show a potential vulnerability in action, rather than writing code that makes no use of XRegExp.

Despite this being invalid, I feel compelled to describe why I do not consider this a vulnerability even if XRegExp didn't have multiple layers of protection against the problem. Here goes...

As far as I can tell, this is a misapplication of the term "ReDoS." This regex does not trigger catastrophic/runaway/exponential backtracking with your example input. And it cannot do so with any input, because there are no potentially overlapping partial matches or backtracking loops in the regex.

Yes, greedy matching to the end of the string with [^>]+ will cause the JavaScript regex engine to have to backtrack one character at a time if there is no > to end the match successfully, but the time needed for this scales linearly with the length of the string. By putting tens of thousands of partial matches in the subject string that the regex engine must test, it indeed takes a long time (but then, the closer each '\\k<' is to the end of the string, the less time it will take the regex engine to test that position).

This is the nature of regex engines that use backtracking algorithms when they are applied to very long strings that contain many unsuccessful partial matches that each have to be tested until the end of the string to rule out matches. Therefore, I would say that you are highlighting a performance issue with the nature of the backtracking algorithms built into JavaScript engines (which is necessary to support the features of JavaScript regexes). There is nothing poorly crafted about the specific regex you highlighted that leads to it potentially triggering an ReDoS. Each position that starts a partial match takes linear time to test; the time scales linearly with the number of characters following the position that is being tested. But because there are many partial matches in your incredibly long and specially crafted input, this takes a long time. So you're triggering an expensive operation, yes, but it's no more expensive than it needs to be if we truly needed to test every position in your malicious string with the specific rules prescribed in the regex (which we don't, which is why XRegExp is smart about this under the hood).

Also note that, because XRegExp already treats \k as an invalid token (and throws an error) when \k doesn't participate in a match of \k<...>, the search doesn't get repeated within the same pattern. It would throw after the first (long) match fails. See: https://github.com/slevithan/xregexp/blob/d1302bd7b9b2eb94cc9fefd61d40ed244f5c04a4/src/xregexp.js#L1731

Also also (and very importantly), note that the regex in question here is applied only to regular expression patterns that are parsed by XRegExp; not to the strings that are searched by the resulting, compiled regex. So for user input to trigger this, an app would need to be accepting unsafe strings to be used as regex patterns. If an app did that, then it would be easy to feed it a short regex pattern (no need for tens of thousands of chars) that would trigger an ReDoS on even short subject strings that the regex is subsequently applied to. In other words, XRegExp's regex parsing process would be the least of your concerns.

Given what I mentioned about how the regex you highlighted does not contain anything that could trigger runaway/non-linear backtracking, even if I wanted to avoid this contrived scenario of accepting arbitrary input as regex patterns (contrived for any app that is not specifically a regex tester that people can use to ReDoS themselves), and even if the way XRegExp actually works didn't invalidate this as a problem, there wouldn't be a solution that doesn't change the underlying behavior for the worse.

Using lazy/non-greedy matching via [^>]+? wouldn't change anything, because the regex engine would still need to backtrack forward one character at a time to the end of the string, rather than backtracking backward after jumping to the end.
Capping the length of names allowed in XRegExp's named groups would be imposing an arbitrary restriction since there is no corresponding limit on JavaScript variable names.
Changing [^>]+ to [^<>]+, [^\\<>]+, or similar would prevent throwing a relevant/helpful error when < or \ is used in (invalid) backreference names. So e.g., for \k<n<>, it would prevent throwing an error about the backreference name ("Backreference to undefined group \k<n<>"), and instead leave \k<n<> to be handled by other potential meanings in the underlying regex syntax. Fortunately, since as I mentioned earlier, the escape \k is already treated as an "Invalid escape" error if it doesn't take part in a match of \k<...>, there would still be a less useful error ("Invalid escape \k").

What you seem to be describing here is that even well-crafted regexes, if they use unlimited-length quantification (*, +, {n,}, etc.), can take a long time to run if given really damn long maliciously crafted inputs with tens of thousands of partial/incomplete matches. I don't see this as a security vulnerability.

All of that said, regex performance in general (including ReDoS attacks) is something I care a lot about, and I would treat it as a high priority to fix any regexes used by XRegExp that could actually trigger runaway/exponential backtracking.

slevithan / xregexp

Trying to get in touch regarding a security issue #335