tjenkinson / redos-detector

A CLI and library which tests with certainty if a regex pattern is safe from ReDoS attacks. Supported in the browser, Node and Deno.
https://redosdetector.com
MIT License
41 stars 4 forks source link

#618 regressions and more explicit methodology #624

Open nisbet-hubbard opened 3 days ago

nisbet-hubbard commented 3 days ago

This is a most useful project, especially given it can be easily incorporated into any CI workflow. So, please keep up the good work!

Until recently, our experience is its results are broadly in line with those of Nicolaas Weideman‘s RegexStaticAnalysis. #618, however, has caused significant (non-performance) regression and we’ve had to pin an earlier commit.

To use @gms1’s example from #621, a regex as simple as \s+ is currently considered unsafe because it’s interpreted as ^.*?\s+. But not having ^ is hardly equivalent to having a lazy .* in front of the expression. Lazy Kleene just backtracks in the other direction than the greedy Kleen, whereas a bare \s+ never has to backtrack:

  1. pattern = "\s+" NFA constructed in: 1ms EDA analysis performed in: 1ms Does not contain EDA IDA analysis performed in: 1ms Does not contain IDA Total analysis time: 3

  2. pattern = "^.*?\s+" NFA constructed in: 1ms EDA analysis performed in: 1ms Does not contain EDA IDA analysis performed in: 72ms Contains IDA, degree 1, with: a\x09\x09\x09...\x09a IDA exploit string as JSON: {"degree":1,"separators":["a\t\t"],"pumps":["\t"],"suffix":"a"} Prefix: "a\x09\x09" Pump 0: "\x09" Suffix: "a" Total analysis time: 74

The reason (a+)+ is exponential (#606) isn’t because it has the same semantics as ^.*(a+)+, but simply that it’s a nested Kleene expression.

As a side note, at some point it would probably be good to start documenting the methodology of this project more systematically. There’s unfortunately not much consensus on the exact scope of bad regexes (see eg this recent literature review), so it’s particularly helpful to at least be explicit about one’s methodology.

tjenkinson commented 2 days ago

Hey thanks for the info. I’ve got some ideas and looking into this. Will update when I have more of a plan

I got a bit confused with (a+)+. It’s actually fine in that form given it’s unbounded. (a+)+$ is when there’s an issue. I think https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS is incorrect here and they’re missing the end anchor in some of the examples.