Open nisbet-hubbard opened 3 days ago
Hey thanks for the info. I’ve got some ideas and looking into this. Will update when I have more of a plan
I got a bit confused with (a+)+
. It’s actually fine in that form given it’s unbounded. (a+)+$
is when there’s an issue. I think https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS is incorrect here and they’re missing the end anchor in some of the examples.
This is a most useful project, especially given it can be easily incorporated into any CI workflow. So, please keep up the good work!
Until recently, our experience is its results are broadly in line with those of Nicolaas Weideman‘s RegexStaticAnalysis. #618, however, has caused significant (non-performance) regression and we’ve had to pin an earlier commit.
To use @gms1’s example from #621, a regex as simple as
\s+
is currently considered unsafe because it’s interpreted as^.*?\s+
. But not having^
is hardly equivalent to having a lazy.*
in front of the expression. Lazy Kleene just backtracks in the other direction than the greedy Kleen, whereas a bare\s+
never has to backtrack:The reason
(a+)+
is exponential (#606) isn’t because it has the same semantics as^.*(a+)+
, but simply that it’s a nested Kleene expression.As a side note, at some point it would probably be good to start documenting the methodology of this project more systematically. There’s unfortunately not much consensus on the exact scope of bad regexes (see eg this recent literature review), so it’s particularly helpful to at least be explicit about one’s methodology.