Closed jwestbrook closed 3 years ago
This looks really subtle. What does the attack look like? Is ?:
a look-ahead/behind operator?
?:
defines the group as non capturing- meaning you can't re-use the group results in a replace method.
Using capturing groups (..)
or non-capturing groups (?:..)
does not change whether a regular expression is vulnerable to ReDoS.
The change you've made is effectively a noop, it doesn't change anything.
You can see an example input string that still causes bad performance here: https://regex101.com/r/tWa7BJ/1
The problem is the (?:"[^"]"|'[^']'|[^>])
group, both [^>]
and "[^"]"
match a pair of double-quotes.
This ambiguity in how a string can be matched is what can cause issues.
To fix it, the ambiguity must be removed.
The first step is to remove the overlap, so change [^>]
to [^>'"]
.
But then tags like <a ">
are no longer recognized.
So add that tags may contain an unclosed quotation.
The end results ends up being:
/<\w+(\s+("[^"]*"|'[^']*'|[^>'"])+)?\s*("[^">]*|'[^'>])?>|<\/\w+>/
I think the above matches the same strings as the original regexp, but without the performance problems. But I'm not completely sure (about it matching the same strings).
well crap - I pushed the wrong pattern, closing this PR and reopening as a new one
Per conversation, here is the Pull Request for the advisory.