prototypejs / prototype

Prototype JavaScript framework
http://prototypejs.org/
Other
3.53k stars 639 forks source link

Update regex for striptags method to prevent regex dos #348

Closed jwestbrook closed 3 years ago

jwestbrook commented 3 years ago

Per conversation, here is the Pull Request for the advisory.

walterdavis commented 3 years ago

This looks really subtle. What does the attack look like? Is ?: a look-ahead/behind operator?

jwestbrook commented 3 years ago

?: defines the group as non capturing- meaning you can't re-use the group results in a replace method.

erik-krogh commented 3 years ago

Using capturing groups (..) or non-capturing groups (?:..) does not change whether a regular expression is vulnerable to ReDoS.
The change you've made is effectively a noop, it doesn't change anything.

You can see an example input string that still causes bad performance here: https://regex101.com/r/tWa7BJ/1


The problem is the (?:"[^"]"|'[^']'|[^>]) group, both [^>] and "[^"]" match a pair of double-quotes.
This ambiguity in how a string can be matched is what can cause issues.

To fix it, the ambiguity must be removed. The first step is to remove the overlap, so change [^>] to [^>'"].

But then tags like <a "> are no longer recognized. So add that tags may contain an unclosed quotation.

The end results ends up being:

/<\w+(\s+("[^"]*"|'[^']*'|[^>'"])+)?\s*("[^">]*|'[^'>])?>|<\/\w+>/

I think the above matches the same strings as the original regexp, but without the performance problems. But I'm not completely sure (about it matching the same strings).

jwestbrook commented 3 years ago

well crap - I pushed the wrong pattern, closing this PR and reopening as a new one