guillaumepotier / Parsley.js

Validate your forms, frontend, without writing a single line of javascript
http://parsleyjs.org
MIT License
9.05k stars 1.32k forks source link

Inefficient regular expression #1374

Closed freonirons409 closed 5 days ago

freonirons409 commented 1 week ago

What kind of issue is this? (put 'x' between the square brackets)

This part of the regular expression may cause exponential backtracking on strings starting with '0.' and containing many repetitions of '00.'.

Screenshot 2024-04-18 2 43 31 PM


Some regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to nk or even 2n. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.

The regular expression engines provided by many popular JavaScript platforms use backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower.

Typically, a regular expression is affected by this problem if it contains a repetition of the form r* or r+ where the sub-expression r is ambiguous in the sense that it can match some string in multiple ways. More information about the precise circumstances can be found in the references.

Recommendation

Modify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter.

Example

Consider this regular expression:

/^_(__|.)+_$/

Its sub-expression "(|.)+?" can match the string "" either by the first alternative "__" to the left of the "|" operator, or by two repetitions of the second alternative "." to the right. Thus, a string consisting of an odd number of underscores followed by some other character will cause the regular expression engine to run for an exponential amount of time before rejecting the input.

This problem can be avoided by rewriting the regular expression to remove the ambiguity between the two branches of the alternative inside the repetition:

/^_(__|[^_])+_$/

References

OWASP: Regular expression Denial of Service - ReDoS. Wikipedia: ReDoS. Wikipedia: Time complexity. James Kirrage, Asiri Rathnayake, Hayo Thielecke: Static Analysis for Regular Expression Denial-of-Service Attack. Common Weakness Enumeration: CWE-1333. Common Weakness Enumeration: CWE-730. Common Weakness Enumeration: CWE-400.

marcandre commented 1 week ago

Thanks for opening the issue.

I consider this a negligible issue, since the code runs on the client's machine.

That said, if you can propose a PR with an improved regex I'd welcome it.

freonirons409 commented 5 days ago

@marcandre our security team cleared this issue and negligible and can safely be dismissed so I will close out this issue. Seemed like a non-issue to me as well, but wanted to double check. Closing this out!