iter-tools / regex

A streaming regex evaluation engine
MIT License
11 stars 1 forks source link

Failing to detect all duplicates #22

Closed conartist6 closed 3 years ago

conartist6 commented 3 years ago

There are currently runaway complexity problems due to failure to detect duplicate repetitions being evaluated on the same character. In such a case the worse repetition should always be discarded because it will never be part of the final result.

This manifests as problems evaluating \w+. At position 0 in the input there is 1 way to reach the + operator. At position one, [unmatched]* creates a fork, and now there to be two ways to reach the + repetition on position 1. Each of these repetitions spawns additional states of its own, for n^2 complexity on the length of the word.