Assuming the same example:
/*!re2c
[ac]{0,14} [a] [ac]{0,14} {}
*/
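The bounds matter more than they look: a counted repetition R{0,n} behaves like n nested optional copies of R, so the spec above is effectively a much larger regular expression than it appears. For intuition, a hand-expanded equivalent for the much smaller bound {0,2} would look like this (illustration only; not necessarily how re2c expands counted repetition internally):
/*!re2c
[ac] [ac]?)? [a] ([ac] [ac]?)? {}
*/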
Commit https://github.com/skvadrik/re2c/commit/b98997b6fed91b6c37b1517a1aae096fb8c770fc reduces the time of DFA construction from ~115s to ~1.2s (the last line of output below is the size of the generated slow.c in bytes):
$ time ./re2c slow.re > slow.c && stat -c '%s' slow.c
real 0m1.139s
user 0m1.073s
sys 0m0.065s
5646703
The generated DFA is still too big and should be minimized.
Commit https://github.com/skvadrik/re2c/commit/1136f2fb007461fc0c720fb6a1a9f58efed691e1 adds DFA minimization, which shrinks the generated lexer from ~5.6 MB to ~15 KB:
$ time ./re2c slow.re > slow.c && stat -c '%s' slow.c
real 0m0.732s
user 0m0.684s
sys 0m0.048s
15078
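For reference, the general idea behind DFA minimization is partition refinement: start from the coarsest partition (accepting vs. non-accepting states) and keep splitting blocks whose states disagree on the blocks of their successors. The sketch below is a toy Moore-style refinement over a hand-written four-state DFA; it only illustrates the technique and is not the algorithm or the data structures actually used in re2c:
/* Toy DFA minimization by Moore-style partition refinement.
 * Illustration only; NOT re2c's implementation. */
#include <stdio.h>
#include <string.h>

#define NSTATES 4
#define NSYMS   2   /* 0 = 'a', 1 = 'c' */

/* Transition table and accepting set of the input DFA.
 * States 1 and 2 are deliberately equivalent, so minimization merges them. */
static const int delta[NSTATES][NSYMS] = {
    /* s0 */ {1, 2},
    /* s1 */ {3, 3},
    /* s2 */ {3, 3},
    /* s3 */ {3, 3},
};
static const int accepting[NSTATES] = {0, 0, 0, 1};

int main(void)
{
    /* block[s] is the equivalence class of state s.
     * Start with the coarsest partition: accepting vs. non-accepting. */
    int block[NSTATES];
    for (int s = 0; s < NSTATES; ++s) block[s] = accepting[s];

    /* Refine until no two states in the same block can be distinguished
     * by the blocks of their successors. */
    for (int changed = 1; changed;) {
        changed = 0;
        int newblock[NSTATES];
        int nblocks = 0;
        for (int s = 0; s < NSTATES; ++s) {
            int found = -1;
            for (int t = 0; t < s; ++t) {
                /* s and t stay together iff they are in the same block
                 * and agree on the blocks of all successors. */
                if (block[s] != block[t]) continue;
                int same = 1;
                for (int c = 0; c < NSYMS; ++c)
                    if (block[delta[s][c]] != block[delta[t][c]]) { same = 0; break; }
                if (same) { found = newblock[t]; break; }
            }
            newblock[s] = (found >= 0) ? found : nblocks++;
        }
        if (memcmp(block, newblock, sizeof block) != 0) changed = 1;
        memcpy(block, newblock, sizeof block);
    }

    for (int s = 0; s < NSTATES; ++s)
        printf("state %d -> class %d\n", s, block[s]);
    return 0;
}
Running it prints that states 1 and 2 end up in the same class, i.e. the minimized DFA has three states instead of four.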
The intermediate DFA (the one that is constructed by NFA determinization and further reduced by DFA minimization) is still large. We'll need incremental minimization to fully solve pathological cases like this. Closing the bug until someone encounters a real-world example.
For reference, the original report follows. Consider the following code (slow.re):
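Based on the "same example" remark above, slow.re contains the spec already quoted at the top:
/*!re2c
[ac]{0,14} [a] [ac]{0,14} {}
*/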
It takes re2c a long time to compile this code and the generated lexer is very large:
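The original measurement is not preserved here; it was presumably taken with the same command used in the comments above, and with pre-fix re2c it took on the order of ~115s:
$ time ./re2c slow.re > slow.c && stat -c '%s' slow.c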
Increasing the repetition counters (either of the two, or both):
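For example, bumping both bounds by one (the exact values tried in the original report are not preserved; this is only an illustration):
/*!re2c
[ac]{0,15} [a] [ac]{0,15} {}
*/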
This causes a dramatic increase in both compilation time and output size.
If we minimize the generated DFA, it gets reasonably small. (I experimented with minimization a bit: the number of states drops by several orders of magnitude, and the minimization itself takes a reasonable amount of time, which suggests that the slowdown might be caused by an inefficiency in FSA determinization.)