hokein / Automata.js

A regular expression converter
http://hokein.github.io/Automata.js/
226 stars 23 forks source link

Major issue when dealing with many terminal states #9

Closed ClementNerma closed 6 years ago

ClementNerma commented 6 years ago

While using your program, I tried for fun to write the following expression: a|b|c|...|z. I got an unexpected result (a very strange automata), so to debug it a little I ran it step by step.

So, when I use "Compile to DFA" with a|b|c|d|e|f, everything is fine:

Working example

But if I simply add a g option (a|b|c|d|e|f|g), a strange result happens:

Unexpected result

A crazy result is when I write a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z|0|1|2|3|4|5. Here's an extract of the drawn automata:

Crazy result

This is really strange, any idea of what happens?

hokein commented 6 years ago

Looks like a bug when converting NFA to DFA.

G-Ricky commented 6 years ago

Hello, I think I found it. It was in function _reorderNFAStateId

Automata.js/src/regparser.js:283 RegParser.prototype._reorderNFAStateId = function() {

When you dynamically change state.id, it may miss some nodes to change id or repeatedly change the same node.It is likely to be in the case of multiple branches converging in NFA.

Result of NFA of A|B|C|D|E|F|G

With _reorderNFAStateId()

Without _reorderNFAStateId()

It seems that node 10 is the missing one

Suggestion: Optimize algorithm of ordering Or separate state id and node name

hokein commented 6 years ago

@G-Ricky Thanks for your investigation, it is really helpful! It seems like a bug in _reorderNFAStateId, since you have figured out the root cause, do you mind sending a PR fixing it?