izuzak / noam

JavaScript library for working with automata and grammars for regular and context-free languages
http://ivanzuzak.info/noam/
Other
201 stars 32 forks source link

Regex gym and fsm simulator gets stuck #15

Open Mahesha999 opened 6 years ago

Mahesha999 commented 6 years ago

I was trying to find the dfa accepting strings containing at least three occurrences of three consecutive 1's on alphabet Σ={0,1} with overlapping permitted.

I come up with following long regex:

(
(0+1)*111(0+1)*111(0+1)*111(0+1)*
+(0+1)*111(0+1)*1111(0+1)*
+(0+1)*1111(0+1)*111(0+1)*
+(0+1)*11111(0+1)*
)

First line for strings containing non overlapping three occurrences of three consecutive 1's Second line for strings containing first two occurrences overlapping (that is four consecutive 1's) Third line for strings containing last two occurrences overlapping (that is four consecutive 1's) Last line for all occurrences overlapping (that is five consecutive 1's)

I fed above regex to FSM simulator and it kept on processing. I can see the CPU utilization in both chrome and windows task managers. So I tried to feed the regex to regex gym. It also showed same behaviour.

Interestingly, when I omit last line (for all overlapping occurrences) in the regex, it returned proper FSM:

(
(0+1)*111(0+1)*111(0+1)*111(0+1)*
+(0+1)*111(0+1)*1111(0+1)*
+(0+1)*1111(0+1)*111(0+1)*
)

So whats going on here? Is it that the original regex is excessively complex?

izuzak commented 6 years ago

Hi @Mahesha999 👋

I fed above regex to FSM simulator and it kept on processing. I can see the CPU utilization in both chrome and windows task managers.

Just to make sure I understand -- did you try to generate a DFA or an eNFA? I see the "blocking" processing when I try to generate a DFA, but an eNFA is generated just fine.

So I tried to feed the regex to regex gym. It also showed same behaviour.

Strange -- the regex gym starts reducing the regex just fine for me.

Interestingly, when I omit last line (for all overlapping occurrences) in the regex, it returned proper FSM:

Yeah, strange again -- I see no difference in the behavior I described above when I remove the last part.

So whats going on here? Is it that the original regex is excessively complex?

Yeah, it's likely that converting the eNFA into a DFA and then minimizing the DFA is expensive. It's possible the might be some performance optimizations that can be made to the library, but I don't have time to look into that right now.