rohaquinlop / automathon

A Python library for simulating and visualizing finite automata
https://rohaquinlop.github.io/automathon/
MIT License
61 stars 2 forks source link

Incorrect removal of epsilon transitions #2

Closed interrogator closed 2 years ago

interrogator commented 2 years ago

Hi,

Thanks for the library!

I generate an NFA with the following data and attempt to remove epsilon transitions.

The result is incorrect: in the produced DFA, it becomes possible to jump from A to D without passing through B and C.

image

Another issue I've found is that if there are two mirroring epsilon transitions, i.e. from 1 to 2 and from 2 to 1, attempting to remove epsilon transitions breaks with a max recursion error.

Any idea what's happening? If need be I could potentially help with a fix!

Thanks

rohaquinlop commented 2 years ago

Hi,

Foremost, I want to thank you for using automathon, I was really excited to know that you were using it. You are absolutely right, I am going to look into the errors you mention and also if you want to help with your solution, I am completely open to receive your help.

Thank you very much, greetings from Colombia

interrogator commented 2 years ago

Hey,

Thanks for the fast reply!

I've found a working solution at: https://github.com/lewiuberg/automata/blob/lambda/automata/fa/nfa.py#L154

But this means I currently have to use both libs (that one for the epsilon removal, yours for the visualisation), which isn't much good. And this fix is in a branch of a fork, which is not ideal.

I'm not much of a graph/automaton theorist, so I'm not sure I'll be able to code a solution. But if there is something I can do I'd be happy to try if I've got time.

Thanks again for your help. We're using this code to facilitate a kind of advanced way of searching natural language text, just in case you're curious!

rohaquinlop commented 2 years ago

Thank you for the solution!

I'll be working on it tomorrow and hope I can fix it as soon as possible.

I'm very happy that the library is helping you on the research!

rohaquinlop commented 2 years ago

Have a good day Daniel, hope you are going well.

I think I just fixed the errors, you can use the new version 0.0.8!

Thank you for report me the error, if there is anything that I can do for you just tell me or email me, all my contact information is on my profile.

interrogator commented 2 years ago

Hey, wow, that fix is great, thanks so much for helping so quickly.

There is one other thing I'm trying to accomplish, wondering if you can help.

For example, I generate this NFA with epsilon:

image

After running the removeEpsilonTransitions() method, I get:

image

What I would like to generate instead is the equivalent:

image

Would you consider coding an option for this, or providing me with some tips so that I can adjust your code and do it myself?

Thanks so much!

rohaquinlop commented 2 years ago

It is with pleasure for me!

What you propose is very interesting, I will work on what you say, what I can see is that I must make a minimization of the automaton.

interrogator commented 2 years ago

Yes, minimization, that is probably the right term for it. I think you understand what I mean! If you can find a solution, that would be fantastic. I suppose it could be accessed by automaton.removeEpsilonTransitions(minimal=bool) or even there could be a more general method, automaton.minimize() which removes duplicate routes where possible. Anything you like would be fine with us! And let me know if there's something I can do to help.

rohaquinlop commented 2 years ago

Yes! I was thinking on one general method. I'm looking how to make it

rohaquinlop commented 2 years ago

Hey @interrogator I think I found a way to minimize a NFA, only created this method for the NFA class because the theory says that is different when you want to reduce the number of states on DFA, for the moment the minimize function is only available for NFA. Here is a preview

Before

minimize - NFA without EpsilonTransitions gv

After minimize - result of minimization gv

Hope this is what you're looking, if you find any error or anything that I can make for you just let me know.

Have a good day!

interrogator commented 2 years ago

Looks great! I will try it out shortly!

interrogator commented 2 years ago

OK, I gave it a try and the new algorithm seems to work well!

I think I have just one more issue: When I look inside the .delta attribute, I see that the node names are stringified lists:

{"['0']": {'AA': ["['1']"]}, "['1']": {'BB': ["['10', '13', '2']"]}

Would it be possible for this data structure to be cleaned up, so that the keys are just 0 and the value's values are either a list or set of ints? Otherwise I have to eval the data, which is never very nice.

Thanks again for your help, it's really working great!

rohaquinlop commented 2 years ago

Great!!

Let me see if I understand what you say.

From this: {"['0']": {'AA': ["['1']"]}, "['1']": {'BB': ["['10', '13', '2']"]}

to one of these options:

Please tell me which one of the options.

It's a pleasure to help you.

interrogator commented 2 years ago

I think using sets might be best where possible:

{'0': {'AA': {'1'}}, '1': {'BB': {'10', '13', '2'}}

Of course I would be fine if integers could be used here, as long as nothing ever gets turned into a string, then I would pass in strings to begin with and end up with:

{0: {'AA': {1}}, 1: {'BB': {10, 13, 2}}

Make sense? Thanks again!

interrogator commented 2 years ago

By the way, something else I found useful in a different lib, is the ability to renumber an automaton starting from 0 after minimization. For example, with the pythomata.impl.simple.SimpleNFA class, you can do nfa.renumbering() and your nodes start from 0 again. So this would be really nice to have as well.

rohaquinlop commented 2 years ago

Hi @interrogator

Didn't reply because I was on an ICPC competition, so excuse me for my absence. I think that for the moment I can change on the algorithm to return the result on this way: {'0': {'AA': ['1']}, '1': {'BB': ['2']} while I rewrite the code to use sets instead of lists, and I was already thinking on that function, to "fix" the array results on the states, so I think that we are on the same page. Let me work on the renumbering function and then publish the update, and while you work with this update I'll be rewriting the code to use the sets instead of lists.

Hope this is what you're looking, if you find any error or anything that I can make for you, just let me know.

rohaquinlop commented 2 years ago

Hey @interrogator,

The renumber method is already implemented, this method doesn't return a NFA, simply changes the labels on the automata when you call the function: automata.renumber()

interrogator commented 2 years ago

Hey,

I just got an unexpected error on what should be a very simple graph. Here I catch the error and print out the automaton's __dict__ at error time for you

{'Q': {'3', '2', '1', '0'}, 'sigma': {"token.upos = 'A'", "token.upos = 'C'", "token.upos = 'B'"}, 'delta': defaultdict(<class 'dict'>, {'0': {"token.upos = 'A'": {'1'}}, '1': {"token.upos = 'B'": {'2'}}, '2': {"token.upos = 'C'": {'3'}}}), 'initialState': '0', 'F': {'3'}}
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_24489/1628032756.py in <module>
      2 
      3 query = '[upos="A"] [upos="B"] [upos="C"]'
----> 4 data = cquery(query=query, retrieve=True, epsilon=False, minimise=True)
      5 data

~/work/cquery/cquery/cquery.py in cquery(db, corpus, id_field, query, prefilter, graph, automaton, epsilon, script, as_dict, retrieve, minimise, verbose)
    581     if minimise:
    582         try:
--> 583             auto = auto.minimize()
    584         except:
    585             print(auto.__dict__)

~/venv/py3.9/lib/python3.9/site-packages/automathon/finiteAutomata/nfa.py in minimize(self)
    289   def minimize(self):
    290     """Minimize the automata and return the NFA result of the minimization"""
--> 291     localDFA = self.getDFA()
    292     localNFA = localDFA.getNFA()
    293     localNFA.renumber()

~/venv/py3.9/lib/python3.9/site-packages/automathon/finiteAutomata/nfa.py in getDFA(self)
    260 
    261       for t in T:
--> 262         T[t].sort()
    263         tmp = T[t].copy()
    264         if tmp not in visited:

AttributeError: 'set' object has no attribute 'sort'
rohaquinlop commented 2 years ago

This error is because you're using sets instead of lists on the delta variable, you have defined delta like this {'0': {"token.upos = 'A'": {'1'}}, '1': {"token.upos = 'B'": {'2'}}, '2': {"token.upos = 'C'": {'3'}}} instead of this {'0': {"token.upos = 'A'": ['1']}, '1': {"token.upos = 'B'": ['2']}, '2': {"token.upos = 'C'": ['3']}}. Actually, I'm rewriting the code to use sets instead of lists, but for the moment the update that I made yesterday was to add the renumber function.

interrogator commented 2 years ago

Ah ok, thanks for that!

rohaquinlop commented 2 years ago

Hi @interrogator,

No problem, I have changed the use of lists on the delta variable on the NFA's, so you can use it right now like you were trying {'0': {"token.upos = 'A'": {'1'}}, '1': {"token.upos = 'B'": {'2'}}, '2': {"token.upos = 'C'": {'3'}}}.

Have a good day!

interrogator commented 2 years ago

I guess I can close this issue, since all the mentioned problems are solved! Are there any other updates planned?