lark-parser / lark

Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
MIT License
4.63k stars 397 forks source link

Some grammars can throw the CYK parser into an infinite loop #115

Open erezsh opened 6 years ago

erezsh commented 6 years ago

Not likely to happen naturally, but possible.

For example:

parser = lark.Lark("start: a\na: b\n b:c\n c:a\n | /a/")
parser.parse('a')

Throws all parsers into an infinite loop (the cyk parser already enters an infinite loop on the first line)

Possible solution: Detect infinite loops in grammar, and throw an exception.

SarthakDandriyal commented 5 years ago

Hey I would like to work on this issue.

erezsh commented 5 years ago

Great, be my guest.

Earley doesn't enter a loop anymore in recent versions.

The only thing left to fix is the cyk parser.

SarthakDandriyal commented 5 years ago

I would like to work on that can you guide me more about the cyk parser .

erezsh commented 5 years ago

What kind of guidance do you need?

The code of the cyk parser is here: https://github.com/lark-parser/lark/blob/master/lark/parsers/cyk.py

You should figure out which line is causing the infinite loop, and take it from there.

SarthakDandriyal commented 5 years ago

Thank you I will work on it.

SarthakDandriyal commented 5 years ago

hey @erezsh can you give me some test cases for the cyk parser.

erezsh commented 5 years ago

What do you mean?

Did you see this page? https://lark-parser.readthedocs.io/en/latest/how_to_develop/