vnmakarov / yaep

Yet Another Earley Parser
Other
135 stars 13 forks source link

fail fast when ambiguity is detected? #25

Open andres-erbsen opened 5 years ago

andres-erbsen commented 5 years ago

Hi,

I am considering using yaep to add limited user-customizable syntax to a domain-specific language. In https://github.com/vnmakarov/yaep/issues/24#issuecomment-449791945, you express concern that a non-trivial expertise is required to avoid unintended ambguities in grammars. If I understand correctly, this is because Earley parsing is much slower on ambiguous grammars. In that case, would it be feasible to have yaep abort parsing once ambiguity is detected, ideally with an informative error message?

Thanks, Andres

vnmakarov commented 5 years ago

Thank you for the proposal. Unfortunately, in general case the grammar ambiguity can be detected only on specific input after finishing building parse sets (meaning consuming all input) and starting building a parse tree.

Actually parse_tree function already returns a flag if the ambiguity is found.

As I remember recognizing ambiguity by grammar only is undecidable problem. But it is possible to recognize that grammar is out of narrow unambiguous grammar class (e.g. LR(k)). So I don't see a good solution right now for the ambiguity recognition problem although I'll think about some solution. May be I'll find an useful approach.

andres-erbsen commented 5 years ago

I agree that recognizing a grammar as ambiguous is undecidable, I meant to ask about recognizing an input as ambiguous. Here is another way of asking mostly the same question: how much work does yaep need to do before it can detect that an input is ambiguous? Is it O(n^3) or O(n^2)? is it fast in common cases?

andres-erbsen commented 5 years ago

Thinking about it some more: it should be possible to detect that an input is ambiguous in O(n^2) time. This is because parsing an ambiguous input will finish in O(n^2) time, and if we knew the big-constant, we could just have the parser abort after k*n^2 time and declare the input ambiguous. This, of course, would be dissatisfying strategy for implementation. Once I acquire some of that copious free time I have been dreaming about, I will look into this more to see whether a more practical approach comes up.