vnmakarov / yaep

Yet Another Earley Parser
Other
133 stars 13 forks source link

Joop Leo's optimization? #24

Open orlp opened 5 years ago

orlp commented 5 years ago

It is stated that YAEP doesn't use Joop Leo's optimization, and that right recursion is fast anyways. And because of the lookahead I do believe this to be true for some grammars, but not all LR(k) grammars. For example, the following LR(2) grammar:

S -> A 'a' 'b'
A -> 'a' A
A ->

Generates increasingly large set cores when parsing "aaa...aaab", leading to (at least) quadratic behavior.

Perhaps an optimization similar to Joop Leo's could be made, making YAEP linear time for all LR(k) grammars?

vnmakarov commented 5 years ago

Thank you for your feedback.

Yes, in brief, I consider implementation of Joop Leo's optimization or something similar.

In fact, I believe Joop Leo's work is more important that one described in Practical Earley Parser article. Saying that I don't know when I actually will start to work on it and when it could be ready. It is not a high priority project for me.

I tried to implement an Earley Parser which could be useful for programming languages implementation. Therefore I spend more efforts on implementing simple direct translation and a good error recovery. I also don't believe that LR(k) parsers (where k > 1) are useful for such tasks. Usually, if a programming language grammar is not LR(1), it is also not LR(k) k > 2 (e.g. C or C++).

But still it would be nice to parse LR(k > 1)-language by Earley parser in linear time.

Although I wanted to implement my Earley parser as a swiss knife for parser implementations, I found that its usage requires a lot of expertise for non-trivial parsers. It is easy to write an ambiguous grammar (e.g. during modification of an original grammar) when you have not intention to do this.

By the way, I have also yacc-like parser MSTA (https://github.com/dino-lang/dino/tree/master/MSTA) which can generates effective LR(k), k > 1 parsers. This compiler-compiler does a lot optimizations, like extracting LALR and regular parts of grammar and implements parsing of them more effectively. Deterministic parser generators are more safe for the problem I mentioned above. So I'd prefer to use MSTA than my Earley parser for a typical programming language implementation.

But even more I prefer to write manual parsers with backtracking if necessary because I like to use original grammars from the programming language definition. Other approaches require a lot of grammar modifications (e.g. original Javascript grammar is actually a grammar generator or can be considered as context sensitive one).