A source of inefficiency in LR(1) parsers.

Hello @osa1!

There's one observation that I made about a source of inefficiencies in traditional LR(1) parsers here.

I'd love to hear your thoughts on it. I hope to find an algorithm that allows me to split states that would not introduce this redundancy, that is, one that doesn't need a post processing step and only splits states that are necessary.

(below is a copy of the linked comment:)

This grammar demonstrates a source of inefficiency in the automaton of the standard LR(1) construction.

Looking at the LR(1) automaton, I think that state 13 & 18 and state 17 & 12 could be merged and the resulting automaton would still resolve the ambiguity that LR(1) was meant to resolve. only 6 & 9 and 14 & 19 are required to be distinct new states.

This grammar is given as an example in some slides that introduce langcc. However, I'm not sure if langcc eliminates this source of inefficiency.

I don't know what this observation is called in the literature, but I would be interested in finding that out.

osa1 / parsegen

A source of inefficiency in LR(1) parsers. #11