osa1 / parsegen

An LR parser generator, implemented as a proc macro
MIT License
15 stars 0 forks source link

A source of inefficiency in LR(1) parsers. #11

Open modulovalue opened 1 year ago

modulovalue commented 1 year ago

Hello @osa1!

There's one observation that I made about a source of inefficiencies in traditional LR(1) parsers here.

I'd love to hear your thoughts on it. I hope to find an algorithm that allows me to split states that would not introduce this redundancy, that is, one that doesn't need a post processing step and only splits states that are necessary.

(below is a copy of the linked comment:)


This grammar demonstrates a source of inefficiency in the automaton of the standard LR(1) construction.

Looking at the LR(1) automaton, I think that state 13 & 18 and state 17 & 12 could be merged and the resulting automaton would still resolve the ambiguity that LR(1) was meant to resolve. only 6 & 9 and 14 & 19 are required to be distinct new states.

This grammar is given as an example in some slides that introduce langcc. However, I'm not sure if langcc eliminates this source of inefficiency.

I don't know what this observation is called in the literature, but I would be interested in finding that out.

osa1 commented 11 months ago

I'm reopening this as I'm planning to do some parsegen dev. this weekend and I'd like to at least add the grammars in the links as tests.

Re: the inefficiencies, I'll take a look at the grammar but I have to page-in the algorithms again first :-)