zevv / npeg

PEGs for Nim, another take
MIT License
331 stars 22 forks source link

Implement handling of left recursion #47

Open zevv opened 1 year ago

zevv commented 1 year ago

Handling left recursion is problematic in PEGs, see https://en.wikipedia.org/wiki/Parsing_expression_grammar#Indirect_left_recursion for a nice explanation of the problem.

zevv commented 1 year ago

People smarter than me (@sqmedeiros) have thought hard about this problem, and there seems to be a viable solution for this, described in this paper: https://github.com/zevv/npeg/blob/master/doc/papers/Left_recursion_in_parsing_expression_grammars.pdf

I've been trying to get a good grasp of this paper, but the formal description seems to be quite over my head and I'm not sure how this model would fit into NPeg.

sqmedeiros commented 1 year ago

Hi, @zevv . Thanks for mentioning my paper.

I will try to explain the idea without the formalism. Essentially, you should have a map where the keys have the form (A, i), where A is left-recursive variable and i is a position of the input. The initial value associated with this key should be a value as fail, indicating that it's not possible to match A at position i. The value stored in this map will be the result of matching A at input position i.

When we try to match the right-hand side of A, the matching of the left-recursive invocation of A will fail, but the matching of the right-hand side o A can still succeed, usually using a non-left-recursive alternative. See the example below: A -> A a / b

Input: "bac" Position: 123

When matching the first letter of the input, the alternative A a will fail, as the result of matching A at position 1 is fail. Fortunately, the second alternative succeeds. In this case, we need to try to increment the input prefix matched by A. Thus, we should update our map to something as mymap[(A, 1)] = 2 and try to match the right-hand side of A again at input position 1. In case it is possible to match a larger input prefix, we update our map again and try another matching of the right-hand side of A at input position 1, otherwise we stop the matching and give 2 as the result of matching A at position 1. In this example,

I hope this helps. Feel free to ask anything else.

By the way, this extended version https://arxiv.org/pdf/1207.0443.pdf also discusses (see Section 4) the problem of operator precedence.

zevv commented 1 year ago

Hi @sqmedeiros, thanks for taking the time to respond, much appreciated.

After chewing some more time on the paper, rereading a few times and and spending the evening at the kitchen table with pencil and notepad I was able to get a better grasp of the process; my current understanding more or less matches your above description, so it seems I'm on the right track! I will let this sink in for a bit and see if I can prototype an implementation one of these days.

Thanks for the updated paper, I'll include this one as well. Operator precedence is - I think - mostly a solved issue in NPeg; the NPeg language has been extended with some constructs to indicate precedence for specific rules, which NPeg then uses for precedence-climbing in the parser. (https://github.com/zevv/npeg#precedence-operators)

zevv commented 1 year ago

Ah, what a shame I did not find the updated paper before: I think chapter 5 of the updated version (A Parsing Machine for Left-recursive PEGs) is going to help me out a lot.

Thanks again!

sqmedeiros commented 1 year ago

Yes, chapter 5 is another way to describe how left-recursion could work in PEGs. You could use a similar idea without having to follow an approach based on a parsing machine. By the way, when implementing left-recursion, I think it is important to distinguish left-recursive rules from non-left recursive ones, so we only try to increase the matching of left-recursive rules.