neogeny / TatSu

竜 TatSu generates Python parsers from grammars in a variation of EBNF
https://tatsu.readthedocs.io/
Other
408 stars 48 forks source link

Indirect Left Recursion #69

Closed Victorious3 closed 5 years ago

Victorious3 commented 6 years ago

Taken from the calculator example, simplified:

start = expression $;
number = /\d+/;

addition = expression '+' number;
subtraction = expression '-' number;

expression =
    | addition
    | subtraction
    | number;

This fails on the input 1-1+1 with

(1:4) infinite left recursion :
1-1+1
   ^
expression
subtraction
expression
start

and on 1+1-1 with

(1:4) Expecting end of text :
1+1-1
   ^
start

There is a workaround for this, changing the grammar as follows:

 expression =
+    | >addition
+    | >subtraction
-    | addition
-    | subtraction
     | number;

or like this:

-addition = expression '+' number;
+addition = expression ('+' | '-' ) number;
-subtraction = expression '-' number;

 expression =
     | addition
-    | subtraction
     | number;

yields the correct result with both inputs.

Sadly both workarounds make walking the AST more difficult.

Trace for 1+1-1:

↙start ~1:1
1+1-1
↙expression↙start ~1:1
1+1-1
↙addition↙expression↙start ~1:1
1+1-1
↙expression↙addition↙expression↙start ~1:1
1+1-1
⟲ expression↙addition↙expression↙start ~1:1
1+1-1
⟲ addition↙expression↙start ~1:1
1+1-1
↙subtraction↙expression↙start ~1:1
1+1-1
↙expression↙subtraction↙expression↙start ~1:1
1+1-1
⟲ expression↙subtraction↙expression↙start ~1:1
1+1-1
⟲ subtraction↙expression↙start ~1:1
1+1-1
↙number↙expression↙start ~1:1
1+1-1
≡'1' /\d+/
+1-1
≡number↙expression↙start ~1:2
+1-1
↙addition↙expression↙start ~1:2
+1-1
↙expression↙addition↙expression↙start ~1:2
+1-1
≡expression↙addition↙expression↙start ~1:2
+1-1
≡'+' 
1-1
↙number↙addition↙expression↙start ~1:3
1-1
≡'1' /\d+/
-1
≡number↙addition↙expression↙start ~1:4
-1
↙expression↙addition↙expression↙start ~1:4
-1
↙addition↙expression↙addition↙expression↙start ~1:4
-1
≡addition↙expression↙addition↙expression↙start ~1:4
-1
≡expression↙addition↙expression↙start ~1:4
-1
≢'+' 
-1
≡addition↙expression↙start ~1:4    # Why does this match even tho '+' didn't on the line above?
-1                                 # Now it should try the subtraction, but doesn't
≡expression↙start ~1:4
-1
≢start ~1:1
1+1-1
Victorious3 commented 6 years ago

I tried the grammar with https://github.com/norswap/whimsy which uses the same algorithm for left recursion (http://norswap.com/pubs/sle2015.pdf)

It parsed both inputs correctly: https://gist.github.com/Victorious3/13628fe71cefd6321952bf9c53bab2cd

So the the algorithm does indeed support indirect left recursion (at least in this case). However, it is a lot more explicit about this than Tatsu. I suspect that the issue lies somewhere in detecting left recursive rules, provided that the implementation of said algorithm is correct otherwise. Tagging @norswap any help would be appreciated

Out of curiosity, what was the problem with Warth's approach? It involved figuring out left recursive rules on the fly, which fits well with Tatsu. It currently adds left recursive rules to a set once it finds them, but never removes them. @apalala

I might try moving over the more complicated calculator example over next.

norswap commented 6 years ago

I'm not too sure how I can help, but do tell if you have an idea. There indeed shouldn't be a problem with this use of recursion.

What does the > (unless it is |> ?) notation means? Since that seems to fix it in TatSu.

Autumn doesn't automatically find left-recursion, although I did it in an old version. What is needed is for each parser to be able to point out to the subset of his sub-parsers that can be invoked at the same position the parser is invoked at. From there it's just a matter of finding cycles in a directed graph.

This is a bit awkward if you want to be able to specify custom parsers in a lightweight way like I did in this version of Autumn. The next version of Autumn will reconsider this decision - you'll be forced to define a new class for each custom parser anyhow, and there will be a way to build and extend visitors. Then you'd be able to opt-in your custom parsers into left-recursion detection.

Victorious3 commented 6 years ago

> inlines the rule, effectively removing the indirect left recursion. (But also ridding me of any way to add semantic actions to addition and subtraction so it isn't a solution)

I think what it does to detect left recursion is too simple to catch all cases. Perhaps it does the left recursion too often or for the wrong rules. @norswap can Autumn create similar traces or do I have to do that myself?

Tatsu can only detect cycles as they happen (that's what I suspect _set_left_recursion_guard does). The left recursion should only happen when expression is invoked inside addition or subtraction (call to self()), correct?

It could also be an issue with memoization, returning a false positive, remembering a result saved by the "wrong" invokation of the rule.

It's still a lot of guesses at this point... All the relevant code is in contexts.py if you want to have a look at it as well. I'm on vacation this week.

Related: https://github.com/PhilippeSigaud/Pegged/wiki/Left-Recursion

norswap commented 6 years ago

I actually had to check. Apparently not, though an older version definitely did. These traces are often unreadable in my experience (I think the one you posted above exemplifies that) and so I prefer a lighter touch when debugging.

It's going to be hard to easily get a trace in the current version. As far as I see, you have two options:

  1. Wrap every parser in a parser that performs the required trace update.
  2. Create but don't throw an exception and print it when adequate, you should be able to get most parsers from the function names in the trace.

Incidentally, next version will have optional trace building (I coded that up early this week).

Victorious3 commented 6 years ago

I've decided to take the implementation in Pegged as a reference now. This does involve detecting left recursive rules ahead of time and detecting left recursive cycles to decide whether or not to turn memoization off depending on that. The code for it is... quite involved, to say the least.

https://github.com/PhilippeSigaud/Pegged/blob/0901d95353ad0d135cab68a77657f0fa3c09c785/pegged/grammar.d#L713 (Tip of the iceberg...)

This further strengthens my belief that there are essentially three things wrong with Tatsu's simpler approach:

I'm also going to have a look at the above mentioned old version of Autumn to see if I get any pointers from there.

As the effort going into fixing it this way would be quite substantial, I'd like @apalala to have a look at my findings before I make a PR to track my progress. Maybe there's a simpler way? I'm curious about what you tried to fix left recursion in #32 and what made you revert it, ultimately

Pegged's approach is outlined here: https://github.com/PhilippeSigaud/Pegged/wiki/Left-Recursion

norswap commented 6 years ago

The key part of the old Autumn code is here: https://github.com/ncellar/autumn_v1/blob/master/src/com/norswap/autumn/extensions/leftrec/LeftRecursionVisitor.java

Victorious3 commented 6 years ago

@norswap Thanks, this looks a lot more straight forward! (And well documented!) I'm curious about

Ordinary memoization of left-recursive rules is temporarily blocked, so they don't interfere with the recursive progression.

Did you run into any issues with that?

norswap commented 6 years ago

In Autumn, there is no memoization -- v1 had explicit memoization though (as just another combinator) and this could also be implemented in subsequent versions (and maybe has been - I forgot!).

But assuming general memoization, this would indeed be an issue, the idea being that you want to invoke the same parsing expression at the same position multiple times and have it get different (larger) results. I don't think I actually implemented this, but it should be straightforward using a flag.

I also realized I haven't actually implemented left-recursion support in v4 yet. I will do it because I need it to illustrate parts of my thesis - but in my experience it's mostly useful for expressions, in which specialized constructs can actually fare much better. I have a LeftAssoc combinator in v4 that has left, operator and right children and basically matches left (operator right)+ but calls a user-supplied function each time a right-hand-side is matched, letting you perform left-associative tree building.

I mention this because actually rewriting a left-associative grammar to use such tree-rewriting constructs is also another implementation possibility. It's rather tedious when considering optional indirect left-recursion however...

Victorious3 commented 6 years ago

@norswap Another thing. I suppose the nullability checks are required to detect hidden left recursion?

norswap commented 6 years ago

Indeed. You can derive a nullability checker from this code: https://github.com/ncellar/autumn_v1/blob/master/src/com/norswap/autumn/graph/NullabilityCalculator.java