Closed SupraSummus closed 5 years ago
I managed to further reduce the grammar:
import lark
grammar = """
a: "a"
as: a*
start: as a "f"
"""
input_string = 'af'
# this is ok
lark.Lark(grammar, parser='earley').parse(input_string)
# this fails to parse
lark.Lark(grammar, parser='lalr').parse(input_string)
bottom line raises
lark.exceptions.UnexpectedToken: Unexpected token Token(F, 'f') at line 1, column 2.
Expected one of:
* A
however this grammar behaves as expected:
lark.Lark('start: "a"* "a" "f"', parser='lalr').parse('af')
also different exception is raised in this script (difference is in as
rule):
import lark
grammar = """
a: "a"
as: "a"*
start: as a "f"
"""
input_string = 'af'
lark.Lark(grammar, parser='lalr').parse(input_string)
raises
lark.exceptions.UnexpectedCharacters: No terminal defined for 'f' at line 1 col 2
af
^
As you already figured out, it's probably due to Shift/Reduce collisions being automatically resolved as Shift.
I just added a way to see the current collisions (issue #258 )
If you need a way to control the resolution (i.e. to force a reduce) I'll consider adding it as a feature.
I see your point about warning that a grammar isn't "LALR compatible", but at the same time, I don't want to harass the users, who 99% of the time just want to resolve to shift (It's a made up figure, but this is the first time this issue has come up in months, and PLY uses the exact same trick).
Hi, thank you for response. (You are awesome with all of this patience responding to users questions btw ;-) )
Really I've got little idea of how LALR works, so I've checked my grammar against other LALR generator and there wasn't any conflicts.
generator: http://jsmachines.sourceforge.net/machines/lalr1.html input:
S -> AS A f
AS -> ''
AS -> AS A
A -> a
However this grammar has two conflicts (difference in second AS
production):
S -> AS A f
AS -> ''
AS -> A AS
A -> a
So, I suspect lark is unwinding a*
into right-recursive rule and it results in conflicts.
I'll try to explicitly specify rules for as
(without *
) and see if that helps.
Thanks! Helping users is one of the best parts :)
Lark unwinds a*
into a left-recursive rule.
I have to say, I know the LALR algorithm in & out by now, and I still find "LR tables" confusing. I think it's a lot easier to understand as a stack of states, where a state is a set of rule-pointers (a rule-pointer is a dot that signifies position in a specific rule (like a counter). It's a little harder to visualize than a table, but I think it's easier to reason about once it clicks.
So, your grammar:
a: "a"
as: a*
start: as a "f"
actually expands into:
a: A
_as: _as a
| a
as: _as
|
start: as a "f"
(You could make _as
the empty rule, but I found that it's faster to take it out of the recursion)
So when you input A
, the grammar automatically shifts _as
, just because it can, although the correct action would have been to reduce the empty as
rule. And then it finds itself in an "invalid" state, in the sense that it can't parse the rest of your input.
The same idea happens if you input multiple A
s (but separate, which is why there are two conflicts). Whenever it sees A
it stays in the loop, and it needs to see something different to know it has to reduce.
Btw, changing it to "always reduce" is just going to create other inputs that fail to parse.
So without more knowledge about the problem you're trying to solve, my suggestion is to change your grammar, perhaps by abstracting away these collisions and handling them later in the tree (i.e. post-processing), or using Earley.
After long battle against LALR parsing tables I think I've lost. The language I'm trying to encode may not be LALR or maybe I'm just not understanding LALR enough. Currently I'm switching to GLR parser and it seems a lot easier (and slower probably, but still faster than earley).
@erezsh thank you very much for support and explaining things ;)
You can use Earley. It's as capable as GLR, though perhaps not as fast.
After long battle against LALR parsing tables
As I said, they aren't a good way to understand the algorithm.
I'm not sure if it's fault of grammar incompatible with LALR or is it a bug. Here it goes:
grammar.lark
(extremely boiled down dhall grammar)f (f f)
lark.Lark(open('grammar.lark'), parser='earley').parse('f (f f)')
)lark.Lark(open('grammar.lark'), parser='lalr').parse('f (f f)')
) - raiseslark.exceptions.UnexpectedToken
If this is because of this grammar being not parsable by LALR, I wonder if it would be possible to add error messages telling this. Currently I get something like "Unexpected token '(', expected one of ...". For me, ideally constructing parser from grammar of unsupported class should raise "Can't do it. Fix your grammar to be lalr.".