Closed davidmcnabnz closed 1 year ago
Following up, I've found a workaround. Since the files I'm parsing all have the colons immediately after the section titles, no spaces, I am now using a separate terminal which specifically includes the trailing colon.
?start: section+
section: section_header section_item+
section_header: SECTION_NAME -> section_header
section_item: NAME "=" NUMBER
NAME: /[a-z][a-z_0-9]*/
SECTION_NAME: /[a-z][a-z_0-9]*:/
%import common.WS_INLINE
%import common.NUMBER
%ignore WS_INLINE
Then, in the transformer's section_header()
handler method, I just strip off the trailing colon.
This feels like a bit of a hack, but at least it's got me around the immediate issue.
The grammar is not ambiguous as a CFG, but it is "ambiguous" in LALR(1) because there is only 1 token lookahead. Namely, at the end of each section
, if it sees NAME
it has to choose between starting a new section, or starting a new section_item.
Your solution is adequate, and is probably the easiest way to get around this problem.
@erezsh Thanks for your review and reply.
Naively, I was unaware that LALR(1) could only look ahead 1 token. Is the '1 token' lookahead an inherent design limitation of LALR(1)? Or is it possible to get it looking further ahead.
I did try Earley, but this doesn't support the use of a Transformer during the parse.
Using a transformer during the parse is mostly used for performance. If performance isn't a big issue, you might not need it.
Yes, it's an inherent limitation of LALR(1). Unfortunately Lark doesn't implement LALR(2) or above.
However, you could probably emulate it by using the interactive parser, since you can lookahead "manually" at each step of the parse, and influence the decisions it takes.
But, that's probably a very big overkill, if your problem is as simple as you described it.
I'll take a good look at the 'interactive parser', since its power may be needed when I add in the more complex elements of the (inherited) language I'm needing to parse.
Thanks again. I'm grateful that LARK is a live project, with responsive support that keeps it at the top of the Python general parsing ecosystem.
Thanks for the compliments!
For a real-life example of the interactive parser, have a look at this: https://github.com/geographika/mappyfile/blob/master/mappyfile/parser.py#L205
Describe the bug
Shift/Reduce Conflict is being incorrectly detected. The grammar is definitely not ambiguous.
To Reproduce
Create an LALR parser on the grammar below, with debug turned on. Feed it with the grammar:
The LALR parser analyser is claiming that the terminal
NAME
presents a shift/reduce conflict for the rule:The grammar is intending to parse text input such as:
The grammar provides only two legal places where a
NAME
terminal may appear:Is this a bug in Lark? Or is there a subtle ambiguity I'm missing in this grammar?
Either way, can anyone please advise a way to resolve or work around this?