Closed: wstevick closed this issue 1 year ago
The LALR parser just isn't that good at disambiguating. Once it knows that STATEMENT_SEP could follow a block, it will always check for it, even if that can't happen in this specific context (it doesn't know that, because the context is nested).
The solution is to restructure your grammar so that, for every rule that STATEMENT_SEP can follow, it always follows that rule.
Here is an example solution for your toy example. I can't say how well it will fit into the full language:
from lark import Lark
grammar = r"""
start: function_def*
function_def: WORD block
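// Function body: never followed by STATEMENT_SEP.
block: "{" statement* "}"
// Block inside a statement: always followed by STATEMENT_SEP.
word_block: "{" statement* "}"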
?statement: (WORD WORD | WORD word_block) STATEMENT_SEP
WORD: /\w+/
STATEMENT_SEP.1: /\r?\n/+
%import common.WS
%ignore WS
"""
text = r"""
thing {
a b
c
d
e {}
}
otherthing
{}
"""
parser = Lark(grammar, parser="lalr")
print(parser.parse(text))
Or just use Earley, which should be able to handle it without any additional effort on your end.
Thanks, that fixed it for me.
I'm working on a toy language, and I'm using lark to write a parser for it. I want to be able to use newlines to separate statements, but also to split up long statements. Here's my minimal code.
This is with the lalr parser and contextual lexer. My thought is that because I've given STATEMENT_SEP a higher precedence than WS, the lexer will try to match newlines to it first. But because it's the contextual lexer, if a newline is in the middle of a statement (in this case, something like "a \n b"), it'll match to WS instead.
Here's my test code:
When I try to parse it, though, I get this error message.
I'm assuming this is a problem with my code. What am I doing wrong?