tgdcat closed this issue 4 months ago
It's a known issue with the Earley parser that it isn't always deterministic in choosing a derivation when there's ambiguity.
A straightforward solution would be to give explicit priority to the longer rules:
a: "a"
aa.1: "aa"
aaa.2: "aaa"
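To see why priorities help, here is a minimal sketch in plain Python. It is not Lark's Earley implementation. It enumerates every way to split a run of `a` characters into the three rules, then keeps the split with the highest total priority. The helper names `segmentations` and `best`, and the idea of summing priorities, are assumptions made for illustration.

```python
# Illustrative model only; this is not how Lark resolves ambiguity internally.
# Rule name -> (literal it matches, priority), mirroring the grammar above.
RULES = {"a": ("a", 0), "aa": ("aa", 1), "aaa": ("aaa", 2)}


def segmentations(text):
    """Yield every way to split `text` into a sequence of rule names."""
    if not text:
        yield []
        return
    for name, (literal, _priority) in RULES.items():
        if text.startswith(literal):
            for rest in segmentations(text[len(literal):]):
                yield [name] + rest


def best(text):
    """Pick the split with the highest total priority (first one wins on ties)."""
    return max(segmentations(text), key=lambda seg: sum(RULES[n][1] for n in seg))


all_splits = list(segmentations("aaa"))  # every candidate derivation of "aaa"
chosen = best("aaa")                     # the split preferred under this model
```

Without priorities every split of "aaa" is equally valid, which is the ambiguity described above. With them, the single `aaa` rule outranks the alternatives.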
Another way is to force the lexer to expect whitespace after the string:
a: /a(?=\s)/
aa: /aa(?=\s)/
aaa: /aaa/
(%ignore doesn't force a separation in Earley by design)
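The effect of the `(?=\s)` lookahead can be checked with Python's `re` module alone. This is a small sketch rather than Lark code. It shows that the lookahead requires a whitespace character after the match without consuming it, and that it cannot succeed on the last token of the input, which may be why `aaa` above has no lookahead.

```python
import re

A = re.compile(r"a(?=\s)")    # same pattern as the `a` rule above
AA = re.compile(r"aa(?=\s)")  # same pattern as the `aa` rule above

# "a" followed by a space matches, and the space itself is not consumed.
m = A.match("a aa")
matched_text = m.group(0)
end_offset = m.end()

# At the start of "aa aaa", the first "a" is followed by another "a",
# not by whitespace, so only the `aa` pattern can match there.
a_inside_aa = A.match("aa aaa")
aa_match = AA.match("aa aaa")

# With nothing after it, the lookahead has no whitespace to find.
a_at_end = A.match("a")
```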
Another "fix" would be using lexer="basic", since it removes the ambiguity. You can also do parser="lalr".
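A standalone lexer removes the ambiguity because it commits to a single tokenization before parsing starts. The sketch below imitates that idea with Python's `re` module by trying longer literals first. It is an illustration under that assumption, not Lark's lexer, and the `tokenize` helper is hypothetical.

```python
import re

# Longer alternatives come first, so "aaa" is never split into "a" + "aa".
TOKEN = re.compile(r"aaa|aa|a")


def tokenize(text):
    """Return the tokens of `text`, skipping whitespace between them."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append(m.group(0))
        pos = m.end()
    return tokens


tokens = tokenize("a aa aaa")
```

Because each position yields exactly one token, the parser downstream never sees more than one way to read the input.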
Having said all this, I do think ideally your use case should work as-is without any of these fixes. So feel free to leave this open, and maybe we'll get to it some day.
Thank you for the quick response.
By setting priorities as you suggested, I was able to get the results I expected. Thank you for your support.
> Having said all this, I do think ideally your use case should work as-is without any of these fixes. So feel free to leave this open, and maybe we'll get to it some day.
Yes. Keep it open until you can resolve the ambiguity without adding anything.
Now returns a consistent output:
start
  statement
    a
  statement
    a
  statement
    a
  statement
    a
  statement
    a
  statement
    a
🎉
Thank you for creating Lark.
I want to use Lark to distinguish between "a" and "aa". (I'm sorry if this issue has already been reported.)
Run the following sample program.
The result I want is:
But every time I run it, the result is different.
Python 3.8.7, lark-parser 0.12.0
My approach may be wrong. I would appreciate it if you could tell me the solution.