Closed shawnanastasio closed 5 years ago
Earley currently does not support weights on terminals. But it should work as you expect it to with weights on rules. Such as:
root: a | b
a.1: ("0".."9")+
b.2: ("0".."9")+
This is probably not the best behavior. Hopefully I'll get the chance to fix it soon.
I see. Perhaps you could help me work around this issue with a project I'm working on.
In the project I have terminals similar to the following for unquoted strings, decimal numbers, and hexadecimal numbers:
root: STR | DEC | HEX
STR: /\S/+
DEC: ("0".."9")+
HEX: ("0x" ("0".."9" | "A".."F" | "a".."f")+)
I'd like to be able to have things like "10" always parse as a decimal and "0xFF" as a hexadecimal, but the non-deterministic behavior is causing them to sometimes be interpreted as strings.
Is there an easy way I can refactor this to work around the unexpected behavior?
Of course. Just add intermediate rules.
root: STR | DEC | hex
hex.2: HEX
etc.
Hope this helps!
Thanks for the advice. It seems to work in some cases but not in others.
For example, with the grammar and text below, the hexadecimal literal seems to be interpreted as a string unless I remove the string terminal entirely, in which case it is interpreted as a hexadecimal.
from lark import Lark
grammar = """
root: "BEGIN" statement+ "END"
statement: assignment
assignment: IDENTIFIER "=" rvalue
?rvalue: decimal
| hexadecimal
| STRING
IDENTIFIER: ("A".."z" | "0".."9")+
// basic types
STRING: /\S/+
DECIMAL: ("0".."9")+
HEXADECIMAL: ("0x" ("0".."9" | "A".."F" | "a".."f")+)
// wrappers for ambiguous types above
decimal.2: DECIMAL
hexadecimal.2: HEXADECIMAL
%import common.WS
%ignore WS
"""
text = """
BEGIN
MY_VAR = 0xDEADBEEF
END
"""
parser = Lark(grammar, start="root", parser="earley")
print(parser.parse(text))
I'm not sure if this is an issue with my grammar or with the parser. The lalr
parser is able to produce the correct behavior every time, but it is unfortunately unsuitable for the project I'm currently working on.
Any assistance would be greatly appreciated.
Hmm yeah, this is an actual bug with the current implementation.
I'm currently working on a new earley implementation that should solve this problem. You can check out the branch earley_sppf
and try it for yourself. It's still under work, but it's fairly stable. Let me know how it works out for you!
I was able to switch over to lalr
for the time being. Thank you very much for your help.
If you feel that this issue falls under the scope of #191, please feel free to close it.
0.7 published
Perhaps related to #191
When using the earley parser with weighted terminals, the result is still non-deterministic in certain cases.
Example:
It is my understanding that using weights was a workaround for this non-determinism (as indicated in #167), but this does not seem to be the case.
I am using the latest lark version available on pip (0.6.2) on Fedora Linux 27 x86_64.