Closed: reidpr closed this issue 5 years ago.
In LALR, when you %ignore a terminal, it gets dropped by the lexer and never reaches the parser.
Earley knows to try it both ways, which is why it works.
I think what you wanted to do is the following:
?start: foo+
foo: "FOO"i / [A-Za-z0-9._-]+/ _NEWLINE
_NEWLINE: "\n" // underscored terminals are automatically removed from the tree
OK, thank you. Is that the same reason why the following grammar:
?start: foo+
foo: "FOO"i SPACE /[A-Za-z0-9._-]+/ _NEWLINE
SPACE: " "
%ignore SPACE
_NEWLINE: "\n"
gives:
lark.exceptions.UnexpectedCharacters: No terminal defined for 'b' at line 1 col 5
FOO bar
    ^
Expecting: {'SPACE'}
This also works if I underscore SPACE instead of %ignore-ing it.
Yes, same reason.
It's probably better to ignore whitespace and then simply not write it in the grammar.
But if you have to control for whitespace, don't ignore it.
I think maybe I did not understand %ignore correctly. So if I say (as in the JSON tutorial):
%import common.WS
%ignore WS
This means that any WS terminals that appear anywhere are just ignored and need not be specified in the grammar (and thus WS terminals are accepted anywhere). On the other hand, the underscore prefix says to match the terminal but remove it from the tree after the tree is constructed.
Is that correct?
Does %ignore affect positions recorded by propagate_positions=True?
Yes, that is correct.
Positions should always be correct, regardless of %ignore or otherwise.
Thanks so much. That's all extremely helpful and clarifies perfectly.
Good!
I have the following MWE:
Actual behavior:
Expected behavior: No exception; parse tree does not have NEWLINE tokens in it.
Am I doing something wrong? Is this a bug in Lark? What other information can I provide?
This does not happen with the Earley parser. However, I am using the LALR(1) parser because Earley is giving me some nondeterministic behavior in my actual application.
Thank you for your hard work on Lark. It is very pleasant to work with.