Open erezsh opened 6 years ago
I don't thing this is (always) true. There are two different reasons for this: !rule
and %ignore
:
start: "a" "b"
%ignore " "
will not match the same things as
start: "ab"
%ignore " "
The same goes for !rule
:
!start: "a" "b"
will not return the same tree as:
!start: "ab"
Well, for !rule
, the optimization could be performed, and then reversed in post-processing by parsing the joined string. But that's probably not going to help performance.
As for %ignore
, that's pretty simple:
start: "a" "b"
%ignore " "
Becomes
start: _JOINED
_JOINED: "a" " "* "b"
%ignore " "
On the same note, regarding %ignore and terminals, should this behave like this:
start: TEST
TEST: "a" "b"
%ignore " "
will not match "a b". I know why, but this is a huge difference between terminals and rules. To correct this issue, you would have to already the post-processing splitter inplace. Probably not worth it.
It's possible to improve Lark's speed, for all algorithms, by performing grammar simplification. Specifically, it's possible to merge adjacent anonymous terminals together, to reduce the parser work load, without affecting the resulting tree (An example of this technique can be seen in https://github.com/lark-parser/lark/issues/218)
In the same vein, ignored terminals can be joined to anonymous ones, with a similar effect.