lark-parser / lark

Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
MIT License

Add syntax `__rule` that inlines during grammar preprocessing #822

Open erezsh opened 3 years ago

erezsh commented 3 years ago

Suggestion

Right now inlined rules (_rule) are inlined after parsing.

It is sometimes desirable to have them inlined during grammar preprocessing, to reduce the number of REDUCE actions (which are computationally expensive), and possibly also to avoid some shift/reduce errors.

That's especially handy for templates, which often aren't designed to be rules on their own.

Describe alternatives you've considered

Starting a rule or template name with two underscores (__rule) will inline it when preprocessing the grammar.

ThatXliner commented 2 years ago

+1

> Rules whose name begins with an underscore will be inlined into their containing rule

My intuition says that it’s going to be done when lark constructs a parser from the grammar. But for “?rule”, I think that sort of inlining should be done after parsing input.

erezsh commented 2 years ago

We need to distinguish between two kinds of inlining

1) At the grammar level, before building the parser

2) At the parse tree level, while parsing
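To make the second kind concrete, here is a toy sketch of tree-level inlining. It does not use Lark and is not Lark's implementation: the tuple-based tree shape and the `splice_inlined` helper are assumptions made purely for illustration. The grammar still contains the `_rule`, and the parser still reduces it; only the resulting node is dissolved into its parent.

```python
# Toy parse tree (hypothetical shape): (rule_name, [children]); leaves are strings.
def splice_inlined(node):
    """Return the tree with every node named '_...' replaced by its own children."""
    if isinstance(node, str):
        return node
    name, children = node
    out = []
    for child in children:
        child = splice_inlined(child)
        if not isinstance(child, str) and child[0].startswith("_"):
            out.extend(child[1])  # splice the grandchildren into the parent
        else:
            out.append(child)
    return (name, out)


tree = ("start", [("_pair", ["a", "b"]), ("item", ["c"])])
flat = splice_inlined(tree)
print(flat)
```

Grammar-level inlining, by contrast, would remove the rule before the parser is built, so no node for it would ever be created.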

ThatXliner commented 2 years ago

In my opinion

_rules and the like should be grammar level

?rules should be during/after parsing

MegaIng commented 2 years ago

@ThatXliner

No, _rules should not be inlined at the grammar level, otherwise a grammar like the one below would have terrible performance:

start: _options _options

_options: a | b | c | d | e | f | g | h

Currently that's a total of 1 + 8 = 9 rules. If _rules were inlined at the grammar level, that would be 8 * 8 = 64 rules.
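The arithmetic can be checked with a small standalone sketch. It models the grammar as a plain dict of alternatives rather than using Lark, and `count_expansions` and `inline_rule` are hypothetical helpers written for this illustration only.

```python
from itertools import product

# The grammar from the comment above: rule name -> list of alternatives.
grammar = {
    "start": [["_options", "_options"]],
    "_options": [[s] for s in "abcdefgh"],
}


def count_expansions(g):
    """Total number of alternatives across all rules."""
    return sum(len(alts) for alts in g.values())


def inline_rule(g, target):
    """Substitute every alternative of `target` into each place it is used.

    Only valid for non-recursive rules; `target` itself is dropped.
    """
    result = {}
    for name, alts in g.items():
        if name == target:
            continue
        new_alts = []
        for alt in alts:
            # Each symbol contributes either the target's alternatives or itself.
            choices = [g[target] if sym == target else [[sym]] for sym in alt]
            for combo in product(*choices):
                new_alts.append([s for part in combo for s in part])
        result[name] = new_alts
    return result


before = count_expansions(grammar)
after = count_expansions(inline_rule(grammar, "_options"))
print(before, after)
```

Each additional use of `_options` in `start` would multiply the count by 8 again, which is why the blow-up is a concern.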

erezsh commented 2 years ago

Also, then recursion within _rule would be impossible.
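A rough sketch of why recursion is a problem: substituting a rule's body into its uses never terminates if the body mentions the rule itself, directly or through other rules. The check below is a hypothetical helper on a dict-based grammar model, not something Lark provides, and the example grammar is made up for illustration.

```python
def is_recursive(grammar, rule):
    """True if `rule` can reach itself through its own alternatives."""
    seen, stack = set(), [rule]
    while stack:
        current = stack.pop()
        for alt in grammar.get(current, []):
            for sym in alt:
                if sym == rule:
                    return True
                if sym in grammar and sym not in seen:
                    seen.add(sym)
                    stack.append(sym)
    return False


# Hypothetical grammar: _list is left-recursive, so it cannot be expanded away.
grammar = {
    "start": [["_list"]],
    "_list": [["item"], ["_list", "item"]],
    "item": [["X"]],  # "X" stands in for a terminal
}

print(is_recursive(grammar, "_list"))
print(is_recursive(grammar, "start"))
```

A grammar-level inliner would have to reject, or leave untouched, any rule for which such a check returns true.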

ThatXliner commented 2 years ago

Oh right. My bad.

I was thinking about the _templates{}. I think those should be non-recursive for most use cases.

If that may break some people’s code, why not an inline keyword?

erezsh commented 2 years ago

> If that may break some people’s code, why not an inline keyword?

That's what the __rule / __template syntax is for.

ThatXliner commented 2 years ago

I just think that might break something… 🤷‍♂️