pest-parser / pest

The Elegant Parser
https://pest.rs
Apache License 2.0
4.67k stars 261 forks source link

Precedence climbing at the grammar level ? #386

Open Nadrieril opened 5 years ago

Nadrieril commented 5 years ago

TL;DR: could the code generator implement precedence climbing automatically as an optimization ?

Here is the context for my question: I'm working with a fairly complex language grammar generated from an ABNF that has a bunch of rules like:

plus_expression = { times_expression ~ ("+" ~ times_expression)* }
times_expression = { equal_expression ~ ("*" ~ equal_expression)* }

This works great, except for performance. From what I understand, precedence climbing is meant exactly for this kind of situation. However, I would need to change both the grammar and the consuming code to a fairly large extent.

This strikes me as rather weird: precedence feels like it should be defined in the grammar, and precedence climbing is rather orthogonal to the rest of what consuming a pest AST is about. Everything else about consuming a pest AST is super uniform, but this sticks out to me.

The pattern mentioned above seems rather simple: several rules of the form separated(other_rule, separator) referencing each other. Would it be possible for the code generator to detect this pattern and generate the precedence climbing code automatically ?

dragostis commented 5 years ago

This is exactly what the precedence climbing effort is all about! 😃

Instead of having separate expressions on multiple levels in your grammar, you should ideally only have one flat expression with all operators being on the same precedence level in the grammar:

expression = { term ~ (op ~ term)* }
op = _{
    plus
  | minus
  | times
  | ...
}

Then, the actual precedence will be dealt with later, when you feed the Paris into the PrecClimber structure.

Nadrieril commented 5 years ago

Ah, that's exactly what I'm trying to avoid ! I'd like to be able to match on Pairs as if I had written the various plus_expression = ... rules, just without the performance penalty.

dragostis commented 5 years ago

Why, though? You do get that information with PrecClimber. The only downside is that one cannot understand predecence/associativity by reading only the grammar, but functionality wise, it's the same.

Nadrieril commented 5 years ago

Because then I'd have to change both the grammar (that I do not control completely) and my code (that uses various macros that rely on the uniformity of Pairs matching). I feel like having to handle the precedence in my code misses the point of pest, which is that pest handles the parsing details and my code just consumes the pretty resulting AST.

dragostis commented 5 years ago

I do agree with you that ideally you'd get a pretty AST that you can consume, but I also don't really see this working very well when defined inside of the grammar. We've had this before pest 1.0 and it makes the syntax quite a big heavier and the implementation unnecessarily complex.

For 3.0 I'm thinking about a kind of a glue layer. Grammars will not contain precedence climbing information, but AST creation can simply be annotated with precedence/associativity information.

#[rule]
#[precedence = [
    left(plus, minus),
    left(times, divided),
    right(power)
]]
fn binary_expression(lhs: Expr, op: OP, rhs: Expr) -> Expr { ... }

I'm happy to hear more thoughts on the design and different opinions. Constructive feedback and debate is highly encouraged!

Nadrieril commented 5 years ago

Ah, I imagined that this would not need any new syntax at all, since the compiler can detect the plus_expression = ... pattern. That may not be as straightforward as I imagine it however. Note that this only handles precedence and not associativity, because associativity is very easy to handle in user code.

dragostis commented 5 years ago

IMHO, writing all those separate expression rules makes the grammar quite verbose, while adding little context. It would be fairly easy to optimize these rules with a custom optimizer, since they all follow the same pattern.

Nadrieril commented 5 years ago

Well, that was kind of my point: I hoped that pest would be that optimizer. But maybe pest is not particularly focused on cases like mine of automatically-generated grammars; I completely understand that you don't want to invest time in such a feature if it's not your main target use-case.

dragostis commented 5 years ago

No, no. I think this is a fair use case, but it's something to consider post 3.0.

Also, if this was hypothetically already optimized, how would you imagine the AST looking? Right now, you'd have nested nodes of increasing/decreasing precedence: PlusExpression > TimesExpression > EqualExpression ...

Nadrieril commented 5 years ago

Yeah, that's what I'd expect, since it's the straightforward translation. This also makes it fairly easy to consume, though repetitive. That's what I have right now with my custom macro system: https://github.com/Nadrieril/dhall-rust/blob/a4e8f799fb4665b210086c28647e0fa335384913/dhall_core/src/parser.rs#L668-L674: at each level I just fold the children with the appropriate binary operation. Granted that's a lot of unnecessary rules and is rather verbose; I'm starting to see why you would be a bit reluctant to the idea

segeljakt commented 5 years ago

I'm wondering, in your example, what is OP? Is it a Pair or something else? also, will the visitor style API be generated by Pest or be manual?

dragostis commented 5 years ago

@segeljakt, it's manual, but type-checked. This API will be called at parse time to actually generate the AST.

op is a Pair, yes.