Open Nadrieril opened 5 years ago
This is exactly what the precedence climbing effort is all about! 😃
Instead of having separate expressions on multiple levels in your grammar, you should ideally only have one flat expression with all operators being on the same precedence level in the grammar:
expression = { term ~ (op ~ term)* }
op = _{
plus
| minus
| times
| ...
}
Then, the actual precedence will be dealt with later, when you feed the Paris
into the PrecClimber
structure.
Ah, that's exactly what I'm trying to avoid ! I'd like to be able to match on Pairs
as if I had written the various plus_expression = ...
rules, just without the performance penalty.
Why, though? You do get that information with PrecClimber
. The only downside is that one cannot understand predecence/associativity by reading only the grammar, but functionality wise, it's the same.
Because then I'd have to change both the grammar (that I do not control completely) and my code (that uses various macros that rely on the uniformity of Pairs
matching).
I feel like having to handle the precedence in my code misses the point of pest, which is that pest handles the parsing details and my code just consumes the pretty resulting AST.
I do agree with you that ideally you'd get a pretty AST that you can consume, but I also don't really see this working very well when defined inside of the grammar. We've had this before pest 1.0 and it makes the syntax quite a big heavier and the implementation unnecessarily complex.
For 3.0 I'm thinking about a kind of a glue layer. Grammars will not contain precedence climbing information, but AST creation can simply be annotated with precedence/associativity information.
#[rule]
#[precedence = [
left(plus, minus),
left(times, divided),
right(power)
]]
fn binary_expression(lhs: Expr, op: OP, rhs: Expr) -> Expr { ... }
I'm happy to hear more thoughts on the design and different opinions. Constructive feedback and debate is highly encouraged!
Ah, I imagined that this would not need any new syntax at all, since the compiler can detect the plus_expression = ...
pattern. That may not be as straightforward as I imagine it however.
Note that this only handles precedence and not associativity, because associativity is very easy to handle in user code.
IMHO, writing all those separate expression rules makes the grammar quite verbose, while adding little context. It would be fairly easy to optimize these rules with a custom optimizer, since they all follow the same pattern.
Well, that was kind of my point: I hoped that pest would be that optimizer. But maybe pest is not particularly focused on cases like mine of automatically-generated grammars; I completely understand that you don't want to invest time in such a feature if it's not your main target use-case.
No, no. I think this is a fair use case, but it's something to consider post 3.0.
Also, if this was hypothetically already optimized, how would you imagine the AST looking? Right now, you'd have nested nodes of increasing/decreasing precedence: PlusExpression
> TimesExpression
> EqualExpression
...
Yeah, that's what I'd expect, since it's the straightforward translation. This also makes it fairly easy to consume, though repetitive. That's what I have right now with my custom macro system: https://github.com/Nadrieril/dhall-rust/blob/a4e8f799fb4665b210086c28647e0fa335384913/dhall_core/src/parser.rs#L668-L674: at each level I just fold the children with the appropriate binary operation. Granted that's a lot of unnecessary rules and is rather verbose; I'm starting to see why you would be a bit reluctant to the idea
I'm wondering, in your example, what is OP
? Is it a Pair
or something else? also, will the visitor style API be generated by Pest or be manual?
@segeljakt, it's manual, but type-checked. This API will be called at parse time to actually generate the AST.
op
is a Pair
, yes.
TL;DR: could the code generator implement precedence climbing automatically as an optimization ?
Here is the context for my question: I'm working with a fairly complex language grammar generated from an ABNF that has a bunch of rules like:
This works great, except for performance. From what I understand, precedence climbing is meant exactly for this kind of situation. However, I would need to change both the grammar and the consuming code to a fairly large extent.
This strikes me as rather weird: precedence feels like it should be defined in the grammar, and precedence climbing is rather orthogonal to the rest of what consuming a pest AST is about. Everything else about consuming a pest AST is super uniform, but this sticks out to me.
The pattern mentioned above seems rather simple: several rules of the form
separated(other_rule, separator)
referencing each other. Would it be possible for the code generator to detect this pattern and generate the precedence climbing code automatically ?