kevinmehall / rust-peg

Parsing Expression Grammar (PEG) parser generator for Rust
https://crates.io/crates/peg
MIT License
1.46k stars 106 forks source link

Span capture with precedence climbing #316

Closed linkdd closed 2 years ago

linkdd commented 2 years ago

When inside a precedence!{} block, if you define a rule like this:

l:position!() lhs:(@) "+" rhs:@ r:position!() { /* ... */ }

Instead of:

lhs:(@) "+" rhs:@ { /* ... */ }

It complains with the following error: `@` is only allowed in `precedence!{}`

Span capture is useful when your grammar generates an AST, when each node should have source mapping information, that can be used when validating the semantics of the AST after parsing.

kevinmehall commented 2 years ago

Would be relatively easy to add support for that pattern. What do you think about the need to duplicate that on every operator, though?

There is already support for this weird special-cased syntax for providing a Rust expression that wraps all other nodes produced by precedence climbing with position info: https://github.com/kevinmehall/rust-peg/blob/master/tests/run-pass/arithmetic_infix_ast_span.rs. I didn't document it because I wasn't satisfied with it. Would that be better for you if stabilized/documented with a less confusing syntax, suggestions welcome?

linkdd commented 2 years ago

Thank you for the example, it seems to get rid of the error, but now I'm confronted to the following error:

expected &dyn Fn(usize, &mut ParseState, &mut ErrorState, &dyn Fn(usize, i32, &mut ParseState, &mut ErrorState)),
found
&|usize, &mut ParseState, {unknown}, {unknown}| -> RuleResult<Node<TypeRef>>

Where Node<TypeRef> is the return type of my grammar rule.

Here is the code for the rules:

rule type_refs() -> Vec<Node<ast::types::TypeRef>>
  = type_ref() ++ [Token::Comma]

rule type_ref() -> Node<ast::types::TypeRef>
  = precedence!{
    l:position!() data:@ r:position!() {
      Node::new((l, r), data)
    }
    --
    lhs:(@) [Token::OperatorBinOr] rhs:@ {
      ast::types::TypeRef::one_of(vec![lhs, rhs])
    }
    --
    lhs:(@) [Token::OperatorBinAnd] rhs:@ {
      ast::types::TypeRef::all_of(vec![lhs, rhs])
    }
    --
    [Token::Negation] t:(@) {
      ast::types::TypeRef::not(t)
    }
    --
    t:type_ref_term() { t }
    [Token::ParenthesisBegin] t:type_ref() [Token::ParenthesisEnd] { t.data }
  }

rule type_ref_term() -> Box<ast::types::TypeRef>
  = type_ref_term_generic_symbol()
  / type_ref_term_concrete_symbol()
  / type_ref_term_struct_def()
  / type_ref_term_tuple_def()
  / type_ref_term_function_signature()
  / type_ref_term_literal()

// ...

If I remove the following line:

[Token::ParenthesisBegin] t:type_ref() [Token::ParenthesisEnd] { t.data }

It works again but has no support of parenthesis.

NB: For context, here is the Node<T> type:

pub type LocationInfo = (usize, usize);

#[derive(Clone, Debug, PartialEq)]
pub struct Node<T> {
  pub location: LocationInfo,
  pub data: Box<T>,
}
linkdd commented 2 years ago

As for the syntax, it is unclear at the moment how operators with 3 operands would work:

Or operators with a complex syntax:

It would be best, from a user point-of-view, to not be restricted on what kind of rule you can write (the example with span capture would be clearer if specified at each step, IMHO)

kevinmehall commented 2 years ago

I think the type inference error is that t.data is Box<TypeRef> but I assume Node::new() is expecting just TypeRef, so you may need to use * to move out of the Box. A bit unfortunate that the error message is talking about the internal closure type generated there.

linkdd commented 2 years ago

Node::new() expected Box<T> :

impl<T> Node<T> {
  pub fn new(location: LocationInfo, data: Box<T>) -> Self {
    Self { location, data }
  }
}
kevinmehall commented 2 years ago

I don't recall exactly what the precedence rules are for ?: in C, but https://stackoverflow.com/questions/13681293/how-can-i-incorporate-ternary-operators-into-a-precedence-climbing-algorithm suggests treating it as a binary operator, which would be:

a:(@) "?" b:expr() ":" c:@

that is, the left and right sides participate in precedence climbing (left associative here, not sure if that's right), but the middle expr is just part of a "?" expr() ":" binary operator that recurses into the rule from the top.

List indexing and function calls could be considered a postfix operator:

lhs:@ "[" idx:expr() "]"

The recursive call inside the brackets / parentheses doesn't participate in precedence climbing, so you can just recursively call the rule rather than using @.

Your coro rule is just an atom, I think -- neither expr participates in precedence climbing.

linkdd commented 2 years ago

The way to determine if an expression should participate in precedence climbing feels arbitrary but, ok I see how complex cases can be reduced to a simpler use case.

Yet, it is unclear how those rules should be written with the precende!{} macro, since it requires to start with a @ or ends with a @ (unless I'm mistaken in my understanding?).

Still, the type inference error I linked above is still not fixed, so I ended up doing "manual" precedence climbing using #[cache_left_rec] on my intermediate rules. It works well :)

kevinmehall commented 2 years ago

The way to determine if an expression should participate in precedence climbing feels arbitrary but, ok I see how complex cases can be reduced to a simpler use case.

Precedence climbing is responsible for building the tree using operator precedence and associativity rules. When you have the subexpression fully enclosed in () or [] or :?, it can't associate with adjacent expressions, and the fact that it's recursive doesn't matter. There's only one way to parse it. The parser doesn't care that it contains a recursive call back into the rule -- a postfix "[" expr() "]" is the same as a postfix "++" as far as the parser is concerned.

Yet, it is unclear how those rules should be written with the precende!{} macro, since it requires to start with a @ or ends with a @ (unless I'm mistaken in my understanding?).

Should be able to write them exactly as I had above: lhs:@ "[" idx:expr() "]" { some_node(lhs, idx) }. A precedence rule with a single @ at the beginning is a postfix operator.

Still, the type inference error I linked above is still not fixed, so I ended up doing "manual" precedence climbing using #[cache_left_rec]

That works too!

If you do want to troubleshoot the type error, my suggestion would be to add explicit type annotations, e.g. {let x: Box<TypeRef> = t.data; x} to see if you can get the type error to point to your code instead of the macro-generated code.

I've also been meaning to see if I can simplify the precedence expansion to give type inference an easier time, and it's clear that it could use some better examples of a more complete language than just math expressions too.

linkdd commented 2 years ago

it can't associate with adjacent expressions, and the fact that it's recursive doesn't matter. There's only one way to parse it.

Ah! That explains it well, thank you.

If you do want to troubleshoot the type error, my suggestion would be to add explicit type annotations

This did not help :/

I've also been meaning to see if I can simplify the precedence expansion to give type

To be fair, using #[cache_left_rec] was actually easier than the precedence!{} macro. But that may be prior experience with other grammar generators talking.

Given enough documentation and examples, I'd argue that we don't need that macro (less is more). Using #[cache_left_rec] is more lenient on how you write the rules, and might be the way to go anyway for any non-trivial case (proof required).

linkdd commented 2 years ago

It turns out the closure error is coming from rust-analyzer, rustc does not complain. I'm closing the issue.