Pratt parsing support - Githubissues

epage commented 1 year ago

Please complete the following tasks

[X] I have searched the discussions
[X] I have searched the open and rejected issues

winnow version

0.2.0

Describe your use case

This would be a performant way of handling precedence in parsing

See zesterer/chumsky#51

Describe the solution you'd like

See

zesterer/chumsky#144
rust-bakery/nom/pull/1362
https://gist.github.com/ilonachan/3d92577265846e5327a3011f7aa30770

Alternatives, if applicable

No response

Additional Context

No response

epage commented 11 months ago

For more technical details, see

Zij-IT commented 9 months ago

Heyyo! I was the one to handle a significant portion of the Pratt parser implementation that is available in chumsky :)

I would have to familiarize myself with how y'all do things here at Winnow, but I would be happy to take a crack at getting a Pratt parsing implementation going for Winnow.

Do you have an idea already for an API that you would prefer, or guidelines on how to keep the API winnow-style :D I'd love to contribute to getting some cross-pollination going between two great parsing crates!

epage commented 9 months ago

That would be greatly appreciated!

I have no formal background in parsing and only know of Pratt Parsing from matklad's posts and then your work with chumksy. I'm not even sure where or if I would use it in my own work. So its hard for me to give too specific of guidance on the API.

Overall, the philosophical divergence between chumsky and winnow is that chumsky is a framrwork while winnow is a toolbox. Chumsky owns the process and users plug into that (e.g. users are expected to not implement the Parser trait). Winnow provides common primitives and allows users to build up their parser out of that, as needed. We freely encourage mixing of imperative and functional / declarative parsers. Unsure how that might apply but thought I'd point that out.

One challenge I repeatedly run into is that when we give too high level of a helper, it also constrains the user. A trivial example of this is Parser::try_map which doesn't let you cut-to-top the error from try_map separate from the error from the parser it wraps (#180).

Zij-IT commented 9 months ago

No worries! I'll do what I can to explain what the plan is before jumping on the solution so that we can get at one that you feel both fits into winnow, matches it's approach and is performant!

Okay, so before getting into the implementation itself, would something like the following be acceptable? Below I have different variations of about the same concept listed. Which ones do you find more "winnow-ish"

Operator Parsers

For Pratt parsing there are three important pieces of information that are required in order to properly parse an operator:

Its associativity (right or left)
Its strength (how tightly it binds. e.g. + binds weaker than *)
Whether its prefix, postfix or infix

To this end, its important to discuss how an operator parser should look like. Here are a couple ideas that I have:

So, the first idea would be to take the implementation from Chumsky, and adapt the implementation for winnow. That would look like this:

// Pretend that `unary` and `binary` create expressions of one type `Expr`
prefix(Right(1), '-', |r| unary(r, Op::Neg));
infix(Left(0), '+', |l, r| binary(l, r, Op::Add));
postfix(Right(3), '!', |r| unary(r, Op::Fact));

Another idea would be to add infix, prefix and postfix methods onto Parser instead of free-standing functions:

'-'.prefix(Right(1), |r| unary(r, Op::Neg));
'+'.infix(Left(0), |l, r| binary(l, r, Op::Add));
'!'.prefix(Right(3), |r| unary(r, Op::Fact));

Pratt Parser

Assuming that you prefer the second option, here is an idea for how it would look like with a pratt method provided by the Parser trait. pratt could also be free-standing function which accepts two parsers: One for the "atom" expression and one for the operators.

// Allowing all the operators to be grouped in one tuple
let atom = digits1.map(Expr::Int);
let calc = atom.pratt((
    '-'.prefix(Right(1), |r| unary(r, Op::Neg));
    '+'.infix(Left(0), |l, r| binary(l, r, Op::Add));
    '!'.prefix(Right(3), |r| unary(r, Op::Fact));
));

General things

Would I be correct in assuming that this would be best put behind a feature "pratt", or would you want this to be included by default?

epage commented 9 months ago

Its strength (how tightly it binds. e.g. + binds weaker than *)

So this is "precedence", right?

Right(1),

I assume this is associativity and strength combined. Why have them together?

pratt could also be free-standing function which accepts two parsers:

This reminds me that I forgot to list one of the design principles. Grammar-level functionality (ie things you'd see in BNF like literals, repetition, alternatives) are free functions and Parser trait methods are reserved for adapting to the application domain (along with some grammar-level error reporting like Parser::try_map).

So I assume we'd go with

let calc = pratt(
    digits1.map(Expr::Int),
    (
        '-'.prefix(Right(1), |r| unary(r, Op::Neg));
        '+'.infix(Left(0), |l, r| binary(l, r, Op::Add));
        '!'.prefix(Right(3), |r| unary(r, Op::Fact));
    )
);

Looking at chumsky, prefix/postfix don't use an associativity, only strength. Thoughts on that?
I find their functions wrapping the Associativity enum odd. I guess just to force everything to be lower case?
Looks like chumsky accepts multiple function signatures for fold expressions. Is the assumption that this would do the same?

Naming is hard. I'm finding that I prefer names that works at the grammar-level like with #440. I've also had bad experiences looking at graph libraries and not being able to find what I want because they are named after their algorithm and not their purpose. Of course, there is the challenge of multiple algorithms that can meet the same purpose with different criteria, so I'm not too sure how to balance the two.

Some tools we have to help are

#[doc(alias = "pratt")] (if we don't name it pratt)
#[doc(alias = "separated")]
Cross-linking separated to pratt
Ensuring the doc-comment summary focuses on the intent

As an aside, when it comes to the combinator summary, feel free to leave all the entries blank because this won't fit into such a simple summary.

Would I be correct in assuming that this would be best put behind a feature "pratt", or would you want this to be included by default?

You could start it under unstable-pratt which gives us time to answer questions like that and lowers the bar for what the initial API looks like.

For when we get there, is there a reason to have this be a separate feature?

Zij-IT commented 9 months ago

So this is "precedence", right?

Yup!

I assume this is associativity and strength combined. Why have them together?

That's just how it was done with Chumsky. It would be totally fine to have another parameter for the strength.

Looking at chumsky, prefix/postfix don't use an associativity, only strength. Thoughts on that?

I just totally spaced that, and should have removed it from prefix and postfix.

I find their functions wrapping the Associativity enum odd. I guess just to force everything to be lower case?

This choice is largely just keystrokes I believe. It is indeed a bit odd in hindsight :sweat_smile:

Looks like chumsky accepts multiple function signatures for fold expressions. Is the assumption that this would do the same?

It can be if that would be desired. It would just require an intermediate trait as in the Chumsky impl. It would also be possible to just restrict it to something in the form Fn(O, Op) -> O for pre- or postfix and Fn(O, Op, O) -> O for infix. I don't know if winnow has an equivalent to MapExtra that would be meaningful to pass in.

For when we get there, is there a reason to have this be a separate feature?

Not any that I can think of. It should be a rather small addition, but I wanted to check :) Chumsky has it under a flag, so I wanted to check.

epage commented 9 months ago

It can be if that would be desired. It would just require an intermediate trait as in the Chumsky impl. It would also be possible to just restrict it to something in the form Fn(O, Op) -> O for pre- or postfix and Fn(O, Op, O) -> O for infix. I don't know if winnow has an equivalent to MapExtra that would be meaningful to pass in.

We don't have MapExtra at this time.

I'm fine with a trait being used for this. I'm considering doing similar in other places.

This has been a great conversation however if we start with it under unstable-pratt, we can move forward with whatever and iterate on it over time without affecting semver.

epage commented 3 months ago

https://gist.github.com/ilonachan/3d92577265846e5327a3011f7aa30770 is interesting to consider which I came across via rust-bakery/nom#1362

winnow-rs / winnow

Pratt parsing support #131

Please complete the following tasks

winnow version

Describe your use case

Describe the solution you'd like

Alternatives, if applicable

Additional Context

Operator Parsers

Pratt Parser

General things