Add an OutputMode type parameter to drive parser results

Geal commented 1 year ago

this work takes inspiration from chumsky's exploration of generic associated types. In chumsky's PR, output value creation depends on a Mode type parameter to either produce the value, or return (), and the call site decides if it needs the value or not. That mode can be transmitted from parent to child parser, and prevent the compiler from generating the code for an entire chain of results. As an example, in the delimited combinator, we test tree parsers in sequence, but we only care about the result of the second one. For the other two parsers, we only need to know that they succeeded, we do not need their result, so they might as well not be generated.

This PR extends this solution further, by having a mode parameter for errors too: in a lot of cases, nom combinators just need to know that a parser failed, but do not want to process the error. The OutputMode type also carries a parameter that will drive streaming VS complete behaviour.

TODO:

[ ] validate conversion of multi combinators
[ ] validate conversion of branch combinators
[ ] validate conversion of number combinators
[ ] rewrite some tests and examples to avoid the .parse syntactic noise
[ ] test the effect on benchmarks
[ ] convert the rest of the parsers
[ ] docs

coveralls commented 1 year ago

Pull Request Test Coverage Report for Build 4726180626

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

For more information on this, see Tracking coverage changes with pull request builds.
To avoid this issue with future PRs, see these Recommended CI Configurations.
For a quick fix, rebase this PR at GitHub. Your next report should be accurate.

Details

738 of 1122 (65.78%) changed or added relevant lines in 19 files are covered.
172 unchanged lines in 7 files lost coverage.
Overall coverage increased (+0.02%) to 62.427%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
src/character/complete.rs	10	12	83.33%
src/character/streaming.rs	10	12	83.33%
benchmarks/benches/http.rs	0	4	0.0%
src/branch/mod.rs	18	22	81.82%
src/sequence/mod.rs	13	18	72.22%
src/bytes/complete.rs	29	35	82.86%
src/number/complete.rs	18	24	75.0%
src/bytes/streaming.rs	29	36	80.56%
src/number/streaming.rs	16	24	66.67%
src/internal.rs	26	41	63.41%
<!--	Total:	738	1122	65.78%	-->

Files with Coverage Reduction	New Missed Lines	%
src/number/complete.rs	1	65.22%
src/error.rs	2	16.21%
src/lib.rs	3	33.33%
src/multi/mod.rs	3	81.01%
benchmarks/benches/json.rs	21	0.0%
src/internal.rs	38	51.43%
src/traits.rs	104	60.71%
<!--	Total:	172	-->

Totals
Change from base Build 4430830784:	0.02%
Covered Lines:	1811
Relevant Lines:	2901

💛 - Coveralls

Geal commented 1 year ago

@epage this is still pretty early, so don't overindex on the current state of the code and API, I'm still exploring how it can work in the end

epage commented 1 year ago

@epage this is still pretty early, so don't overindex on the current state of the code and API, I'm still exploring how it can work in the end

I understand and am commenting now in case any of my thoughts are helpful for that exploratory process as I've been thinking about parallel problems. If its not helpful at this stage, I can back off.

Geal commented 1 year ago

it's able to run the JSON parser now (not fully, still missing the float parsing), and so far it looks slower (between 3% and 16%). I do not know yet where that comes from, I suspect that either there's an issue with recursin, or that some parts are not properly inlined. The UX looks ok, it requires adding a .parse() here and there but it would not be too traumatic to convert a nom 7 parser, and it's possible to convert selected parts of a parser as needed, while keeping the functions everywhere else.

Writing an implementation of Parser::process is a bit annoying though, I'll spend more time on UX

epage commented 1 year ago

I've been finding the json benchmark is too trivial to get reasonable numbers from it, so I've added a larger one: https://github.com/winnow-rs/winnow/pull/119/commits/1a4889380045e1da953eba7b5e71a2b365000ab7

FYI the dispatch! macro dropped json parse time from 12ms to 9ms for the large file: https://github.com/winnow-rs/winnow/pull/119

Geal commented 1 year ago

Yes, I've been testing with the canada file too, that's where I get a 3% difference. dispatch is a good idea, I wrote similar switches in other parsers but never took the time to add it back. FWIW the goal of that benchmark was not to make the fastest json parser, but to check the evolution between nom versions with a naive parser, so that's why I have not optimized it much

Geal commented 1 year ago

flamegraph of execution from main:

flamegraph-main-no-gat

flamegraph of execution from this PR:

flamegraph-gat

epage commented 1 year ago

dispatch is a good idea, I wrote similar switches in other parsers but never took the time to add it back.

Ah, I should have clarified. As dispatch! is probably the closest to the idealized result of GAT, the benchmark I ran provides an theoretical lower limit on parse times for the alt calls I replaced.

FWIW the goal of that benchmark was not to make the fastest json parser, but to check the evolution between nom versions with a naive parser, so that's why I have not optimized it much

Yes, I've kept a naive version which is what I'd highlight in winnow's cookbook. I also am considering writing an optimization guide that show cases the various stages of optimization and report the benchmark results so people can see the relative impact of various changes.

epage commented 1 year ago

Are your benchmarks with Error or VerboseError? While I'm surprised by a slow down, I would expect more of a speed up with an error type that does more processing.

tisonkun commented 1 year ago

It seems customized parser become harder after this patch.

Previously works:

fn comma_separated_list1<'a, T>(
    f: impl FnMut(Input<'a>) -> ParseResult<'a, T>,
) -> impl FnMut(Input<'a>) -> ParseResult<'a, Vec<T>> {
    separated_list1(Comma, f)
}

Now breaks:

error[E0277]: expected a `FnMut<(&'a [Token<'a>],)>` closure, found `impl Parser<&[Token<'_>], Output = Vec<<impl FnMut(Input<'a>) -> ParseResult<'a, T> as Parser<&[Token<'_>]>>::Output>, Error = parser::error::ParseError>`
   |
67 | ) -> impl FnMut(Input<'a>) -> ParseResult<'a, Vec<T>> {
   |      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected an `FnMut<(&'a [Token<'a>],)>` closure, found `impl Parser<&[Token<'_>], Output = Vec<<impl FnMut(Input<'a>) -> ParseResult<'a, T> as Parser<&[Token<'_>]>>::Output>, Error = parser::error::ParseError>`
68 |     separated_list1(Comma, f)
   |     ------------------------- return type was inferred to be `impl Parser<&[Token<'_>], Output = Vec<<impl FnMut(Input<'a>) -> ParseResult<'a, T> as Parser<&[Token<'_>]>>::Output>, Error = ParseError>` here
   |
   = help: the trait `FnMut<(&'a [Token<'a>],)>` is not implemented for `impl Parser<&[Token<'_>], Output = Vec<<impl FnMut(Input<'a>) -> ParseResult<'a, T> as Parser<&[Token<'_>]>>::Output>, Error = parser::error::ParseError>`

epage commented 1 year ago

If you switch the impl FnMut to impl Parser, I'm assuming it'll be resolved

tisonkun commented 1 year ago

@epage you're right. I actually change the return type when debugging, but not for the argument.

After changing the argument also, it seems work now:

fn comma_separated_list1<'a, T>(
    f: impl Parser<Input<'a>, Output = T, Error = ParseError>,
) -> impl Parser<Input<'a>, Output = Vec<T>, Error = ParseError> {
    separated_list1(Comma, f)
}

... while I may introduce a local trait bound to reduce the longy type name.

UPDATE - Here is the trait alias:

pub(self) trait ParserBound<'a, O> = Parser<Input<'a>, Output = O, Error = ParseError>;

... but a trait bound doesn't work:

pub(self) trait ParserBound<'a, O>:
    Parser<Input<'a>, Output = O, Error = ParseError>
{
}

error[E0277]: the trait bound `impl Parser<&[Token<'_>], Output = Vec<<impl ParserBound<'a, T> as Parser<&[Token<'_>]>>::Output>, Error = parser::error::ParseError>: ParserBound<'a, Vec<T>>` is not satisfied
  --> src/database/src/sql/parser/parse/mod.rs:69:64
   |
69 | fn comma_separated_list1<'a, T>(f: impl ParserBound<'a, T>) -> impl ParserBound<'a, Vec<T>> {
   |                                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the trait `ParserBound<'a, Vec<T>>` is not implemented for `impl Parser<&[Token<'_>], Output = Vec<<impl ParserBound<'a, T> as Parser<&[Token<'_>]>>::Output>, Error = parser::error::ParseError>`
70 |     separated_list1(Comma, f)
   |     ------------------------- return type was inferred to be `impl Parser<&[Token<'_>], Output = Vec<<impl ParserBound<'a, T> as Parser<&[Token<'_>]>>::Output>, Error = ParseError>` here

epage commented 1 year ago

For trait ParserBound, did you add a blanket impl for Parser<Input<'a>, Output = O, Error = ParseError>?

tisonkun commented 1 year ago

blanket impl

I think I should do something like that. But actually I don't know what does it mean and how to do 🤣

epage commented 1 year ago

pub(self) trait ParserBound<'a, O>:
    Parser<Input<'a>, Output = O, Error = ParseError>
{
}

impl<'a, P, O> ParserBound<'a, O> for P
where
    P: Parser<Input<'a>, Output = O, Error = ParseError>
{
}

tisonkun commented 1 year ago

@epage Thank you! It works.

rust-bakery / nom