rust-bakery / nom

Rust parser combinator framework
MIT License
9.36k stars 802 forks source link

Add an OutputMode type parameter to drive parser results #1631

Closed Geal closed 1 year ago

Geal commented 1 year ago

this work takes inspiration from chumsky's exploration of generic associated types. In chumsky's PR, output value creation depends on a Mode type parameter to either produce the value, or return (), and the call site decides if it needs the value or not. That mode can be transmitted from parent to child parser, and prevent the compiler from generating the code for an entire chain of results. As an example, in the delimited combinator, we test tree parsers in sequence, but we only care about the result of the second one. For the other two parsers, we only need to know that they succeeded, we do not need their result, so they might as well not be generated.

This PR extends this solution further, by having a mode parameter for errors too: in a lot of cases, nom combinators just need to know that a parser failed, but do not want to process the error. The OutputMode type also carries a parameter that will drive streaming VS complete behaviour.

TODO:

coveralls commented 1 year ago

Pull Request Test Coverage Report for Build 4726180626

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details


Changes Missing Coverage Covered Lines Changed/Added Lines %
src/character/complete.rs 10 12 83.33%
src/character/streaming.rs 10 12 83.33%
benchmarks/benches/http.rs 0 4 0.0%
src/branch/mod.rs 18 22 81.82%
src/sequence/mod.rs 13 18 72.22%
src/bytes/complete.rs 29 35 82.86%
src/number/complete.rs 18 24 75.0%
src/bytes/streaming.rs 29 36 80.56%
src/number/streaming.rs 16 24 66.67%
src/internal.rs 26 41 63.41%
<!-- Total: 738 1122 65.78% -->
Files with Coverage Reduction New Missed Lines %
src/number/complete.rs 1 65.22%
src/error.rs 2 16.21%
src/lib.rs 3 33.33%
src/multi/mod.rs 3 81.01%
benchmarks/benches/json.rs 21 0.0%
src/internal.rs 38 51.43%
src/traits.rs 104 60.71%
<!-- Total: 172 -->
Totals Coverage Status
Change from base Build 4430830784: 0.02%
Covered Lines: 1811
Relevant Lines: 2901

💛 - Coveralls
Geal commented 1 year ago

@epage this is still pretty early, so don't overindex on the current state of the code and API, I'm still exploring how it can work in the end

epage commented 1 year ago

@epage this is still pretty early, so don't overindex on the current state of the code and API, I'm still exploring how it can work in the end

I understand and am commenting now in case any of my thoughts are helpful for that exploratory process as I've been thinking about parallel problems. If its not helpful at this stage, I can back off.

Geal commented 1 year ago

it's able to run the JSON parser now (not fully, still missing the float parsing), and so far it looks slower (between 3% and 16%). I do not know yet where that comes from, I suspect that either there's an issue with recursin, or that some parts are not properly inlined. The UX looks ok, it requires adding a .parse() here and there but it would not be too traumatic to convert a nom 7 parser, and it's possible to convert selected parts of a parser as needed, while keeping the functions everywhere else.

Writing an implementation of Parser::process is a bit annoying though, I'll spend more time on UX

epage commented 1 year ago

I've been finding the json benchmark is too trivial to get reasonable numbers from it, so I've added a larger one: https://github.com/winnow-rs/winnow/pull/119/commits/1a4889380045e1da953eba7b5e71a2b365000ab7

FYI the dispatch! macro dropped json parse time from 12ms to 9ms for the large file: https://github.com/winnow-rs/winnow/pull/119

Geal commented 1 year ago

Yes, I've been testing with the canada file too, that's where I get a 3% difference. dispatch is a good idea, I wrote similar switches in other parsers but never took the time to add it back. FWIW the goal of that benchmark was not to make the fastest json parser, but to check the evolution between nom versions with a naive parser, so that's why I have not optimized it much

Geal commented 1 year ago

flamegraph of execution from main:

flamegraph-main-no-gat

flamegraph of execution from this PR:

flamegraph-gat

epage commented 1 year ago

dispatch is a good idea, I wrote similar switches in other parsers but never took the time to add it back.

Ah, I should have clarified. As dispatch! is probably the closest to the idealized result of GAT, the benchmark I ran provides an theoretical lower limit on parse times for the alt calls I replaced.

FWIW the goal of that benchmark was not to make the fastest json parser, but to check the evolution between nom versions with a naive parser, so that's why I have not optimized it much

Yes, I've kept a naive version which is what I'd highlight in winnow's cookbook. I also am considering writing an optimization guide that show cases the various stages of optimization and report the benchmark results so people can see the relative impact of various changes.

epage commented 1 year ago

Are your benchmarks with Error or VerboseError? While I'm surprised by a slow down, I would expect more of a speed up with an error type that does more processing.

tisonkun commented 1 year ago

It seems customized parser become harder after this patch.

Previously works:

fn comma_separated_list1<'a, T>(
    f: impl FnMut(Input<'a>) -> ParseResult<'a, T>,
) -> impl FnMut(Input<'a>) -> ParseResult<'a, Vec<T>> {
    separated_list1(Comma, f)
}

Now breaks:

error[E0277]: expected a `FnMut<(&'a [Token<'a>],)>` closure, found `impl Parser<&[Token<'_>], Output = Vec<<impl FnMut(Input<'a>) -> ParseResult<'a, T> as Parser<&[Token<'_>]>>::Output>, Error = parser::error::ParseError>`
   |
67 | ) -> impl FnMut(Input<'a>) -> ParseResult<'a, Vec<T>> {
   |      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected an `FnMut<(&'a [Token<'a>],)>` closure, found `impl Parser<&[Token<'_>], Output = Vec<<impl FnMut(Input<'a>) -> ParseResult<'a, T> as Parser<&[Token<'_>]>>::Output>, Error = parser::error::ParseError>`
68 |     separated_list1(Comma, f)
   |     ------------------------- return type was inferred to be `impl Parser<&[Token<'_>], Output = Vec<<impl FnMut(Input<'a>) -> ParseResult<'a, T> as Parser<&[Token<'_>]>>::Output>, Error = ParseError>` here
   |
   = help: the trait `FnMut<(&'a [Token<'a>],)>` is not implemented for `impl Parser<&[Token<'_>], Output = Vec<<impl FnMut(Input<'a>) -> ParseResult<'a, T> as Parser<&[Token<'_>]>>::Output>, Error = parser::error::ParseError>`
epage commented 1 year ago

If you switch the impl FnMut to impl Parser, I'm assuming it'll be resolved

tisonkun commented 1 year ago

@epage you're right. I actually change the return type when debugging, but not for the argument.

After changing the argument also, it seems work now:

fn comma_separated_list1<'a, T>(
    f: impl Parser<Input<'a>, Output = T, Error = ParseError>,
) -> impl Parser<Input<'a>, Output = Vec<T>, Error = ParseError> {
    separated_list1(Comma, f)
}

... while I may introduce a local trait bound to reduce the longy type name.

UPDATE - Here is the trait alias:

pub(self) trait ParserBound<'a, O> = Parser<Input<'a>, Output = O, Error = ParseError>;

... but a trait bound doesn't work:

pub(self) trait ParserBound<'a, O>:
    Parser<Input<'a>, Output = O, Error = ParseError>
{
}
error[E0277]: the trait bound `impl Parser<&[Token<'_>], Output = Vec<<impl ParserBound<'a, T> as Parser<&[Token<'_>]>>::Output>, Error = parser::error::ParseError>: ParserBound<'a, Vec<T>>` is not satisfied
  --> src/database/src/sql/parser/parse/mod.rs:69:64
   |
69 | fn comma_separated_list1<'a, T>(f: impl ParserBound<'a, T>) -> impl ParserBound<'a, Vec<T>> {
   |                                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the trait `ParserBound<'a, Vec<T>>` is not implemented for `impl Parser<&[Token<'_>], Output = Vec<<impl ParserBound<'a, T> as Parser<&[Token<'_>]>>::Output>, Error = parser::error::ParseError>`
70 |     separated_list1(Comma, f)
   |     ------------------------- return type was inferred to be `impl Parser<&[Token<'_>], Output = Vec<<impl ParserBound<'a, T> as Parser<&[Token<'_>]>>::Output>, Error = ParseError>` here
epage commented 1 year ago

For trait ParserBound, did you add a blanket impl for Parser<Input<'a>, Output = O, Error = ParseError>?

tisonkun commented 1 year ago

blanket impl

I think I should do something like that. But actually I don't know what does it mean and how to do 🤣

epage commented 1 year ago
pub(self) trait ParserBound<'a, O>:
    Parser<Input<'a>, Output = O, Error = ParseError>
{
}

impl<'a, P, O> ParserBound<'a, O> for P
where
    P: Parser<Input<'a>, Output = O, Error = ParseError>
{
}
tisonkun commented 1 year ago

@epage Thank you! It works.