Large tuples are a performance footgun

Please complete the following tasks

[X] I have searched the discussions
[X] I have searched the open and rejected issues

rust version

1.68

winnow version

0.4.0

Minimal reproducible code

(a, b).map(...).parse_next(input)

Steps to reproduce the bug with the above code

TBD

Actual Behaviour

Slow

Expected Behaviour

Fast

Additional Context

See #230

Options

Reduce the number of tuple elements supported to not encourage bad behavior
The struct macro from #82

One thing that further decreases performance are large output types. Surprisingly, I've seen quite big performance improvements by Box-ing large types that are passed though multiple layers of parser code. Of course, boxing by default is a performance footgun and usually makes the parser substantially slower.

In addition to that, tuple parsers should be used carefully as they can run into the same performance issues when multiple large output types are involved or if they simply have too many items in them.

E.g. i replaced something like
(a, b).map(...).parse_next(input)
with
let (input, a) = a.parse_next(input)?;
...
let (input, b) = b.parse_next(input)?;
...
Ok((input, ...))
in quite a few places to resolve performance issues.

I wonder if #251 improved the situation as the compiler is now more likely to optimize the tuple version into the imperative version.

@martinohmann when you have more time, could you create a branch where hcl-edit is using tuples so I can do some more analysis of this? I'd like to see how #251 or tuple alternatives may be able to help improve things. Because of the chance of this being fixed in #251, I'm deprioritizing this for now, so no rush.

Good idea! I'll try find some tuple cases. The parser is built in a way now that makes bringing these back a bit more involved. But I think I see 1-3 cases that are "easy" to negatively impact performance. Not sure if I can get to it this month or next month. Will ping you once I have a branch.

@praveenperera in https://github.com/winnow-rs/winnow/issues/191#issuecomment-2390488478

Specifically cover the cost of large return types which can show up in surprising ways like just using a tuple (#230)

@epage This is a question I had, when you say watch for large return types. Is it better to call parse_next multiple times and use regular if else branching?

Instead of trying to purely do it all with combinators?

Examples and more context: https://www.perplexity.ai/search/rust-nom-and-winnow-parsing-is-SaJuUbDxSfu09wjpXcOctA

Though in my example, I’m not actually returning, the tuple.

Examples and more context: https://www.perplexity.ai/search/rust-nom-and-winnow-parsing-is-SaJuUbDxSfu09wjpXcOctA

That AI answer has a semblance of sounding correct but is completely wrong.

(...).parse_next() will just call parse_next() on everything inside of the tuple, there is no difference. Also, in cases like calling parse_next on a function, the overhead would only be the creation of a stack frame but we #[inline(always)] that call, so there should be no overhead
Sometimes a large combination of parsers is easier to read, sometimes imperative calling of parsers is easier to read. Its dependent on context. In particular, if you force the use of combinators for cases they aren't really designed for, the complexity will be high, making the readability low
One large set of combinators does not make error handling easier. In fact, combinators limit the expressiveness of error reporting (e.g. #180)
Combinator or imperative should not affect optimizations the library can apply to your parser.
- For GAT based libraries, like nom v8, what can make a difference is having your own parser functions that you compose into bigger parsers because that stops nom v8 from propagating the GAT through which is used for optimizations. This surprising performance cliff is one reason we've chosen not to leverage GATs.

The answers for when to make individual parse_next() calls is reasonable.

This is a question I had, when you say watch for large return types. Is it better to call parse_next multiple times and use regular if else branching?

Instead of trying to purely do it all with combinators?

Though in my example, I’m not actually returning, the tuple.

If you use (...).parse_next(), a tuple is being returned, even if its not by your function. Ultimately the performance hit is from large return types but that is difficult to create one without (...).parse_next(), so the topics are closely related.

If you have a function like

fn add(left: usize, right: usize) -> usize {
    left + right
}

the compiler will effectively transform this to

fn add(left: usize, right: usize, out result: usize) {
    out = left + right
}

The fastest form of memory is called a register and parameters and return types go through these where possible. However, if a parameter or return type becomes too big, the compiler will instead return through the stack which is using memory (with dcaches), turning it into

fn add(left: &Large, right: &Large, result: &mut Large) {
    *out = left + right
}

Before v0.5, winnow's return type was

Result<
    (I, O),
    ErrMode<InputError<I>>,
>

Meaning that Winnow <=v0.4 added size_of::<I>() overhead to the return type, making it more likely we'd "spill over into the stack".

Now that we use &mut I as a parameter, our return type is

Result<
    O,
    ErrMode<ContextError>,
>

Winnow has a fixed overhead for error reporting (reducable by specifying a custom error type) but the I overhead is gone. I am considering making ErrMode optional, allowing the overhead to be dropped even further.

However, if you do (parser1, parser2, parser3, parser4, parser5, parsser6).parse_next(), you could still spill over into the stack. I've been hoping we could give rustc enough hints to be able to rewrite that into the imperative form but so far it has not worked in enough cases.

In general, write your code for readability and if its too slow, experiment. It can be surprising what things can speed up or slow things down as there are ripple effects in optmizations. I tried to reduce shuffle things owe I had fewer large return types and instead I made performance worse.

@epage very detailed and very helpful thank you! You should add their comments to the docs.

One more question :

Before v0.5, winnow's return type was

Result<(I, O), ErrMode<InputError>

That is still the return type for parse_peek so i’m assuming if i’m using that function I should still be careful of tuples?

Thanks again

update

For anyone reading this, here is an example of switching from parse_peek to parse_next: https://github.com/bitcoinppl/cove/commit/67c98fb75b7368d30d5cf6b1518f58fdf90c7d08

As mentioned on the other thread, parse_peek should really only be used for testing at this point.

Got it thanks, I’ll change it over. Would you accept a PR if I went through all the examples in the docs (for the individual functions) and changed them to parse_next?

Besides the migration cost, I left them that way out of concern for legibility. The current form makes it easy to demonstrate the APIs behavior and I'm hesitant how it would be to use parse_next. This would best be split out into its own issue and maybe do a test run on some functions to get feedback on how it looks.

winnow-rs / winnow