winnow-rs / winnow

Making parsing a breeze
https://docs.rs/winnow
Other
526 stars 40 forks source link

Large tuples are a performance footgun #237

Open epage opened 1 year ago

epage commented 1 year ago

Please complete the following tasks

rust version

1.68

winnow version

0.4.0

Minimal reproducible code

(a, b).map(...).parse_next(input)

Steps to reproduce the bug with the above code

TBD

Actual Behaviour

Slow

Expected Behaviour

Fast

Additional Context

See #230

Options

epage commented 1 year ago

One thing that further decreases performance are large output types. Surprisingly, I've seen quite big performance improvements by Box-ing large types that are passed though multiple layers of parser code. Of course, boxing by default is a performance footgun and usually makes the parser substantially slower.

In addition to that, tuple parsers should be used carefully as they can run into the same performance issues when multiple large output types are involved or if they simply have too many items in them.

E.g. i replaced something like

(a, b).map(...).parse_next(input)

with

let (input, a) = a.parse_next(input)?;
...
let (input, b) = b.parse_next(input)?;
...
Ok((input, ...))

in quite a few places to resolve performance issues.

I wonder if #251 improved the situation as the compiler is now more likely to optimize the tuple version into the imperative version.

epage commented 1 year ago

@martinohmann when you have more time, could you create a branch where hcl-edit is using tuples so I can do some more analysis of this? I'd like to see how #251 or tuple alternatives may be able to help improve things. Because of the chance of this being fixed in #251, I'm deprioritizing this for now, so no rush.

martinohmann commented 1 year ago

Good idea! I'll try find some tuple cases. The parser is built in a way now that makes bringing these back a bit more involved. But I think I see 1-3 cases that are "easy" to negatively impact performance. Not sure if I can get to it this month or next month. Will ping you once I have a branch.

epage commented 22 hours ago

@praveenperera in https://github.com/winnow-rs/winnow/issues/191#issuecomment-2390488478

Specifically cover the cost of large return types which can show up in surprising ways like just using a tuple (#230)

@epage This is a question I had, when you say watch for large return types. Is it better to call parse_next multiple times and use regular if else branching?

Instead of trying to purely do it all with combinators?

Examples and more context: https://www.perplexity.ai/search/rust-nom-and-winnow-parsing-is-SaJuUbDxSfu09wjpXcOctA

Though in my example, I’m not actually returning, the tuple.

epage commented 22 hours ago

Examples and more context: https://www.perplexity.ai/search/rust-nom-and-winnow-parsing-is-SaJuUbDxSfu09wjpXcOctA

That AI answer has a semblance of sounding correct but is completely wrong.

The answers for when to make individual parse_next() calls is reasonable.

epage commented 22 hours ago

This is a question I had, when you say watch for large return types. Is it better to call parse_next multiple times and use regular if else branching?

Instead of trying to purely do it all with combinators?

Though in my example, I’m not actually returning, the tuple.

If you use (...).parse_next(), a tuple is being returned, even if its not by your function. Ultimately the performance hit is from large return types but that is difficult to create one without (...).parse_next(), so the topics are closely related.

If you have a function like

fn add(left: usize, right: usize) -> usize {
    left + right
}

the compiler will effectively transform this to

fn add(left: usize, right: usize, out result: usize) {
    out = left + right
}

The fastest form of memory is called a register and parameters and return types go through these where possible. However, if a parameter or return type becomes too big, the compiler will instead return through the stack which is using memory (with dcaches), turning it into

fn add(left: &Large, right: &Large, result: &mut Large) {
    *out = left + right
}

Before v0.5, winnow's return type was

Result<
    (I, O),
    ErrMode<InputError<I>>,
>

Meaning that Winnow <=v0.4 added size_of::<I>() overhead to the return type, making it more likely we'd "spill over into the stack".

Now that we use &mut I as a parameter, our return type is

Result<
    O,
    ErrMode<ContextError>,
>

Winnow has a fixed overhead for error reporting (reducable by specifying a custom error type) but the I overhead is gone. I am considering making ErrMode optional, allowing the overhead to be dropped even further.

However, if you do (parser1, parser2, parser3, parser4, parser5, parsser6).parse_next(), you could still spill over into the stack. I've been hoping we could give rustc enough hints to be able to rewrite that into the imperative form but so far it has not worked in enough cases.

In general, write your code for readability and if its too slow, experiment. It can be surprising what things can speed up or slow things down as there are ripple effects in optmizations. I tried to reduce shuffle things owe I had fewer large return types and instead I made performance worse.

praveenperera commented 22 hours ago

@epage very detailed and very helpful thank you! You should add their comments to the docs.

One more question :

Before v0.5, winnow's return type was

Result<(I, O), ErrMode<InputError>

That is still the return type for parse_peek so i’m assuming if i’m using that function I should still be careful of tuples?

Thanks again

update

For anyone reading this, here is an example of switching from parse_peek to parse_next: https://github.com/bitcoinppl/cove/commit/67c98fb75b7368d30d5cf6b1518f58fdf90c7d08

epage commented 22 hours ago

As mentioned on the other thread, parse_peek should really only be used for testing at this point.

praveenperera commented 22 hours ago

Got it thanks, I’ll change it over. Would you accept a PR if I went through all the examples in the docs (for the individual functions) and changed them to parse_next?

epage commented 22 hours ago

Besides the migration cost, I left them that way out of concern for legibility. The current form makes it easy to demonstrate the APIs behavior and I'm hesitant how it would be to use parse_next. This would best be split out into its own issue and maybe do a test run on some functions to get feedback on how it looks.