rust-itertools / itertools

Extra iterator adaptors, iterator methods, free functions, and macros.
https://docs.rs/itertools/
Apache License 2.0
2.72k stars 309 forks source link

Use `fold` instead of `for_each`? #781

Closed Philippe-Cholet closed 1 year ago

Philippe-Cholet commented 1 year ago

https://github.com/rust-itertools/itertools/pull/780#discussion_r1355061606 by jswrenn

In the Rust standard library, it's quite common for fold to be implemented in terms of try_fold. In your original code you construct a TakeWhile and call fold on it. But TakeWhile's fold implementation delegates to try_fold!

The general rule to follow is something like this:

  • If you need to completely consume an iterator, use for_each or fold.

    • Use for_each if you don't need to maintain an accumulator.
    • Use fold if you do need to maintain an accumulator.
  • If you need to partly consume an iterator, use try_for_each or try_fold.

    • Same rules about accumulators apply.

Bold is mine as I wonder if we are using for_each (convenient) at multiple places instead of fold.

Current for_each uses:

(There is also Itertools::foreach but this one is okay.)

Why do I care? While benchmarking a new MapForGrouping::fold, I found out that some_slice_iter::for_each does not delegate to fold (default behavior) and is not optimized * (just while let Some(x) = self.next() { f(x); }) while some_slice_iter::fold is optimized (which is useful to see benchmark improvements of specializing fold methods of our adaptors by delegating to the inner iterator).

*: The for_each has a comment: "_We override the default implementation, which uses try_fold, because this simple implementation generates less LLVM IR and is faster to compile._"

Philippe-Cholet commented 1 year ago

Well, apart from Itertools::join that is known to be slow anyway, I benchmarked each "for_eachfold" change and got mostly severe regressions. The only "win" was -5% for into_group_map.

Before I close this, do you know why?

scottmcm commented 1 year ago
  • Use fold if you do need to maintain an accumulator.

I would tweak this one slightly:

Use fold if you need owned access to an accumulator

If the folder is something like |v, x| { v.push(x); v }, then it's (very slightly) better to do that as .for_each(|x| v.push(x)), using an FnMut closure to refer to local state.

Last people looked into it, threading the passing of a Vec or similar through all the folds was harder for LLVM to optimize well than if there's nothing (well, a trivial ()) to pass along. So if the owned access to the accumulator -- fold's superpower -- isn't needed, in μoptimized code it's better to use for_each capturing a &mut instead.

Philippe-Cholet commented 1 year ago

Thanks for the detailled precision.