Closed Philippe-Cholet closed 1 year ago
Well, apart from `Itertools::join`, which is known to be slow anyway, I benchmarked each "`for_each` → `fold`" change and got mostly severe regressions. The only "win" was -5% for `into_group_map`.

Before I close this, do you know why?
> - Use `fold` if you do need to maintain an accumulator.
I would tweak this one slightly:

> Use `fold` if you need owned access to an accumulator

If the folder is something like `|v, x| { v.push(x); v }`, then it's (very slightly) better to do that as `.for_each(|x| v.push(x))`, using an `FnMut` closure to refer to local state.
Last time people looked into it, threading the passing of a `Vec` or similar through all the folds was harder for LLVM to optimize well than if there's nothing (well, a trivial `()`) to pass along. So if the owned access to the accumulator -- `fold`'s superpower -- isn't needed, in μoptimized code it's better to use `for_each` capturing a `&mut` instead.
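A minimal sketch (not from the thread, names are illustrative) contrasting the two styles: `fold` threads ownership of the accumulator through every step, while `for_each` with an `FnMut` closure mutates local state that LLVM may find easier to optimize.

```rust
// Sketch: the same Vec-building loop written both ways.

fn collect_with_fold(iter: impl Iterator<Item = u32>) -> Vec<u32> {
    // `fold` passes the Vec by value through every iteration.
    iter.fold(Vec::new(), |mut v, x| {
        v.push(x);
        v
    })
}

fn collect_with_for_each(iter: impl Iterator<Item = u32>) -> Vec<u32> {
    // `for_each` captures `&mut v` instead; nothing is threaded
    // through the loop, which can be (very slightly) faster.
    let mut v = Vec::new();
    iter.for_each(|x| v.push(x));
    v
}

fn main() {
    let a = collect_with_fold(0..5);
    let b = collect_with_for_each(0..5);
    assert_eq!(a, b);
    println!("{:?}", a);
}
```

Both produce the same result; the difference only matters in micro-optimized code, as described above.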
Thanks for the detailed explanation.
https://github.com/rust-itertools/itertools/pull/780#discussion_r1355061606 by jswrenn
Bold is mine, as I wonder if we are using `for_each` (convenient) at multiple places instead of `fold`.

Current `for_each` uses:

- `Itertools::{counts, partition_map, join}`
- `min_set_impl`
- `GroupingMap::{aggregate, collect}`
- `into_group_map`
- `k_smallest`
(There is also `Itertools::foreach`, but this one is okay.)

Why do I care? While benchmarking a new `MapForGrouping::fold`, I found out that `some_slice_iter::for_each` does not delegate to `fold` (default behavior) and is not optimized* (just `while let Some(x) = self.next() { f(x); }`) while `some_slice_iter::fold` is optimized (which is useful to see benchmark improvements of specializing `fold` methods of our adaptors by delegating to the inner iterator).

*: The `for_each` has a comment: "_We override the default implementation, which uses `try_fold`, because this simple implementation generates less LLVM IR and is faster to compile._"
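The specialization pattern mentioned above can be sketched roughly as follows. `MapWrapper` is a hypothetical adaptor (not an itertools type); the point is that overriding `fold` to delegate to the inner iterator lets a specialized inner `fold` (e.g. a slice iterator's) kick in, whereas the default `next`-based loop would not.

```rust
// Hypothetical adaptor that maps items and delegates `fold` to the
// inner iterator, so inner-iterator `fold` specializations apply.
struct MapWrapper<I, F> {
    iter: I,
    f: F,
}

impl<I, F> Iterator for MapWrapper<I, F>
where
    I: Iterator,
    F: FnMut(I::Item) -> I::Item,
{
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        self.iter.next().map(&mut self.f)
    }

    // Delegate to the inner iterator's `fold` instead of using the
    // default `next`-based implementation.
    fn fold<B, G>(self, init: B, mut g: G) -> B
    where
        Self: Sized,
        G: FnMut(B, Self::Item) -> B,
    {
        let mut f = self.f;
        self.iter.fold(init, move |acc, x| g(acc, f(x)))
    }
}

fn main() {
    let it = MapWrapper {
        iter: [1, 2, 3].iter().copied(),
        f: |x| x * 2,
    };
    // Uses the delegated `fold`: (0 + 2) + 4 + 6 = 12.
    let sum = it.fold(0, |acc, x| acc + x);
    assert_eq!(sum, 12);
    println!("{}", sum);
}
```

This is only a sketch of the delegation idea under discussion, not the actual `MapForGrouping` implementation.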