Closed RobWalt closed 11 months ago
@scottmcm I believe you wrote Itertools::tree_fold1
, can you shed some light here?
Since
tree_fold1
won't really work as expected (unless you know what you're doing) in non-associative cases
I would say the different behaviour for non-associative cases is entirely the point of tree_fold1
(and of rfold
, for that matter).
If you have something truely associative, like u32::wrapping_add
, then there's no point in running the materially-more-complicated tree_fold1
since .reduce(u32::wrapping_add)
will do the same thing and probably do it much faster.
But even with something only slightly non-associative like f32::add
, that's where the good of the non-linear folding structure comes in. Simple demo: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=b6a1c419176ec7b02fb6af3539708da0
See also the string format bench (https://github.com/rust-itertools/itertools/blob/7a6c1ef3e771b75515a420b5759fc7b8c51248ff/benches/tree_fold1.rs#L57) for a place where the non-associativity of performance of the operation really matters.
Ok yeah, I see what you mean. Thanks for the explanation. I think I missed some things before. Let's take a closer look at all the possible cases:
associative:
f
is simple: e.g. cases like u32::wrapping_add
. I guess you wanted to say that this case is so simple that it's better to use the more simplistic reduce
since it's probably better for optimizations, code gen etc, right? Some evidence that this is true would be coolf
is complex, e.g. the format case. Here we recommend the use of tree_fold1
since it reduces the amount of required operations from n
to ln2(n)
. I thought this is the main use case of tree_fold1
and this is where my confusions and "improvements" of the docs come fromAs mentioned, this is imo the main use case and should be highlighted. Writing
If
f
is associative, prefer the normal [Iterator::reduce
] instead.
seems misleading.
i32::sub
case from the docs. tree_fold1
will most likely produce different results than reduce
in an "unintuitive" way and this is why it shouldn't be recommended in this casef32::add
. If you really know what you're doing and if you can estimate the data you're dealing with then it might make sense that you use it in this case. I think your example is a bit constructed in a way since the result of the f32::add
operations also depends on the order of the elements in the array and not only on the folding function used. So yeah, in a way tree_fold1
can produce a more exact result, but you need to ensure that the data is ordered so that it works out. Take this example where we shuffle the array first https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=31ea48fe708bf3778358c48a930963b5Both of the non-associative cases should be neglacted in my opinion.
One particular place where this can be really useful is in building binary trees. Compare the outputs in https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=f6a580a37e0a01b1c94767680dd444ca -- using tree_fold1
gives an O(log n)-depth tree vs the O(n)-depth tree from reduce
, and the lower depth tends to be better, since it means things like a recursive DFS will be much happier.
I don't know a great way to fit that example into a doc comment, though :/
Just did a very basic test on compiler explorer and both reduce
and tree_fold1
generate the exact same machine code for the simple case of u32::add
as can be seen here, which implies that there is no benefit of using one over the other.
To see this you need to scroll down all the way on the right hand side window. The generated assembly code is color coded the same way as the related rust code.
Never use +
over a Range
as your test, because when LLVM notices that's what you're doing it optimizes it to the closed-form solution.
I suggest comparing summing up a slice, where you'll see, for example, that the normal reduce
gets vectorized but the tree_fold
does not: https://rust.godbolt.org/z/6hb13ae4T
They're very very different things once you're not doing something that's not just a constant.
I guess my understanding of all of this is just lacking then. I'll just close the issue. Thanks a lot for the extra explanation, that was helpful! And sorry for the needless discussion
No, the discussion is good! It'd be great if you could find a nice way to reflect this difference into the documentation, to help the next person who's wondering why the method exists.
Thanks for the positive feedback :) Ok, then let me know which parts I should put the highlights on! The different use cases of tree_fold1
and reduce
based on the complexity of f
?
@scottmcm I adjusted the docs once again. Does this come closer to an acceptable explanation? 😅
bump to not forget this issue as it's also not that big of an issue 😅
Some of this doesn't seem right to me, and the docs now seem a bit confusing, if not inaccurate. Specifically:
if
f
is non-trivial likeformat!
, you should usetree_fold1
since it reduces the number of operations fromO(n)
toO(ln(n))
Both reduce
and tree_fold1
call f
n − 1 times, which is O(n).
I think what this is trying to get at is that if f
is O(n), the iterator elements are of similar sizes*, and size(f(x, y)
) = size(x
) + size(y
), then reduce(f)
is O(n2) and tree_fold1
is O(n log n).
format!
does satisfy these condition, but it may not be the best example. format!("{x}{y}")
and format!("{x}foo{y}")
are the only obvious non-trivial associative cases, and these are concat
and join
.
* Not necessary, but sufficient. reduce
gets faster if you put the largest items at the end.
I'm sorry if I made the docs less correct. I can look into this when I find the time again. Otherwise feel free to open a PR yourself
I was just reading through the docs when I noticed something odd. It's probably just a typo:
There is a recommendation to use
reduce
instead oftree_fold1
whenf
is associative. Sincetree_fold1
won't really work as expected (unless you know what you're doing) in non-associative cases, this would imply that we don't recommend the use oftree_fold1
at all?I just took a guess and adjusted that part of the docs to what it most likely was supposed to be. Feel free to close this PR if I'm completely wrong here 🙈