docs: fixup docs of `tree_fold1`

RobWalt commented 12 months ago

I was just reading through the docs when I noticed something odd. It's probably just a typo:

There is a recommendation to use reduce instead of tree_fold1 when f is associative. Since tree_fold1 won't really work as expected (unless you know what you're doing) in non-associative cases, this would imply that we don't recommend the use of tree_fold1 at all?

I just took a guess and adjusted that part of the docs to what it most likely was supposed to be. Feel free to close this PR if I'm completely wrong here 🙈

Philippe-Cholet commented 12 months ago

@scottmcm I believe you wrote Itertools::tree_fold1, can you shed some light here?

scottmcm commented 12 months ago

Since tree_fold1 won't really work as expected (unless you know what you're doing) in non-associative cases

I would say the different behaviour for non-associative cases is entirely the point of tree_fold1 (and of rfold, for that matter).

If you have something truely associative, like u32::wrapping_add, then there's no point in running the materially-more-complicated tree_fold1 since .reduce(u32::wrapping_add) will do the same thing and probably do it much faster.

But even with something only slightly non-associative like f32::add, that's where the good of the non-linear folding structure comes in. Simple demo: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=b6a1c419176ec7b02fb6af3539708da0

See also the string format bench (https://github.com/rust-itertools/itertools/blob/7a6c1ef3e771b75515a420b5759fc7b8c51248ff/benches/tree_fold1.rs#L57) for a place where the non-associativity of performance of the operation really matters.

RobWalt commented 12 months ago

Ok yeah, I see what you mean. Thanks for the explanation. I think I missed some things before. Let's take a closer look at all the possible cases:

associative:
- f is simple: e.g. cases like u32::wrapping_add. I guess you wanted to say that this case is so simple that it's better to use the more simplistic reduce since it's probably better for optimizations, code gen etc, right? Some evidence that this is true would be cool
- I can try to benchmark this a little to shine some light on it
- f is complex, e.g. the format case. Here we recommend the use of tree_fold1 since it reduces the amount of required operations from n to ln2(n). I thought this is the main use case of tree_fold1 and this is where my confusions and "improvements" of the docs come from
- As mentioned, this is imo the main use case and should be highlighted. Writing
  
  If f is associative, prefer the normal [Iterator::reduce] instead.
  
  seems misleading.
non-associative:
- in general: e.g. the i32::sub case from the docs. tree_fold1 will most likely produce different results than reduce in an "unintuitive" way and this is why it shouldn't be recommended in this case
- in special cases: e.g. your example with f32::add. If you really know what you're doing and if you can estimate the data you're dealing with then it might make sense that you use it in this case. I think your example is a bit constructed in a way since the result of the f32::add operations also depends on the order of the elements in the array and not only on the folding function used. So yeah, in a way tree_fold1 can produce a more exact result, but you need to ensure that the data is ordered so that it works out. Take this example where we shuffle the array first https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=31ea48fe708bf3778358c48a930963b5

Both of the non-associative cases should be neglacted in my opinion.

scottmcm commented 12 months ago

One particular place where this can be really useful is in building binary trees. Compare the outputs in https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=f6a580a37e0a01b1c94767680dd444ca -- using tree_fold1 gives an O(log n)-depth tree vs the O(n)-depth tree from reduce, and the lower depth tends to be better, since it means things like a recursive DFS will be much happier.

I don't know a great way to fit that example into a doc comment, though :/

RobWalt commented 12 months ago

Just did a very basic test on compiler explorer and both reduce and tree_fold1 generate the exact same machine code for the simple case of u32::add as can be seen here, which implies that there is no benefit of using one over the other.

To see this you need to scroll down all the way on the right hand side window. The generated assembly code is color coded the same way as the related rust code.

https://rust.godbolt.org/z/b7K5EenEE

scottmcm commented 12 months ago

Never use + over a Range as your test, because when LLVM notices that's what you're doing it optimizes it to the closed-form solution.

I suggest comparing summing up a slice, where you'll see, for example, that the normal reduce gets vectorized but the tree_fold does not: https://rust.godbolt.org/z/6hb13ae4T

They're very very different things once you're not doing something that's not just a constant.

RobWalt commented 12 months ago

I guess my understanding of all of this is just lacking then. I'll just close the issue. Thanks a lot for the extra explanation, that was helpful! And sorry for the needless discussion

scottmcm commented 12 months ago

No, the discussion is good! It'd be great if you could find a nice way to reflect this difference into the documentation, to help the next person who's wondering why the method exists.

RobWalt commented 12 months ago

Thanks for the positive feedback :) Ok, then let me know which parts I should put the highlights on! The different use cases of tree_fold1 and reduce based on the complexity of f?

RobWalt commented 11 months ago

@scottmcm I adjusted the docs once again. Does this come closer to an acceptable explanation? 😅

RobWalt commented 11 months ago

bump to not forget this issue as it's also not that big of an issue 😅

TomFryersMidsummer commented 9 months ago

Some of this doesn't seem right to me, and the docs now seem a bit confusing, if not inaccurate. Specifically:

if f is non-trivial like format!, you should use tree_fold1 since it reduces the number of operations from O(n) to O(ln(n))

Both reduce and tree_fold1 call f n − 1 times, which is O(n).

I think what this is trying to get at is that if f is O(n), the iterator elements are of similar sizes*, and size(f(x, y)) = size(x) + size(y), then reduce(f) is O(n²) and tree_fold1 is O(n log n).

format! does satisfy these condition, but it may not be the best example. format!("{x}{y}") and format!("{x}foo{y}") are the only obvious non-trivial associative cases, and these are concat and join.

* Not necessary, but sufficient. reduce gets faster if you put the largest items at the end.

RobWalt commented 9 months ago

I'm sorry if I made the docs less correct. I can look into this when I find the time again. Otherwise feel free to open a PR yourself

rust-itertools / itertools

docs: fixup docs of `tree_fold1` #787