rust-itertools / itertools

Extra iterator adaptors, iterator methods, free functions, and macros.
https://docs.rs/itertools/
Apache License 2.0
2.72k stars 309 forks source link

Feature request: a running fold adaptor #798

Open ronnodas opened 11 months ago

ronnodas commented 11 months ago

It's often useful to keep track of the intermediate results in a fold, as in Haskell's scanl, for example if you want to turn a iterator of numbers into an iterator of running totals. At first this looks like a job for scan, but it was easier to write using successors, as follows:

pub fn fold_map<I, B, F>(iter: I, init: B, mut f: F) -> impl Iterator<Item = B>
where
    I: IntoIterator,
    F: FnMut(&B, I::Item) -> B,
{
    let mut iter = iter.into_iter();
    std::iter::successors(Some(init), move |prev| iter.next().map(|x| f(prev, x)))
}

Then fold_map(1..4, 0, |x, y| x + y).collect_vec() produces [0, 1, 3, 6], see playground link.

Could a version of this be added to itertools?

Philippe-Cholet commented 11 months ago

Running totals and 0, |x,y| x+y surely makes me think of python's itertools.accumulate which was a bit discussed in #705 and #147. Which is there seen with scan: (0..4).scan(0, |x, y| { *x += y; Some(*x) }).collect_vec() to have [0,1,3,6] as well. But I like your version.

That being said, I don't know if we would add a version of this:

If we were to try to implement this, I would suggest to benchmark it to see if it's faster or slower than current possibilities.

ronnodas commented 11 months ago

Running totals and 0, |x,y| x+y surely makes me think of python's itertools.accumulate which was a bit discussed in #705 and #147. Which is there seen with scan: (0..4).scan(0, |x, y| { *x += y; Some(*x) }).collect_vec() to have [0,1,3,6] as well. But I like your version.

Note that the scan version is more the analog of reduce, so will not "yield" the initial value: (1..4).scan(0, |x, y| { *x += y; Some(*x) }).collect_vec() == [1, 3, 6]. This may or may not be what you want in a given application but it is easier to skip(1) than once(init).chain(...). Especially if its type is not Copy but Some(*x) does not work in that case to begin with.

The function I wrote will produce an iterator of length n + 1 if iter is of length n, which is part of why it is hard to write using scan. But the ownership issue from above is harder: scan does not allow keeping a reference to the "yielded" value (assuming the relevant types are not Copy) so the only way I found needed to run the loop "one step ahead" and use mem::replace. Even then you can't get the final value (ie the result of a usual fold) out without cloning it (or replacing with a dummy/default value) because Scan.state is private.

This is part of why I think it's a good candidate for addition to itertools: one "obvious" way you might want to write it seems tricky at best to get right.

An example where the types are not Copy: 2D prefix sums of a 2D grid

For me, the name fold_map is not intuitive enough yet, maybe another?

Absolutely not attached to the name. accumulate or fold_all or fold_running seem like other reasonable choices, but not attached to any of them either. scanl is too close to scan so not good.

Probably just a method in Itertools trait and not a free function.

I would expect to call it as iter.fold_map(...) instead of fold_map(iter, ...) if it was in itertools. Of course then iter could just be Iterator, not IntoIterator.

Philippe-Cholet commented 11 months ago

The name "accumulate" seems a reasonable name to me.

I agree that we should be able to "accumulate" without Copy, making your "successors"-way more general than the "scan"-way but definitely not obvious and therefore legitimate for itertools.

If it produces n + 1 elements then we should not implement ExactSizeIterator for the resulting iterator because it's just a bit longer. I went to look Python version: it produces n + initial.is_some() elements. ...We could do something like:

EDIT: Working on a draft.

EDIT 2: https://github.com/Philippe-Cholet/itertools/tree/accumulate-draft I'll surely come back to it at some point.