rust-itertools / itertools

Extra iterator adaptors, iterator methods, free functions, and macros.
https://docs.rs/itertools/
Apache License 2.0
2.77k stars 309 forks source link

Add a `MapWindows` Iterator adapter #670

Open Crazytieguy opened 1 year ago

Crazytieguy commented 1 year ago

For problems that require working with windows of expensive to clone items, or large windows, tuple_windows isn't a good solution. Implementing a borrowed windows iterator is tricky if not impossible, but if the user provides a mapping (or folding, etc...) function that returns an owned value it becomes trivial. I decided to focus on MapWindows for this first issue, but other interesting adapters could be MapArrayWindows (with const generic), MapWindowsMut, FoldWindows and combinations of these concepts.

Here's an example Implementation. Since this could be used for large windows or windows of large items, I assumed shifting the items is also likely to be expensive. So I only drop old items every size iterations as a compromise between memory usage and speed. I'm not sure this is the best decision and am open to discussion :)

pub struct MapWindows<I: Iterator, F, T>
where
    F: FnMut(&[I::Item]) -> T,
{
    iter: I,
    f: F,
    size: usize,
    buf: Vec<I::Item>,
}

impl<I: Iterator, F, T> MapWindows<I, F, T>
where
    F: FnMut(&[I::Item]) -> T,
{
    pub fn new(mut iter: I, size: usize, f: F) -> Self {
        let buf = iter.by_ref().take(size - 1).collect();
        Self { iter, f, size, buf }
    }
}

impl<I: Iterator, F, T> Iterator for MapWindows<I, F, T>
where
    F: FnMut(&[I::Item]) -> T,
{
    type Item = T;

    fn next(&mut self) -> Option<Self::Item> {
        self.iter.next().map(|next| {
            if self.buf.len() == self.size * 2 - 1 {
                self.buf.drain(..self.size);
            }
            self.buf.push(next);
            (self.f)(&self.buf[self.buf.len() - self.size..])
        })
    }
}
phimuemue commented 1 year ago

Hi there, I think I somewhat understand your idea.

If I understand correctly, this basically collects into a bounded, contiguous memory area one by one and exploits that a function will be applied to the slice. Please correct me if I'm wrong.

Wouldn't the canonical thing for this use-case be a VecDeque?

Crazytieguy commented 1 year ago

I think you understood correctly :)

A VecDeque is definitely more canonical, but the issue with it is that it's not contiguous. I had some ideas on how to utilize a VecDeque, but I didn't like any of them. What do you think?

phimuemue commented 1 year ago

I think the ideal solution would be to not require the function f at all. (But I think it's not that easy resp. impossible if we do not have lending iterators.)

As to your question: My gut feeling would be to go with option 2 (give closure a reference to VecDeque), but I do not want to commit myself without having thought about it (or even having collected use cases).

Crazytieguy commented 1 year ago

Makes sense. How do you usually collect use cases? This was my use case - an over optimized solution to Advent of code day 23 😅 https://github.com/Crazytieguy/advent-of-code/blob/master/2022/src/bin/day23/main.rs

Compared to using tuple windows this runs in about half the time

Crazytieguy commented 1 year ago

A couple more thoughts:

phimuemue commented 1 year ago

How do you usually collect use cases?

I have a list of things that I needed in my private projects, and when things are discussed, I may bring items from that list to the discussion. So essentially as you did with your use case.

Do you know of any progress towards lending iterators? Will they allow something like windows_mut?

I do not closely track the progress on this. I found https://rust-lang.github.io/generic-associated-types-initiative/design_patterns/iterable.html, and via GADTs you can implement lending iterators on your own (see https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=bedd591aa7940e141a721eee4f572b80). However, this introduces another kind of Iterator that loses interoparability with Itertools and other existing functions.

I am not sure if there are plans change the Iterator trait to support lending in one of the next editions.

scottmcm commented 1 year ago
  • Will they allow something like windows_mut?

Were you in a case where Cell would work? Because you can sometimes use https://doc.rust-lang.org/std/cell/struct.Cell.html#method.as_slice_of_cells along with windows to get something similar to windows_mut.

Crazytieguy commented 1 year ago

I think the playground example is cool - and I think maybe the best pattern would be to have a way to turn an Iterator2 back into an iterator, using a method like map the makes it no longer lending. So some new methods for iterators will return lending iterators and vice versa. The lending_iterator crate does basically that, but it has quite a clumsy api. I think I might try to write a similar crate myself 😁

Crazytieguy commented 1 year ago

Well this probably no longer concerns itertools, but if you want I'd appreciate feedback on this, as a proof of concept. I'm trying to contact the creator of lending_iterator to see if they'd want to modify their crate to something like this