rust-itertools / itertools

Extra iterator adaptors, iterator methods, free functions, and macros.
https://docs.rs/itertools/
Apache License 2.0
2.72k stars 309 forks source link

`with_prev` #771

Open mightyiam opened 1 year ago

mightyiam commented 1 year ago

This seems generally useful enough to us.

Does anyone feel that we should attempt to contribute this to Iterator first?

Co-authored-by: Warren Wise warren.a.wise@gmail.com

Philippe-Cholet commented 1 year ago

I know it's different but it's quite similar to it.tuple_windows::<(_, _)>() which can provide more than one previous item.

mightyiam commented 1 year ago

I know it's different but it's quite similar to it.tuple_windows::<(_, _)>() which can provide more than one previous item.

Yes. We did consider that right before deciding to write this one.

Philippe-Cholet commented 1 year ago

I don't see a situation where I would prefer with_prev over tuple_windows. Do you have an example in mind? Following my previous comment, considering how tuple_windows is implemented, it might not be easy to extend with_prev to have multiple previous elements, mainly because the resulting Item is heterogeneous. But I'm thinking a bit about merging this with TupleWindows somehow.

scottmcm commented 1 year ago

+1 to skepticism about the homogeneity of this, and how frequently it'd be specifically useful vs just phrasing it via https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.scan or something.

I guess one option would be to have it be an iterator over

enum Window2<T> {
    First(T),
    Pair(T, T),
    Last(T),
}

or that could be const-generic extended to array::IntoIter<T, N>, I suppose, but that might be harder to use.

Would be good to see more motivation of what would want to use this.

Philippe-Cholet commented 1 year ago

I'm currently going through all issues, this is similar to #319.

mightyiam commented 1 year ago

Alrighty. Our use case is the Exercism Rust track exercise Acronym.

Here's our solution (see src/lib.rs of course).

We felt that tuple_windows doesn't quite fit as nicely as with_prev.

For this exercise, we only ever needed a single previous item. If you guys feel that with_prev is appropriate, we will look into whether it could be made generic on the number of previous items.

mightyiam commented 1 year ago

In the case of that exercise, the primary subject of iteration is the current char. The single (possible) previous char is for the purpose of examination in some following predicate.

Using tuple_windows would have been less appropriate, because in any item ((char, char)), it would not have been clear which is the current char and which is the possible previous char for examination.

For example, with tuple_windows, we'd get for a,b,c,d the items (a,b),(b,c),(c,d). These are three items while we are interested in four. With the theoretical with_prev, we'd get a clear (None,a),(Some(a),b),(Some(b),c),(Some(c),d).

Who calls the shots 'round 'ere?

jswrenn commented 1 year ago

I'm fairly convinced that this useful. I do think we need to generalize it to an arbitrary number of previous elements. I also think we could use const generics here — the item type being ([Option<T>; N], T), where N is the amount of look-back.

Philippe-Cholet commented 1 year ago

Who calls the shots 'round 'ere?

I would say, in that order: bluss, jswrenn, then phimuemue and (since a week) myself. bluss is not around anymore, jswrenn and phimuemue are around but busy (they handled quite some changes recently I think). I'm a bit more around but focused on multiple things.

To answer the rest a bit, I surely see the appeal of your proposition. But I think we should consider generalizing that to more than two items, which might be uneasy.

EDIT: I guess it mostly depends on the item type we want.

phimuemue commented 1 year ago

Hi there, just my two cents: I see that "filling up" the tuples up to a certain length and then keeping the length (as with_prev is currently discussed) may have value, but what about the other direction then?

I.e. I wonder whether we - if we include with_prev - should also offer with_next.

[1,2,3,4,5].with_prev::<[_; 3]>() // would give [1], [1,2], [1,2,3], [2,3,4], [3,4,5]
[1,2,3,4,5].with_next::<[_; 3]>() // would give [1,2,3], [2,3,4], [3,4,5], [4,5], [5]

In my book , both of them are equally useful.

As for [Option<T>; N]: I see this type offers itself, but - we have other use cases that might profit, too - we could think about using ArrayVec.

Philippe-Cholet commented 1 year ago

I guess with_next would be valuable too.

About types, I guess our options are (N meaning might change between types)

mightyiam commented 1 year ago

Sure. Totally. with_prev and with_next. And arbitrary contextual elements.

Which made us think of something like this:

// with 2 previous and 1 next
let it  = [0, 1, 2, 3, 4].with_around::<2, 1>();

itertools::assert_equal(
    it,
    vec![
        (None,    None,    0, Some(1)),
        (None,    Some(0), 1, Some(2)),
        (Some(0), Some(1), 2, Some(3)),
        (Some(1), Some(2), 3, Some(4)),
        (Some(2), Some(3), 4, None   ),
    ]
);

But there's a difference in bounds between previous and next. Previous would require T: Clone, we suppose, and next would require use of Peekable, perhaps? But Peekable is a struct, so perhaps there is no bound difference?


In parallel, we will contemplate the separate with_prev/with_next:

Fixed length of 1. No deal.

After a Some the following options would be Some. Imperfect representation.

Similar to the previous, except for an Option<T> for the item that is guaranteed to be Some(item). Let's bury this one.

Since we are talking about a compile-time N, this option misses the opportunity to reflect back that N at compile-time.

We guess this is the reasoning behind ArrayVec.

We do see an inconvenience here. There's no static position for the item. Perhaps the following option would be more appropriate:

We don't know the policy around dependencies. There seems to currently be a single dependency, either.

This option does enjoy the compile-time N reflecting back at the user to some degree. We're not sure how helpful this is.

(enum depending on N, surely <= 12) without const generic, similar to current tuple windows with a macro. I thought of it but it would not be my choice.


We suppose the most important criteria is pattern matching. And we haven't shown such examples, yet.

Any feedback at this point?

Philippe-Cholet commented 1 year ago

"either" is a dependency that does not evolve anymore, and is maintained by bluss (itertools owner). It won't make breaking changes. Basically, we want dependencies to not cause us trouble. And in practice, we can do a lot without them, don't you think? There is a pull request about lending iterators where there was discussion about adding a dependency where I learnt a few things if you want to know more.

The most important point here is that the generalization would be either using tuples with some macros (to handle multiple tuple lengths) OR more likely need const generics (like using 2,1 in .with_around::<2, 1>()) which requires MSRV >= 1.51 (it was increased to 1.41 recently). We have some itertools methods that return tuples (tuple_windows and tuple_combinations for example, which rely on some macro to handle tuples of lengths 1 to 12) while it could return fixed length arrays if we could use const generics. We will use "Const generics" at some point, no question about it, I think in the near future but we are not there yet. Once we do "const generics" (there is a label for it), we will have a lot to do I think (probably more than what's labelled right now).

Personally, I'd like to first focus on improving existing iterators. I've contributed to finish size hints (helpful to avoid reallocations). I'd like to focus on issue 755 at the moment since specializing fold would bring major time improvements on iterator methods that rely on it by default (mostly for for_each but there is also count, last, partition and reduce).

EDIT: with_around seems also interesting.