rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
97.88k stars 12.67k forks source link

Inclusive version of take_while #62208

Open jonhoo opened 5 years ago

jonhoo commented 5 years ago

Iterator::take_while stops before the first element where the condition evaluates to false. That is the correct behavior given its name, but it makes it difficult to write an iterator that should include the boundary element. In particular consider something like iterating over a histogram, trimming off the tails:

for v in histogram
    .iter_linear(1_000)
    .skip_while(|v| v.quantile() < 0.01)
    .take_while(|v| v.quantile() < 0.99)

This may seem right, but consider an iterator where the elements are such that v.quantile() yields this sequence: [0, 0.02, 0.99]. Here, the bin that spans <0.02-0.99] will be dropped, even though it includes the majority of the samples. What we really want to do is take from the iterator until we have received an element whose v.quantile() > 0.99. To write that with take_while, we need something like:

let mut got_true = true;
for v in h
    .iter_linear(1_000)
    .skip_while(|v| v.quantile() < 0.01)
    .take_while(move |v| {
        if got_true {
            // we've already yielded when condition was true
            return false;
        }
        if v.quantile() > 0.99 {
            // this must be the first time condition returns true
            // we should yield i, and then no more
            got_true = true;
        }
        // we should keep yielding
        true
    })

Which isn't exactly obvious.

So, I propose we add Iterator::take_until_inclusive, which yields elements up to and including the first element for which its argument returns true. The name isn't great, but I'm also struggling to come up with a better one. In some sense, I want the function to communicate that it yields every item it evaluates. Some other possible names if we're willing to invert the boolean: break_once, break_after, end_at, end_once, end_after.

Thoughts?

Lonami commented 5 years ago

How about a take_until?

for v in h
    .iter_linear(1_000)
    .skip_while(|v| v.quantile() < 0.01)
    .take_until(|v| v.quantile() >= 0.99)
jonhoo commented 5 years ago

@Lonami take_until to me implies that it does not take the first element where the condition is true, which is why I went with take_until_inclusive instead. In some sense, at least to my ears, take_until(f) is the same as take_while(!f).

Lonami commented 5 years ago

Oh, I completely missed you mentioned take_until_inclusive already, sorry!

jonhoo commented 5 years ago

No worries at all, happens to us all! I'd be curious to hear your thoughts on whether the _inclusive bit is necessary though :)

yaahc commented 5 years ago

take_to maybe?

clarfonthey commented 5 years ago

How about a generic take_while_then? The signature would be:

fn take_while_then<F, T, I>(self, while: F, then: T) -> TakeWhileThen<Self, F, T, I>
while
    for<'a> F: FnOnce(&'a Self::Item) -> bool,
    T: FnOnce(Self) -> I,
    I: IntoIterator;

The resulting adapter would:

  1. Return the elements of self while while returns true.
  2. Then, return the elements of then(self).

The existing take_while would be equivalent to self.take_while(while, |_| empty()), and take_while_inclusive would be equivalent to self.take_while(while, |x| x.take(1)), although presumably more efficient.

Although I'm not sure how useful this kind of method would be, and it may be better to just have some form of take_while_inclusive by itself, as this is probably a common pattern.

hdevalke commented 5 years ago

I also needed this recently to decode some protocol where a stop bit is used to indicate the last byte of a field. The other 7 bits are used to encode the data. I could not use the iterator API because of this.

The implementation should be less or more the same as TakeWhile.

next could be implemented as:

    #[inline]
    fn next(&mut self) -> Option<I::Item> {
        if self.flag {
            None
        } else {
            let x = self.iter.next()?;
            if (self.predicate)(&x) {
                self.flag = true;
            }
            Some(x)
        }
    }

This would, however, take all elements until some condition is true, which is not the same as take_while_inclusive.

The BufRead::read_until exists and reads up to inclusive the delimiter. So take_until seems to be a good name.

hdevalke commented 5 years ago

I created a crate with this functionality: https://github.com/hdevalke/take-until

michaelsproul commented 4 years ago

Could be a good fit for the itertools crate (@hdevalke you could PR your code upstream)

hdevalke commented 4 years ago

@michaelsproul itertools already contains take_while_ref.

michaelsproul commented 4 years ago

I don't think take_while_ref or peeking_take_while are well suited to this because they require you to assign the iterator to a local variable, and muck around with mutation, which breaks the flow of the iterator chain. The best I could do with them was this, which isn't as nice as take_until.

If you don't mind, I'd be happy to try and get your code into itertools, giving you credit of course! (Co-authored-by on the commit)

hdevalke commented 4 years ago

@michaelsproul no problem if you want to do it, most of the code I wrote was inspired by the implementation of TakeWhile.

JohnScience commented 2 years ago

@hdevalke @michaelsproul Did it get into itertools? I didn't find it there at least by this name.

michaelsproul commented 2 years ago

I'm sorry, I never got around to adding it!

JohnScience commented 2 years ago

@michaelsproul That's regrettable. I opened an issue (https://github.com/rust-itertools/itertools/issues/597). I got around in my project using the take_until crate but I might come back only much much later to check the implementation and make a PR.

quixoticaxis commented 2 years ago

The implementation in the first post also seems to suffer from the need to take one more element from the underlying iterator when condition has been already met, which is probably fine for iterators working on data, but may become an issue for iterators that hide long operations. It is not obvious for me from the discussion, what is the current stance on adding take_until to std?

On a side note, is there any idiomatic way to construct impl Iterator<Item = Result<T, E>> that stops after the first error and fetches no more elements based on the infinite iterator impl Iterator<Item = Result<T, E>>? Apart from using a library or rolling your own implementation, of course.

UnlimitedCookies commented 1 year ago

Would it make sense to extend the PR to allow taking n additional elements after the condition is untrue? Perhaps proposing an even more general api design, where you can get a slice of an Iterator, which you could extend by some number? I also like the approach of take_while_then(…).

let mut got_true = true;
for v in h
    .iter_linear(1_000)
    .skip_while(|v| v.quantile() < 0.01)
    .take_while(move |v| {
        if got_true {
            // we've already yielded when condition was true
            return false;
        }
        if v.quantile() > 0.99 {
            // this must be the first time condition returns true
            // we should yield i, and then no more
            got_true = true;
        }
        // we should keep yielding
        true
    })

Also, shouldn't the example initialize got_true with false? And I guess you can also drop the move from the closure.