rust-itertools / itertools

Extra iterator adaptors, iterator methods, free functions, and macros.
https://docs.rs/itertools/
Apache License 2.0
2.62k stars 298 forks source link

Implement `FlattenOk::{fold, rfold}` #927

Closed kinto-b closed 2 months ago

kinto-b commented 2 months ago

Relates to #755

$ cargo bench --bench specializations flatten_ok/fold -- --baseline flatten_ok

flatten_ok/fold         time:   [3.0078 µs 3.2944 µs 3.5502 µs]
                        change: [-47.164% -42.833% -38.023%] (p = 0.00 < 0.05)
                        Performance has improved.

Edit: .rfold is an almost identical implementation, so I'll put it through in a moment. Just running benchmarks...

Edit:

$ cargo bench --bench specializations flatten_ok/rfold -- --baseline flatten_ok

flatten_ok/rfold        time:   [2.7442 µs 2.8254 µs 2.9134 µs]
                        change: [-42.924% -39.835% -36.485%] (p = 0.00 < 0.05)
                        Performance has improved.
codecov[bot] commented 2 months ago

Codecov Report

Attention: Patch coverage is 97.22222% with 1 lines in your changes are missing coverage. Please review.

Project coverage is 94.55%. Comparing base (6814180) to head (568021d). Report is 64 commits behind head on master.

Files Patch % Lines
src/flatten_ok.rs 97.22% 1 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #927 +/- ## ========================================== + Coverage 94.38% 94.55% +0.16% ========================================== Files 48 48 Lines 6665 6926 +261 ========================================== + Hits 6291 6549 +258 - Misses 374 377 +3 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

kinto-b commented 2 months ago

@Philippe-Cholet I was just looking back over the code wondering that myself! Investigating now...

Edit: Have to step out for dinner. Will look into this some more later on. I feel as though it shouldn't work. The necessary fix seems pretty simple, but I want to make the tests fail with the current code before I change anything

kinto-b commented 2 months ago

Ah so the tests fail to capture what's wrong with this code because they use Option<u8> values for each Ok item in the iterator. We need non-singleton values.

There is a comment there about it being too slow to use Vec<u8>:

https://github.com/rust-itertools/itertools/blob/dd6a56932141c8b18b9d479eba30134c5afbfb88/tests/specializations.rs#L455-L456

I've tested changing over to Vec<u8> and, though it catches the problem, it takes several minutes for the test to finish once the bug is ironed out.

What's the best approach here do you think?

Philippe-Cholet commented 2 months ago

Ah I think I wrote this comment. Use Vec<u8> here is really slow, it's not really an option. So if we can't give [u8; 3] instead of Option<u8> then maybe we can write our own limited vector (what a pain) and implement the necessary traits for this to work. I'm gonna investigate this.

Philippe-Cholet commented 2 months ago

So we basically need a light small vector, here is one with max 2 elements. The flatten_ok test pass under 1 second for me.

use quickcheck::Arbitrary;
use rand::Rng;
...
    // remove comment
    fn flatten_ok(v: Vec<Result<SmallIter2<u8>, char>>) -> () {
...

/// Like `VecIntoIter<T>` with maximum 2 elements.
#[derive(Debug, Clone, Default)]
enum SmallIter2<T> {
    #[default]
    Zero,
    One(T),
    Two(T, T),
}

impl<T: Arbitrary> Arbitrary for SmallIter2<T> {
    fn arbitrary<G: quickcheck::Gen>(g: &mut G) -> Self {
        match g.gen_range(0u8, 3) {
            0 => Self::Zero,
            1 => Self::One(T::arbitrary(g)),
            2 => Self::Two(T::arbitrary(g), T::arbitrary(g)),
            _ => unreachable!(),
        }
    }
    // maybe implement shrink too, maybe not
}

impl<T> Iterator for SmallIter2<T> {
    type Item = T;

    fn next(&mut self) -> Option<Self::Item> {
        match std::mem::take(self) {
            Self::Zero => None,
            Self::One(val) => Some(val),
            Self::Two(val, second) => {
                *self = Self::One(second);
                Some(val)
            }
        }
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        let len = match self {
            Self::Zero => 0,
            Self::One(_) => 1,
            Self::Two(_, _) => 2,
        };
        (len, Some(len))
    }
}

impl<T> DoubleEndedIterator for SmallIter2<T> {
    fn next_back(&mut self) -> Option<Self::Item> {
        match std::mem::take(self) {
            Self::Zero => None,
            Self::One(val) => Some(val),
            Self::Two(first, val) => {
                *self = Self::One(first);
                Some(val)
            }
        }
    }
}
scottmcm commented 2 months ago

nit: that looks wrong since it never shrinks from One to Zero. EDIT: No, I'm blind and missed the take, nevermind.

(ArrayVec<T, 2> is the classic solution for an iterator like this, but I would also understand not wanting to add a dev-dep just for this.)

Philippe-Cholet commented 2 months ago

I thought of ArrayVec<T, N> but I would have to wrap it anyway to implement quickcheck::Arbitrary (and therefore Iterators too). I would have added it if we wanted more control than a fixed max-length of 2.

kinto-b commented 2 months ago

@Philippe-Cholet Ah neat, thanks for that! Shall I pop that into its own module within tests/ or just stick it in tests/specializations.rs?

Philippe-Cholet commented 2 months ago

@kinto-b Just add this in "tests/specializations.rs" where it's needed. We can move it later if we need to.

Sidenote: For CI to pass again (because of recent Clippy 1.78), you will need to rebase your branch on master.

kinto-b commented 2 months ago

Thanks!