rust-itertools / itertools

Extra iterator adaptors, iterator methods, free functions, and macros.
https://docs.rs/itertools/
Apache License 2.0
2.71k stars 309 forks source link

Fix `Itertools::k_smallest` on short unfused iterators #900

Closed Philippe-Cholet closed 6 months ago

Philippe-Cholet commented 7 months ago

I'm shocked! 😲 I found a bug in my favorite method!

In the case of an unfused iterator yielding at most k-1 elements, then None, then other elements, the heap is not k-long and for_each is called on the other elements leading debug_assert_eq to rightfully panic.

Our recent variants of k_smallest do not have this bug (the .len() == check is present)!

Alternatively, iterators could be fused: let mut iter = iter.fuse();.

I probably should add a test about this.

Story: After recently reviewing #654 where I deeply looked at k_smallest and its variants and my really recent #899 where I similarly thought of checking the length of the collected elements before calling for_each, I eventually/luckily saw this bug.

codecov[bot] commented 7 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 94.56%. Comparing base (6814180) to head (0936c8d). Report is 31 commits behind head on master.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #900 +/- ## ========================================== + Coverage 94.38% 94.56% +0.17% ========================================== Files 48 48 Lines 6665 6753 +88 ========================================== + Hits 6291 6386 +95 + Misses 374 367 -7 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Philippe-Cholet commented 7 months ago

I'm exploring the possibility of buggy methods in the case of unfused iterators.

EDIT: Well, after a short investigation, I found that only the Err case of exactly_one is subject to weird behavior for an unfused iterator, PR to come.

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=b461adac3020a6f3e38087a56dd5cbd8 It contains an idea to test that our adaptors behave well (see #55).

phimuemue commented 6 months ago

I think we should fuse them: collect_vec+sort+truncate should be the same as k_smallest (see https://play.rust-lang.org/?version=stable&mode=release&edition=2021&gist=8e6e791d95adc335900ea6b8a2bfd95d).

Philippe-Cholet commented 6 months ago

Explicitely fuse the iterator is also clearer than checking .len() == k.

Done. And I did the same for the recent k_smallest_general.