rust-itertools / itertools

Extra iterator adaptors, iterator methods, free functions, and macros.
https://docs.rs/itertools/
Apache License 2.0
2.72k stars 309 forks source link

More `size_hint` and `count` methods #725

Closed Philippe-Cholet closed 1 year ago

Philippe-Cholet commented 1 year ago

As I intend to improve the permutations algorithm (related to #722), I first wanted to simplify a few things:

Then I encountered a TODO that I fixed: It makes sense to have a LazyBuffer::size_hint method which greatly helps in fixing Permutations::size_hint and therefore correctly allocate the memory for codes like (0..n).permutations(k).map(...).collect_vec(). I think we need to be able to apply a function to both bounds of a SizeHint. Here, it avoids repetition and therefore readability. Then permutations' size hints are tested.

Philippe-Cholet commented 1 year ago

Size hints for combinations_with_replacement. EDIT: Same for combinations.

Philippe-Cholet commented 1 year ago

And count methods for combinations[_with_replacement] and powerset.

phimuemue commented 1 year ago

Thanks @Philippe-Cholet, that's quite a bit.

I started reviewing the first commit, and if we include this, could we first focus on LazyBuffer: It has a field done which looks a lot like it just fuses the iterator. If that's the case, we should get rid of it and replace the iterator by Fuse<I>. I do not want to entrench ourselves in using and maintaining the flag, and I see that your size_hint would rely on it.

Please let me know if I'm wrong here.

Philippe-Cholet commented 1 year ago

You're right and it's done (in a new commit), the iterator is fused. I only used it in my size_hint as it seemed the thing to do, but if it's fused, self.it.size_hint() will be (0, Some(0)) as it should. Small change in Combinations for powerset but no issue for it to be fused.

EDIT: Yes, that's quite a bit, I started small and each time I saw another related thing to do. CompleteStateRemaining replaced by Option<usize> lead to Permutations::size_hint and other "similar" size hints and then realized I accurately could count.

Philippe-Cholet commented 1 year ago

I just noticed that a few .fold(Some(...), function_that_add_or_multiply) could become .map(func).sum/product() (returning Option<usize>) once the MSRV (currently 1.36.0) is 1.37.0 or more.

phimuemue commented 1 year ago

Would you mind doing the lazy buffer in a separate PR? (So we could establish a clean base line.)

Philippe-Cholet commented 1 year ago

All depends on that but ok.

phimuemue commented 1 year ago

I don’t want to force you to do anything, but I see that the PRs are generally good, but broad. Having smaller PRs helps tremendously with reviewing, and I hope that we can merge them faster this way.

Philippe-Cholet commented 1 year ago

I'm willing to learn how to do things in any better way. The first commit here is big because of I previously started from an outdated master branch, realized that later on and because I was unsure what to do with git, I started again and it resulted in a big commit I did not have the heart to split. But I agree that my first commit is broad, I was a bit overzealous. EDIT: New PR #727

phimuemue commented 1 year ago

I'm willing to learn how to do things in any better way. The first commit here is big because of I previously started from an outdated master branch, realized that later on and because I was unsure what to do with git, I started again and it resulted in a big commit I did not have the heart to split. But I agree that my first commit is broad, I was a bit overzealous. EDIT: New PR #727

No worries! I really want to include your ideas, but do not have much time at hand, so I try to streamline the review process (admittedly: streamline it on my side) - reviewing (very) small-bite chunks separately is usually easier than looking reviewing the "sum of the chunks" at once.

Philippe-Cholet commented 1 year ago

Well I'm closing this, it became too broad but it will make 4 or 5 nice smaller PR. Let's consider this as a draft.