rust-itertools / itertools

Extra iterator adaptors, iterator methods, free functions, and macros.
https://docs.rs/itertools/
Apache License 2.0
2.72k stars 309 forks source link

Permutations unnecessarily complex? #747

Closed phimuemue closed 11 months ago

phimuemue commented 1 year ago

During my review of #739, I found Permutations quite complex.

As of now, it has four different states: StartUnknownLen, OngoingUnknownLen, Complete and Empty. Its cousin Combinations, on the other hand, only has fields first, pool, indices.

My gut feeling is that Permutations could be massaged into a simpler form that more resembles the one of Combinations.

@Philippe-Cholet Since you studied that code, can I have your two cents on this?

Philippe-Cholet commented 1 year ago

I agree it's quite complex (but the states are clear thanks to the type system). It's probably because of this complexity that the size hint was incomplete until I fixed it. I have a similar gut feeling, it's probably simplifiable into a single case since UnknownLen states are only transition until we have CompleteState::Ongoing (and Empty might be easily ditched?). The major difference though is the additional field cycles, which is currently defined only once we know n for sure. Instead of first: bool, maybe a more complex type to represent the transition (until we know n). Well, I'm just thinking..

Do you want me to investigate changes further? EDIT: Well, I'm investigating.

Philippe-Cholet commented 1 year ago

I just wanted to take time to appreciate what's currently done. Each Iterator::next step is advance+item in the below state diagram (I wanted to do some nice mermaid diagram for quite some time now):

stateDiagram-v2
    classDef rust color:orange

    state k_eq_0 <<choice>>
    k_eq_0: k == 0 ?
    state enough <<choice>>
    enough: Enough values ?
    state full <<choice>>
    full: pool.get_next() ?
    state end <<choice>>
    end: The end ?

    [*]
    [*] --> k_eq_0
    k_eq_0 --> enough: No
    k_eq_0 --> CompleteStart: Yes
    enough --> [*]: No
    enough --> StartUnknownLen: Yes
    CompleteStart --> [*]: NO item
    StartUnknownLen --> OngoingUnknownLen: advance
    full --> OngoingUnknownLen: No
    full --> CompleteStart: Yes, then advance X times...
    OngoingUnknownLen --> full: advance
    OngoingUnknownLen --> OngoingUnknownLen: item
    CompleteStart --> CompleteOngoing: advance
    CompleteOngoing --> CompleteOngoing: item
    CompleteOngoing --> end: advance
    end --> CompleteOngoing: No, mostly
    end --> CompleteStart: Yes, a single time

    class CompleteStart, StartUnknownLen, OngoingUnknownLen, CompleteOngoing rust

Once loaded, we are in CompleteOngoing state until the end.

Philippe-Cholet commented 1 year ago

Now in https://github.com/rust-itertools/itertools/compare/master...Philippe-Cholet:itertools:permutations-states , I have

pub struct Permutations<I: Iterator> {
    vals: LazyBuffer<I>,
    state: PermutationState,
}
enum PermutationState {
    Start { k: usize },
    Ongoing { k: usize, min_n: usize },
    Complete { indices: Vec<usize>, cycles: Vec<usize> }, // CompleteState have been merged
    End,
}

Maybe we can simplify this state further. Option<PermutationState> to get rid of End? Maybe merge Ongoing with another variant (but it would probably be less readable)?

The main commit has 80 additions and 143 deletions: nice simplification IMO.

Updated mermaid diagram ```mermaid stateDiagram-v2 classDef rust color:orange state enough <> enough: Enough values ? state k0 <> k0: k == 0 ? state full <> full: pool.get_next() ? state end <> end: The end ? [*] --> enough enough --> [*]: No enough --> Start: Yes Start --> k0: next k0 --> [*]: Yes k0 --> Ongoing: No full --> Ongoing: No full --> Complete: Yes, then advance X times... Ongoing --> full: next Complete --> end: next end --> Complete: No end --> [*]: Yes class Start, Ongoing, Complete rust ```
Philippe-Cholet commented 1 year ago

https://github.com/rust-itertools/itertools/compare/master...Philippe-Cholet:itertools:permutations-states I flattened PermutationState variant by variant until it became a struct, which I merged into Permutations leading to

pub struct Permutations<I: Iterator> {
    indices: Option<Vec<usize>>, // is None when there is no permutation left
    first: bool,
    cycles: Option<Vec<usize>>, // is None while n is still unknown
    vals: LazyBuffer<I>,
}

indices starts with length k ; and once n is known, cycles is of length k while indices become of length n.

I'm not sure it's better than a more verbose structure, but it's definitely possible.

EDIT: Previous https://github.com/rust-itertools/itertools/commit/c17761ad83a9a7f6e68a2c67a8b0c05862b0c62a and https://github.com/rust-itertools/itertools/commit/486b3483f36345b14aaeafad7ba6d964dba14216 about advance function might be cherry-picked OUT because it is a bit slower (+10% on some huge test, but only in debug mode?!) with those changes. Apart from that, my changes do not speed things up or down.

Philippe-Cholet commented 11 months ago

@phimuemue Since you are around, I put a little remainder here.

phimuemue commented 11 months ago

Hi @Philippe-Cholet, I appreciate that you want to tackle the problem. I tried to review your commits, and I think most of it is really good.

Unfortunately I cannot point my finger at the exact places, but I think I've lost track of the possible states. It surely does not help that I'm not intricately familiar with the cycles logic in there and that I do not have enough time to do a proper review of this.

As much as I'd love to answer with "yeah, I saw and understood this", I can't offer a more thorough review at the moment.

Philippe-Cholet commented 11 months ago

@phimuemue It's a huge rewrite with quite a few steps (as atomic as reasonable but each change is not straightforward).

Destroy the enum PermutationState variant by variant is possible but it leads to having impossible but representable states (basically ignored with match patterns order and .. ignoring stuff) and therefore lead to most of your (legitimate) questions. And after a big month, I have to understand some parts of the code again. Having first: bool was easy enough to keep in mind but indices and cycles possibly being none is complex to easily remember (and indices length is k or n depending on having the lazy buffer is loading or filled).

I really prefer enum PermutationState after the second commit: the Start { k }, Ongoing { k, min_n }, Complete { indices, cycles }, End variants are way more explicit. Below I quote one of my commit messages about None uses and comment it as I think it was better represented with the enum:

indices being None means there is no permutation left. Even if first is true. // It was previously better represented by the End variant. cycles being None means the buffer is not full yet and n is still unknown. // It was previously better represented by the Ongoing {k,min_n} variant.


I think we should only keep the first 3 commits and simply discard the rest. The first and third are basic. The second that merges CompleteState into PermutationsState and hugely simplify next after #748 as it merges the two steps there was before and discard panic!("unexpected iterator state"). Maybe I can simplify/split this second commit though.

phimuemue commented 11 months ago

Hi @Philippe-Cholet, sounds very reasonable.