Closed phimuemue closed 11 months ago
I agree it's quite complex (but the states are clear thanks to the type system). It's probably because of this complexity that the size hint was incomplete until I fixed it.
I have a similar gut feeling, it's probably simplifiable into a single case since UnknownLen
states are only transition until we have CompleteState::Ongoing
(and Empty
might be easily ditched?).
The major difference though is the additional field cycles
, which is currently defined only once we know n
for sure.
Instead of first: bool
, maybe a more complex type to represent the transition (until we know n
). Well, I'm just thinking..
Do you want me to investigate changes further? EDIT: Well, I'm investigating.
I just wanted to take time to appreciate what's currently done. Each Iterator::next
step is advance+item
in the below state diagram (I wanted to do some nice mermaid diagram for quite some time now):
stateDiagram-v2
classDef rust color:orange
state k_eq_0 <<choice>>
k_eq_0: k == 0 ?
state enough <<choice>>
enough: Enough values ?
state full <<choice>>
full: pool.get_next() ?
state end <<choice>>
end: The end ?
[*]
[*] --> k_eq_0
k_eq_0 --> enough: No
k_eq_0 --> CompleteStart: Yes
enough --> [*]: No
enough --> StartUnknownLen: Yes
CompleteStart --> [*]: NO item
StartUnknownLen --> OngoingUnknownLen: advance
full --> OngoingUnknownLen: No
full --> CompleteStart: Yes, then advance X times...
OngoingUnknownLen --> full: advance
OngoingUnknownLen --> OngoingUnknownLen: item
CompleteStart --> CompleteOngoing: advance
CompleteOngoing --> CompleteOngoing: item
CompleteOngoing --> end: advance
end --> CompleteOngoing: No, mostly
end --> CompleteStart: Yes, a single time
class CompleteStart, StartUnknownLen, OngoingUnknownLen, CompleteOngoing rust
Once loaded, we are in CompleteOngoing
state until the end.
Now in https://github.com/rust-itertools/itertools/compare/master...Philippe-Cholet:itertools:permutations-states , I have
pub struct Permutations<I: Iterator> {
vals: LazyBuffer<I>,
state: PermutationState,
}
enum PermutationState {
Start { k: usize },
Ongoing { k: usize, min_n: usize },
Complete { indices: Vec<usize>, cycles: Vec<usize> }, // CompleteState have been merged
End,
}
Maybe we can simplify this state further. Option<PermutationState>
to get rid of End
? Maybe merge Ongoing
with another variant (but it would probably be less readable)?
The main commit has 80 additions and 143 deletions: nice simplification IMO.
https://github.com/rust-itertools/itertools/compare/master...Philippe-Cholet:itertools:permutations-states
I flattened PermutationState
variant by variant until it became a struct, which I merged into Permutations
leading to
pub struct Permutations<I: Iterator> {
indices: Option<Vec<usize>>, // is None when there is no permutation left
first: bool,
cycles: Option<Vec<usize>>, // is None while n is still unknown
vals: LazyBuffer<I>,
}
indices
starts with length k
; and once n
is known, cycles
is of length k
while indices
become of length n
.
I'm not sure it's better than a more verbose structure, but it's definitely possible.
EDIT: Previous https://github.com/rust-itertools/itertools/commit/c17761ad83a9a7f6e68a2c67a8b0c05862b0c62a and https://github.com/rust-itertools/itertools/commit/486b3483f36345b14aaeafad7ba6d964dba14216 about advance
function might be cherry-picked OUT because it is a bit slower (+10% on some huge test, but only in debug mode?!) with those changes. Apart from that, my changes do not speed things up or down.
@phimuemue Since you are around, I put a little remainder here.
Hi @Philippe-Cholet, I appreciate that you want to tackle the problem. I tried to review your commits, and I think most of it is really good.
Unfortunately I cannot point my finger at the exact places, but I think I've lost track of the possible states. It surely does not help that I'm not intricately familiar with the cycles
logic in there and that I do not have enough time to do a proper review of this.
As much as I'd love to answer with "yeah, I saw and understood this", I can't offer a more thorough review at the moment.
@phimuemue It's a huge rewrite with quite a few steps (as atomic as reasonable but each change is not straightforward).
Destroy the enum PermutationState
variant by variant is possible but it leads to having impossible but representable states (basically ignored with match patterns order and ..
ignoring stuff) and therefore lead to most of your (legitimate) questions. And after a big month, I have to understand some parts of the code again.
Having first: bool
was easy enough to keep in mind but indices
and cycles
possibly being none is complex to easily remember (and indices
length is k or n depending on having the lazy buffer is loading or filled).
I really prefer enum PermutationState
after the second commit: the Start { k }, Ongoing { k, min_n }, Complete { indices, cycles }, End
variants are way more explicit.
Below I quote one of my commit messages about None
uses and comment it as I think it was better represented with the enum:
indices being None means there is no permutation left. Even if first is true. // It was previously better represented by the
End
variant. cycles being None means the buffer is not full yet and n is still unknown. // It was previously better represented by theOngoing {k,min_n}
variant.
I think we should only keep the first 3 commits and simply discard the rest. The first and third are basic. The second that merges CompleteState
into PermutationsState
and hugely simplify next
after #748 as it merges the two steps there was before and discard panic!("unexpected iterator state")
.
Maybe I can simplify/split this second commit though.
Hi @Philippe-Cholet, sounds very reasonable.
During my review of #739, I found
Permutations
quite complex.As of now, it has four different states:
StartUnknownLen
,OngoingUnknownLen
,Complete
andEmpty
. Its cousinCombinations
, on the other hand, only has fieldsfirst
,pool
,indices
.My gut feeling is that
Permutations
could be massaged into a simpler form that more resembles the one ofCombinations
.@Philippe-Cholet Since you studied that code, can I have your two cents on this?