Closed phimuemue closed 1 year ago
I experimented a bit and what you want is for MergeJoinBy<I, J, F>
to have:
impl Iterator<Item = EitherOrBoth<I::Item, J::Item>>
when the cmp_fn
is returning Ordering
;impl Iterator<Item = Either<I::Item, J::Item>>
when the cmp_fn
is returning bool
.It's conflicting. But then if we add a type T
to "MergeJoinBy" (I think it's a small breaking change):
pub struct MergeJoinBy<I: Iterator, J: Iterator, F, T>
where F: FnMut(&I::Item, &J::Item) -> T
{ ... }
then we can have
impl<I, J, F> Iterator for MergeJoinBy<I, J, F, Ordering> // EitherOrBoth
impl<I, J, F> Iterator for MergeJoinBy<I, J, F, bool> // Either
Does it seems good to you? (changes in "src/merge_join.rs" and very little in "src/lib.rs"). EDIT: I can avoid duplication with a macro. EDIT 2: A bit unsure about what variant it should return in our new case: cmp_fn returning | output variant |
---|---|
Ordering::Less | EitherOrBoth::Left |
Ordering::Greater | EitherOrBoth::Right |
Ordering::Equal | EitherOrBoth::Both |
false | Either::Right ? |
true | Either::Left ? |
Hi there. I'm not sure if I follow 100% wrt "conflicting". I think MergeJoinBy
should not require the function's return type explicitly, because it can be inferred from F
.
As such, I'd imagine this:
impl<I, J, F, ResultOfF> Iterator for MergeJoinBy
where
F: FnMut(&I::Item, &J::Item) -> ResultOfF,
ResultOfF: OrderingOrBool<I::Item, J::Item>
{
type Item = ResultOfF::Item;
}
trait OrderingOrBool<IItem, JItem> {
type Item
}
impl<IItem, JItem> OrderingOrBool<IItem, JItem> for std::cmp::Ordering {
type Item = EitherOrBoth<IItem, JItem>;
}
// similar for bool with Item = Either
I am not sure if this design hits an unsovable obstacle somewhere, but for me it looks a bit clearer than basically copy-pasting/macro-generating two impls for bool
and Ordering
separately. Did you research this possibility and run into problems?
I agree about not conflicting for the reason you mention, I was just being careful. TBH, I did not think of this trait-way, I like it. After giving it a try, it works with a few additions.
pub trait OrderingOrBool<I, J> {
type Item;
// All with `#[inline(always)]` in both impl
fn into_cmp(self) -> Ordering;
fn left(left: I) -> Self::Item;
fn right(right: J) -> Self::Item;
fn both(left: I, right: J) -> Self::Item;
}
To match against the result of the user function, I needed into_cmp
(for bool, if self { Ordering::Less } else { Ordering::Greater }
).
I added "left/right/both" methods to be able to create variants ("both" for bool only contains unreachable!("into_cmp never returns Ordering::Equal so this should never be called")
).
Wow, that was fast. - Nice that it seems to work.
As for the additional methods: Of course, I didn't see the whole implementation, but it sounds reasonable.
Here is a commit for it. Seems reasonable to me too.
I'm currently blocked about merging implementations of MergeJoinBy
and MergeBy
. Maybe you have an idea on what to do. Or I will try again.
First, I moved MergeBy
into its own module "merge" as you suggested.
To merge implementations of "MergeBy" and "MergeJoinBy", I mostly struggled with the type "Merge" (how to discard "MergePredicate" and "MergeLte" for I::Item::le
).
Here, type Merge<I, J> = MergeBy<I, J, MergeLte>;
becomes type Merge<I, J> = MergeBy<I, J, fn(&<I as Iterator>::Item, &<J as Iterator>::Item) -> bool>;
.
Then trait MergePredicate<T>
and struct MergeLte
can be removed.
Finally, MergeBy<I, J, F>
basically wraps MergeJoinBy<I, J, F, bool>
.
So in impl Iterator
, I use Either::into_inner
to convert Either<T, T>
to T
.
impl FusedIterator
while "MergeBy" did. Not sure what to do here.MergeJoinBy<I, J, F, bool>
and) MergeBy<I, J, F>
lost some size_hint
information because "EitherOrBoth" sometimes use one element of each iterator in one item (Both
variant) while "Either" does not. Unless I find a fix.#[inline]
methods in impl Iterator for MergeBy
?If this is satisfying, should I merge files "merge_join.rs" into the newly created "adaptors/merge.rs" ?
EDIT: In "OrderingOrBool" trait, I added fn size_hint(left: SizeHint, right: SizeHint) -> SizeHint;
to fix the small size_hint
issue mentionned above.
Thank you for staying in touch. I see there are some valid questions, so I suggest to first focus on supporting bool
(since that's probably what's needed anyway if we want to merge MergeBy
and MergeJoinBy
, i.e. let's focus on your first commit (https://github.com/Philippe-Cholet/itertools/commit/5ad6f70e46c0e2404f86c5eaeed800adec27d9cc).
Please see my comments there for specifics, but probably most important: Is the fourth type parameter in MergeJoinBy
really needed? (I would guess "no", because it could be inferred from F
.)
In addition, this commit introduced several trait bounds that look unneeded to me. We should add only the trait bounds required to support bool
and these that keep the code in line with what's already there.
The idea of your transformation (i.e. having left
, right
, both
as customization points) looks correct, but the unreachable
makes me think if there's a way to refactor this to get rid of it. Did you try something in this direction?
As said, let's keep it at supporting bool
for now, but here's some thoughts regarding your questions anyway:
"MergeJoinBy" don't have
impl FusedIterator
while "MergeBy" did. Not sure what to do here.
I think we should fuse the input iterators (i.e. adapt MergeJoinBy
's strategy and document this. (Sidenote: That's exactly the type of problem that would be avoided if the structs had been merged.)
(
MergeJoinBy<I, J, F, bool>
and)MergeBy<I, J, F>
lost somesize_hint
information because "EitherOrBoth" sometimes use one element of each iterator in one item (Both
variant) while "Either" does not. Unless I find a fix.
Good catch. Maybe that precludes merging the structs... We'll have to look into that.
Should I
#[inline]
methods inimpl Iterator for MergeBy
?
I think most of the time we rely on the compiler, but afaik there are no definitive guidelines about [inline]
.
You might have missed (or not) an edit of my comment, a fix about size_hint, seems simple enough to me.
Instead of creating a trait for Ordering/bool, maybe we can create a trait for FnMut(&I, &J) -> T
, I'll come back to you.
I did not find a way to have only three types. I was going with a trait MergePredicate (idea from how MergeBy
currently works)
pub trait MergePredicate<I, J> { // or <I, J, T>
type Item;
// fn merge_pred(&mut self, left: &I, right: &J) -> T; // return self(left, right)
fn merge(&mut self, left: I, right: J) -> (Option<I>, Option<J>, Self::Item);
// (item to put back left, same for right, next_item)
// then I can match against Ordering/bool variants, and don't need one unreachable both method.
fn left(left: I) -> Self::Item;
fn right(right: J) -> Self::Item;
}
With MergePredicate<I, J>
, I get an error about "conflicting implementations".
impl<I, J, F: FnMut(&I, &J) -> Ordering> MergePredicate<I, J> for F { ... }
impl<I, J, F: FnMut(&I, &J) -> bool> MergePredicate<I, J> for F { ... }
I suppose it worries that there is a type out there implementing both FnMut(&I, &J) -> Ordering
and FnMut(&I, &J) -> bool
. Seems impossible but it's an error.
With MergePredicate<I, J, T>
(T
being Ordering
or bool
), this problem vanishes but I get another, a weird one IMO about:
impl<I, J, F, T> Iterator for MergeJoinBy<I, J, F>
where I: Iterator,
J: Iterator,
F: MergePredicate<I::Item, J::Item, T>,
{ ... }
error[E0207]: the type parameter
T
is not constrained by the impl trait, self type, or predicates
The constraint on F
does not constrain T
at all apparently.
Hence I add T
as a fourth type to MergeJoinBy. Only to get that it's not used (add a PhantomData<T>
field leads to a compiler error in tests about "type annotation", even with constraint F: MergePredicate<I, J, T>
) so with F: FnMut(&I::Item, &J::Item) -> T
as a constraint on MergeJoinBy, then it works with MergePredicate<I, J, T>
being a super trait of FnMut(&I, &J) -> T
! Harsh right?!
So I think T
should be inferred from F
as you guessed but not by the current compiler or this trait way.
It solves the issue you have with unreachable. I'll make a commit after dinner for you to see.
Here is a new commit (in a new branch) with trait MergePredicate<I, J, T>: FnMut(&I, &J) -> T
(see previous comment). It better matches Ordering
& bool
variants. And merge_join_by
constraints seems nicer ?
If you don't want the previous OrderingOrBool or this MergePredicate, I'm open to suggestions.
Not a priority at this moment but either way, there is some code duplication about putting back right/left elements.
Following the last commit, add a PhantomData field on MergeJoinBy would allow me to remove F: FnMut(...) -> T
constraint on the struct definition, impl Clone and impl Debug of MergeJoinBy, all three that were added in the last commit, and MergePredicate does not need to be a supertrait.
So the new PhantomData commit but you might prefer to see the diff if these last two were squashed: comparison here (the changes on constraints are way more natural/minimalist).
Thanks for the update. I like how you were able to avoid the unreachable
. I would assume that the compiler is smart enough to inline fn merge
and avoid the temporary Option
s.
I still think it should be possible without T
in MergeJoinBy
- see this prototype: https://github.com/phimuemue/rust-itertools/commit/072fe00e666f3d6fa5616af3176e2ae6e294516c. (I consider it important to not change MergeJoinBy
's interface (if not strictly required), as it right now is already complex enough.)
Looking at this prototype, it seems that MergePredicate
is not even needed anymore. If you want to continue working on this, you can use the prototype as a starting point, and get rid of the rough edges.
So I got rid of MergePredicate there. The diff between before the issue and now, it feels right.
I could avoid code duplication (code below is in four Iterator methods), or not (it's not that bad).
if let Some(left) = left { self.left.put_back(left); }
if let Some(right) = right { self.right.put_back(right); }
I could work on bringing struct MergeBy
into src/merge_join.rs
or another place ?
I'd say we should focus on supporting bool
before moving stuff around, especially in light of https://github.com/rust-itertools/itertools/issues/702.
merge
never returns (Some(...), Some(...), ...)
, but the caller always checks both Option
s. On top of that, when we call merge
, this is always followed by put_back
s. As such, we could think about moving the put_back
into merge
- or possibly at least returning Option<Either<I::Item, J::Item>>
. IMO you do not have to do this, because would I hope the compiler is smart enough to get rid of all the checks, but please add a comment that we thought about this.merge_join_by
should only accept T
where T: OrderingOrBool
.assert
that the bool
-variant behaves the same as the Ordering
-variant as long as the Ordering
is not equal
. And we should assert
that the bool
-variant yields the same elements (modulo wrapping in Either
) as the simpler MergeBy
.cmp_fn
returns Ordering
"; similar for the other case. The bool
case should probably mention that this somewhat corresponds to cmp_fn
returning if the first argument is considered "less" than the second argument. I like your idea of re-using the example here - nice.Use MergeJoinBy
for the internal of MergeBy
to merge their implementation in a single file is quite easy anyway.
All four points are done here (and small stupid duplicates removed). Although, I'm not sure the documentation sentence about "less" is clear enough. I changed the common example to show that the Both
variant does not always split into consecutives Left, Right
variants (when one of the two iterators is not ascending, it's not a precondition after all).
I should not have unwrapped Either in the first quickcheck (commit).
I tried with put_back
s in merge
, and while its definition certainly would be not pretty, the rest would be straightforward (just let elem = T::merge(&mut self, left, right);
).
pub trait OrderingOrBool<L, R> { // L R for types, I J for iterators below
type MergeResult;
fn left(left: L) -> Self::MergeResult;
fn right(right: R) -> Self::MergeResult;
fn merge<I, J, F>(it: &mut MergeJoinBy<I, J, F>, left: L, right: R) -> Self::MergeResult
where I: Iterator<Item = L>,
J: Iterator<Item = R>,
F: FnMut(&L, &R) -> Self; // match/if (it.cmp_fn)(&left, &right) {...}
fn size_hint(left: SizeHint, right: SizeHint) -> SizeHint;
}
Hi, thanks so far. I realized we should keep the inners of OrderingOrBool
private (so that we can change them later). We could employ a similar technique to HomogenousTuple
which is a facade for TupleCollect
.
OrderingOrBool
is still private (we can't access it from outside itertools) because in "lib.rs", the module "merge_join" is not public, and OrderingOrBool
is not pub use
d anywhere else.
But HomogenousTuple
is public (I can use itertools::traits::HomogeneousTuple;
in a personal project).
Oh, cool. Maybe open a PR so we can have a last review?
Now that the PR is merged, we can merge implementations as you first wanted. The way I see it, trait MergePredicate<T>
and struct MergeLte
are useless (being respectively replaced by FnMut(&T, &T) -> bool
and PartialOrd::le
).
pub type Merge<I, J> = MergeBy<I, J, MergeLte>;
// becomes
pub type Merge<I, J> = MergeBy<I, J, fn(&<I as Iterator>::Item, &<J as Iterator>::Item) -> bool>;
Then
pub fn merge<I, J>(i: I, j: J) -> Merge<<I as IntoIterator>::IntoIter, <J as IntoIterator>::IntoIter>
where I: IntoIterator,
J: IntoIterator<Item = I::Item>,
I::Item: PartialOrd
{
// merge_by_new(i, j, MergeLte)
// becomes
merge_by_new(i, j, PartialOrd::le)
}
And MergeBy uses MergeJoinBy in its internal
pub struct MergeBy<I, J, F>
where I: Iterator,
J: Iterator<Item = I::Item>
{
iter: MergeJoinBy<I, J, F>,
}
The rest is really straightforward. I know it will pass tests as I experimented before. If that seems reasonable to you then my question is where do we move all the code, in which file? Or we move it a last step? Or as a first?
Hi @Philippe-Cholet , I think this design has two drawbacks:
fn
(note the lower case) buys some flexibility at the expense of possibly worsening runtime, as it is a function pointer. Thus, we should avoid it.iter
in MergeBy
would require that we implement all iterator methods and manually forward them to iter
. Instead, we could make MergeBy
a type alias for a particular MergeJoinBy
specialization.Without having looked at the code too closely, I think we should express the affected iterators all as instances of MergeJoinBy
, and customize MergeJoinBy
essentially by one parameter that
This could be realized by extending OrderingOrBool
.
I kinda expected it about fn
.
About iter
, I thought about it and I didn't think it was an issue because I don't think it would make sense to implement more than next/size_hint/count/last/nth
. I was more annoyed by Either::into_inner
decapsulation (see below, straightforward). I'm gonna try going your way.
impl<I, J, F> Iterator for MergeBy<I, J, F>
where I: Iterator,
J: Iterator<Item = I::Item>,
F: FnMut(&I::Item, &I::Item) -> bool,
{
type Item = I::Item;
fn next(&mut self) -> Option<Self::Item> {
self.iter.next().map(Either::into_inner)
}
fn size_hint(&self) -> SizeHint {
self.iter.size_hint()
}
fn count(self) -> usize {
self.iter.count()
}
fn last(self) -> Option<Self::Item> {
self.iter.last().map(Either::into_inner)
}
fn nth(&mut self, n: usize) -> Option<Self::Item> {
self.iter.nth(n).map(Either::into_inner)
}
}
After moving Merge/MergeBy into merge_join module to manipulate them all easily, I'm just really stuck.
To be able to keep MergeLte
and not use PartialOrd::le
(and therefore not a function pointer in Merge
type), we can't keep F: FnMut(&I::Item, &J::Item) -> T
in impl Iterator for MergeJoinBy
. But use something like F: MergePredicate<I::Item, J::Item, T>
instead does not constrain T
anymore leading to introduce T
back in MergeJoinBy<I, J, F, T>
with a phantom field. I don't see a way around this.
Both MergeBy
and MergeJoinBy
use functions returning bool
so in order for MergeBy
to be an instance of MergeJoinBy
, the only way I saw was adding another parameter (const SAME: bool
(meaning L == R ?
), a const one hence breaking the MSRV) to distinguish them. You can see the changes here. I'll revert them if you can't salvage any of this.
About F: MergePredicate<I::Item, J::Item, T>
not contraining T
, the answer was F: MergePredicate<I::Item, J::Item, Out = T>
. Then no phantom field. Commit here.
I think I'm finally on the right path!
To get rid of const SAME: bool
, I wrap the function F
in MergeFuncLR
(result wrapped EitherOrBoth/Either<L,R>
) and MergeFuncT
(result unwrapped T
).
Those MergeFuncLR/T
have a parameter T
(phantom field) to not have conflicting implementations of MergePredicate<L, R>
.
In merge_join_by
definition, in order for the compiler to keep guessing left/right types, I kept F: FnMut(&Self::Item, &J::Item) -> T
. But to not add F: MergePredicate<Self::Item, J::Item>
that felt duplicate, I removed T: OrderingOrBool...
, the user might lose OrderingOrBool
information.
Commit there but you might prefer to see the current code.
Hi there, thanks for going through this. I do not have too much time right now, so just quick feedback:
I think this is indeed taking the right turns! I skimmed the code, and I saw that you have four merge predicates:
merge
-> MergeJoinBy<..., MergeFuncT<MergeLte, bool>>
merge_by
-> MergeJoinBy<..., MergeFuncT<F, bool>>
merge_join_by(...bool)
-> MergeJoinBy<..., MergeFuncLR<F, bool>>
merge_join_by(...Ordering)
-> MergeJoinBy<..., MergeFuncLR<F, Ordering>>
Looks simple and good. Maybe we can simplify some MergeFuncT
/MergeFuncLR
internals (does MergeFuncT
need its second parameter?), but the scaffold so far seems right.
I think that MergeJoinBy
it should impl FusedIterator
, shouldn't it?
Thanks for the quick feedback.
I only commented out FusedIterator
temporarily, I definitely think MergeJoinBy
should implement it (commit) like many other iterators. After we solve the current issue, I'll probably make an issue (or use an old one if any) as I investigated it.
Good catch, MergeFuncT<F, bool>
does not need the second (phantom) parameter since T
always is bool
so MergeFuncT<F>
is enough. Therefore new commit in which I also simplify MergeFuncLR
and MergeFuncT
to "tuple structs".
Note 1: But MergeFuncLR
needs it to avoid conflicting implementations of MergePredicate<L, R>
.
impl<L, R, F: FnMut(&L, &R) -> Ordering> MergePredicate<L, R> for MergeFuncLR<F/*, Ordering*/> { ... }
impl<L, R, F: FnMut(&L, &R) -> bool> MergePredicate<L, R> for MergeFuncLR<F/*, bool*/> { ... }
Some impossible F
could implement both FnMut(&L, &R) -> Ordering
and FnMut(&L, &R) -> bool
. Annoying but unavoidable IMO.
Note 2: impl<...> MergePredicate<T, T> for MergeFuncT<MergeLte>
is mostly copy/paste of impl<...> MergePredicate<T, T> for MergeFuncT<F>
with self.0(&left, &right)
replaced by left <= right
.
EDIT: I could write a new fn merge_join_by_new<I: IntoIterator, J: IntoIterator, F>(left: I, right: J, f: F) -> MergeJoinBy<I::IntoIter, J::IntoIter, F>
to avoid MergeJoinBy { left: put_back(left.into_iter().fuse()), right: put_back(right.into_iter().fuse()), cmp_fn: ..., }
three times when only cmp_fn
changes.
From its trait documentation, we should implement FusedIterator
for all iterators that always return None
once they've done it once. But the real question IMO is on which conditions?!
Here, I think I made a mistake (being too fast). MergeJoinBy
fuses both left and right iterators even if they were not before. So I think we should
impl<I, J, F, T> FusedIterator for MergeJoinBy<I, J, F>
where I: Iterator,
J: Iterator,
F: MergePredicate<I::Item, J::Item, Out = T>,
{}
instead of
impl<I, J, F, T> FusedIterator for MergeJoinBy<I, J, F>
where I: FusedIterator,
// ~~~~~
J: FusedIterator,
// ~~~~~
F: MergePredicate<I::Item, J::Item, Out = T>,
{}
Sounds right.
FusedIterator' small fix. Currently: lib.rs, merge_join.rs, big diff.
In
merge_join_by
definition, in order for the compiler to keep guessing left/right types, I keptF: FnMut(&Self::Item, &J::Item) -> T
. But to not addF: MergePredicate<Self::Item, J::Item>
that felt duplicate, I removedT: OrderingOrBool...
, the user might lose "OrderingOrBool
information".
I could
pub trait IsOrderingOrBool {}
impl IsOrderingOrBool for Ordering {}
impl IsOrderingOrBool for bool {}
// then add to the function definition
T: merge_join::IsOrderingOrBool,
I think that would be clearer to people. What's your thought on that?
Then should I open a pull request for review, or do you see changes to do?
@phimuemue Little reminder.
Hi, thanks for keeping in touch.
What I see:
fn merge_join_by
to be MergeJoinBy<Self, J::IntoIter, F>
, and make MergeJoinBy
a type alias for some InternalMergeJoinBy
? (I'd like to avoid our auxiliary structs (such as MergeFunc
) appearing in public signatures.)MergePredicate
variants (instead of introducing another level of indirection for the Merge
/MergeBy
variants).FusedIterator
is correct imho.If all this passes our tests, can you open a PR?
Thank you too for this long discussion, I initially thought it would be quickier.
I don't like MergeFuncLR
to be public either, so I thought of this InternalMergeJoinBy
but what it means for MergeJoinBy
is a fourth type (pub type MergeJoinBy<I, J, F, T> = InternalMergeJoinBy<I, J, MergeFuncLR<F, T>>;
) which we previously tried to avoid. Because we can't write something like <F as FnMut(...)>::Output
to infer T
from F
(from what I understand because Fn*
traits aren't stabilized).
I don't see any way to avoid it, but maybe you do. Currently, Philippe-Cholet@c622606fa4c9d6ca85b485e6bd7cc8004a249e24 . I'll make a PR once you find a way or are ok with the 4th type.
I think I'm fine with InternalMergeJoinBy
- I agree it's a bit ugly, but the ugliness is not leaked too much to the user. And I hope that - once we find something better - we can clean up the internals.
It seems that
merge_join_by
/MergeJoinBy
is actually a more generic variant ofmerge
/MergeBy
.However, currently, the
MergeJoinBy
andMergeBy
are implemented independently (PutBack
vsPeekable
, fused behavior, requiringOrdering
vsbool
). The past showed that uniform implementations may be better.This change should not introduce breaking changes for users:
merge
,merge_by
andmerge_join_by
should stay usable as they are.However, I'd imagine that the general
merge_join_by
could also accept a comparator returningbool
(instead ofOrdering
) - this would free the users from having to deal with theEitherOrBoth::Both
case. I.e. I suggest that users should be able to write:When we tackle this, we should - as a first step - probably move all the
merge
-related functions into an own module.