Closed lukaseder closed 8 years ago
It is a special case of what Scala calls zipAll
:
interface Seq<T> {
// thisElem and thatElem used to fill up the result if this or that runs out of elements
<U> Seq<Tuple2<T, U>> zipAll(Seq<U> that, T thisElem, U thatElem);
}
with thisElem
and thatElem
set to null
.
Yep, true. Thanks for the hint.
I thought again about outer zips. In contrast to DB joins, zip is very limited. It composes records sequentially, one-by-one. The only match-condition the records meet is their position.
Outer joins handle the case where for a given record of one relation no record of the other relation matches the condition. zipAll handles both left-and right outer join simulaneously by providing default values. Therefore zipAll is not a special case of outer zips.
Well... In DB Speak, the following can be said:
-- Convenience form
A [ { LEFT | RIGHT | FULL } [ OUTER ] ] ZIP B
-- Full form
(SELECT A.*, ROW_NUMBER() OVER() AS RN FROM A) AS A
[ { LEFT | RIGHT | FULL } [ OUTER ] ] JOIN
(SELECT B.*, ROW_NUMBER() OVER() AS RN FROM B) AS B
ON A.RN = B.RN
zipAll()
is an alias for fullOuterZip()
Thank you. If we had functions
Seq<T> continue(T next);
Seq<T> continue(Supplier<? extends T> nextSupplier);
Seq<T> continue(Function<? super T, ? extends T> previousToNext);
then
// = leftOuter
Seq.of(1, 2, 3).zip(Seq.of("A", "B").continue(null));
// = rightOuter
Seq.of("A", "B", "C").zip(Seq.of(1, 2).continue(null)).map(t -> t.swap());
Hmm, good point (although, your rightOuter suggestion appears incorrect to me, the resulting stream should contain only ((A, 1), (B, 2))
). And continue(T)
is nothing but concat(Seq.generate(T))
.
your rightOuter suggestion appears incorrect to me, the resulting stream should contain only ((A, 1), (B, 2)))
should be right, the Tuple2 is swapped (couldn't come up with a better solution yet)
And continue(T) is nothing but concat(Seq.generate(T))
generate(T)
is a special case of continue(T)
: empty().continue(T)
That would be my understanding of right outer zip:
// = rightOuter (no effect)
Seq.of("A", "B", "C").continue(null).zip(Seq.of(1, 2));
// = rightOuter (with effect)
Seq.of("A", "B", "C").continue(null).zip(Seq.of(1, 2, 3, 4));
generate(T) is a special case of continue(T): empty().continue(T)
Eh, are we talking about the same thing? :) Here's my naive continue(T)
implementation:
interface Seq<T> {
default Seq<T> continue(T t) {
return concat(Seq.generate(t));
}
}
Or, what am I missing?
I took your example from above:
// ((1, "A"), (2, "B"), (null, "C"))
Seq.of(1, 2).rightOuterZip(Seq.of("A", "B", "C"));
The continue(null)
should happen in the zip method argument, e.g. finite.zip(that.continue(null))
. Otherwise we do not terminate, e.g. infinite.zip(that)
.
Therefore I needed to change your example like this:
Seq.of("A", "B", "C")
// = ("A", 1), ("B", 2), ("C", null)
.zip(Seq.of(1, 2).continue(null))
// = (1, "A"), (2, "B"), (null, "C")
.map(t -> t.swap());
Eh, are we talking about the same thing? :) Here's my naive continue(T) implementation:
yes, you're right - I've overseen that.
Why not simply
default <U> Seq<Tuple2<T, U>> rightOuterZip(Seq<? extends U> right) {
return continue(null).zip(right);
}
No swapping needed...
Oh, right - my fault. Was a little bit undercoffeeinated... :-)
Oh noes - continue is a keyword...
Alternatives (see https://www.powerthesaurus.org/continue/synonyms):
Well, I personally don't think that continue()
is a first-level concept. It can be really trivially be achieved via concat(generate(...))
:)
Yes, you are right I think. API shouldn't be redundant.
@lukaseder, @danieldietrich, I thought a bit about how zipAll
could look like and inspired heavily by scala's zipAll
I ended up with sth like this:
static <T1, T2, R> Seq<R> zipAll(Seq<T1> s1, Seq<T2> s2, BiFunction<? super T1, ? super T2, ? extends R> zipper, T1 thisElem, T2 thatElem) {
final Iterator<T1> it1 = s1.iterator();
final Iterator<T2> it2 = s2.iterator();
class ZipAll implements Iterator<R> {
@Override
public boolean hasNext() {
return it1.hasNext() || it2.hasNext();
}
@Override
public R next() {
if (it1.hasNext()) {
if (it2.hasNext()) {
return zipper.apply(it1.next(), it2.next());
} else {
return zipper.apply(it1.next(), thatElem);
}
} else {
if (it2.hasNext()) {
return zipper.apply(thisElem, it2.next());
} else {
throw new NoSuchElementException("next on empty iterator");
}
}
}
}
return seq(new ZipAll());
}
What do you think ?
Yes, it is a special method for this, how I would express it:
s1.zipAll(s2, thisElem, thatElem).map(tuple -> tuple.transform(zipper))
but without creating intermediate Tuples which relaxes GC.
Implementation-wise that's certainly one valid option. Do you also see a way to do this via Spliterator
?
Also, given that the zipper
is an optional convenience argument, I'd put it at the end and overload:
static <T1, T2> Seq<Tuple2<T1, T2>> zipAll(Seq<T1> s1, Seq<T2> s2, T1 default1, T2 default2) {
return zipAll(s1, s2, default1, default2, (t1, t2) -> tuple(t1, t2));
}
static <T1, T2, R> Seq<R> zipAll(Seq<T1> s1, Seq<T2> s2, T1 default1, T2 default2, BiFunction<? super T1, ? super T2, ? extends R> zipper) {}
@lukaseder agree regarding the zipper
as the version which I posted was a draft. I'll refine it in the evening, once I get to the keyboard :)
Looking forward! :)
In case you do want to look into using a Spliterator
, you might want to consider delegating to SeqUtils.transform()
, which makes it a bit easier.
@lukaseder I checked SeqUtils.transform()
and how it's used in other methods in Seq
, but since the the method is processing two Stream
s (Seq
s) I decided to stay with using Iterator
- but I'm quite curious how a corresponding Spliterator
implementation would look like.
BTW, I have found a very interesting info, that at some point, Stream.zip
was part of (very early) jdk8:
http://download.java.net/lambda/b93/docs/api/java/util/stream/Streams.html#zip(java.util.stream.Stream, java.util.stream.Stream, java.util.function.BiFunction)
Here is the PR (with a nice, round number): https://github.com/jOOQ/jOOL/pull/200
You're right. SeqUtils.transform()
won't work here. Interesting info indeed. I wonder why it was removed! Probably some caveat with respect to parallelisation. Also, this class was called Streams
, which would be the expected name. Now, we have the weird StreamSupport
. Who knows what happend
Seems that the main reason for removal was the API blow up for primitive specializations. You would need
zip(IntStream, IntStream, IntBinaryOperator);
zip(LongStream, LongStream, LongBinaryOperator);
zip(DoubleStream, DoubleStream, DoubleBinaryOperator);
But that's only beginning. What about
<T, R> zip(IntStream, Stream<T>, IntObjToObjFunction<T, R>);
zip(IntStream, LongStream, IntLongToWhateverFunction);
etc, etc. The performance is one of the main goals of the Stream API (though not very important for jOOL =)). So it was decided to remove zip
at all than to ignore primitive specializations. I can guess, that we will see zip
in JDK only when project valhalla will be ready.
The Streams
class was created when interface static methods were not implemented yet. Most of it migrated into Stream/IntStream/LongStream/DoubleStream
later.
That's one possible reason. Although, one might argue that this whole specialisation was just a bit premature, because hopefully, Valhalla will get rid of all these types again :)
@lukaseder, @amaembo openjdk mail archive explains quite a lot:
http://mail.openjdk.java.net/pipermail/lambda-libs-spec-observers/2013-June/002028.html
OK, done. Thanks again for the contribution!
The current
Seq.zip()
method "inner zips" two streams, such that the size of the resulting stream is the minimum of the size of each zipped stream.Sometimes, it is useful, however, to fully zip all streams, providing
Optional
values (or nulls) once the shorter stream is depleted. Example: