jOOQ / jOOL

jOOλ - The Missing Parts in Java 8 jOOλ improves the JDK libraries in areas where the Expert Group's focus was elsewhere. It adds tuple support, function support, and a lot of additional functionality around sequential Streams. The JDK 8's main efforts (default methods, lambdas, and the Stream API) were focused around maintaining backwards compatibility and implementing a functional API for parallelism.
http://www.jooq.org/products
Apache License 2.0
2.09k stars 168 forks source link

Add Seq.lazy() #306

Closed tlinkowski closed 4 years ago

tlinkowski commented 7 years ago

This is my second attempt to propose Seq.lazy() (see #302 for details).

Here, I propose it together with the reimplementations of the following methods mentioned in #195 (in order to demonstrate the real usefulness of Seq.lazy):

I did not reimplement the following methods (although they eagerly call iterator() or spliterator(), they do not try to iterate it immediately so they don't seem to be problematic in practice):

Reimplementation of the remaining methods mentioned in #195 (i.e. all kinds of joins) was proposed in #305.

tlinkowski commented 7 years ago

Ha! I can't believe it but I've just realized (thanks to @lukaseder's link to certain StackOverflow issue which directed me to this issue: http://stackoverflow.com/questions/29229373/why-filter-after-flatmap-is-not-completely-lazy-in-java-streams) that Stream.flatMap is only a partially lazy operation.

This means that the implementation of Seq.lazy provided by me is extremely poor, and needs to be improved.

lukaseder commented 7 years ago

Thanks again for your suggestion. I won't rename-and-close it this time, promised. :)

However, I'm still not convinced that this feature pulls its weight to be published in the public API. As we move on, improving jOOλ, we might add this feature as an internal optimisation, but perhaps, SeqBuffer is already good enough?

tlinkowski commented 7 years ago

Well, I'm quite convinced it's useful internally (as SeqUtils.lazy, or under any other name) as I hope I have demonstrated with shuffle, reverse, removeAll, etc. SeqBuffer wouldn't be a good solution in these cases because it buffers the elements, and it's redundant there.

But well, maybe you're right that it's not important enough to be placed in the public API.

Relating to our discussion on SeqBuffer, I agree with you that all Seq operations should be lazy. However, Seq construction is not strictly speaking an operation on Seq because there is no Seq yet. And Seq construction is non-lazy in the following cases:

The lazy methods you added (Seq.seq(Supplier)) are quite helpful but I'd simply like to be able to construct a Seq.of(1, 2, 3, 4, 5) without preallocating the {1, 2, 3, 4, 5} array until the first element is requested.

So my last shot - maybe the method should simply be named differently, e.g. Seq.supply/Seq.provide? If we chose any of these names, we could also create Seq.onEmptySupply/Seq.onEmptyProvide instead of Seq.onEmptySwitch [#179], and we'd have a nice correspondence there :)

lukaseder commented 7 years ago

as SeqUtils.lazy, or under any other name

Yes, I agree. That would be much better for now. See, I don't oppose putting something really useful in the public API eventually, and I do agree that your suggested improvements of shuffle, etc. will help here. I'm just reluctant of opening something up to the public, because we cannot remove it anymore, afterwards.

So my last shot - maybe the method should simply be named differently, e.g. Seq.supply/Seq.provide?

I'm still not convinced of the utility to the public API of the concept, regardless of the name. If we do add something like this to the public API, we'll also need tons of overloads and these need to be well-designed.

I do agree with your improvement for now, if we can keep it internal.

tlinkowski commented 7 years ago

I want to say that I really appreciate your care for not adding anything to the public API without enough consideration :) To me, well-though API is one of the main strengths of jOOλ.

Yet, responding to your words that "we cannot remove it anymore", I hope you mean that we cannot remove it anytime soon, and that a policy like Guava's (deprecate in one major release, remove in the following major release) can be taken into consideration here :)

That said, I propose the following: 1) I'll remove the public Seq.lazy from this branch so you could merge this as an internal improvement 2) I'll create an issue proposing Seq.supply (I really like this name because it plays so nice with Supplier) where we can wait for potential feedback from other users, and then discuss it thoroughly

lukaseder commented 7 years ago

Yet, responding to your words that "we cannot remove it anymore", I hope you mean that we cannot remove it anytime soon, and that a policy like Guava's (deprecate in one major release, remove in the following major release) can be taken into consideration here :)

Sure. Infinity is aproximately 2-3 years. :) But why deprecate/remove if we can simply not add.

  1. I'll remove the public Seq.lazy from this branch so you could merge this as an internal improvement

Yes.

  1. I'll create an issue proposing Seq.supply (I really like this name because it plays so nice with Supplier) where we can wait for potential feedback from other users, and then discuss it thoroughly

If you insist :)

tlinkowski commented 7 years ago

Sure. Infinity is aproximately 2-3 years. :) But why deprecate/remove if we can simply not add.

Sure, deprecation and removal are the last resort - it's much better not to add if in doubt :)

I'll create an issue proposing Seq.supply [...]

If you insist :)

Well, maybe I miss something but this non-laziness during constructing a Seq strikes me as a serious drawback, and I kind of hope that there are people out there who share my concerns.

So I'll exploit your courtesy in this matter ;)

lukaseder commented 7 years ago

Well, maybe I miss something but this non-laziness during constructing a Seq strikes me as a serious drawback

I don't see this as being serious. Array allocation is relatively cheap (I'd even bet on escape analysis to kick in, perhaps), compared to the bad algorithmic complexity we still have in many other method implementations.

Besides, laziness has its "caveats". Consider this:

Seq<Integer> mymethod(int a, int b, int c) {
    if (someCheck)
        return Seq.supply(() -> Seq.of(a, b, c));
    else
        return Seq.empty();
}

Now, your lambda is capturing scope. This scope might be a heavy class with heavy state that is now referenced by this supply method.

It's just an example. Perhaps, you can show me some really serious trouble when allocating this array...

tlinkowski commented 7 years ago

Well, I wouldn't like to use it to defer array initialization (which, as you mentioned, is cheap).

Instead, I had two use cases in mind:

1) Performing stateful intermediate operations similar to sorted, reverse, etc., i.e. operations where you have to consume entire stream before returning the first element. In such case, I think it's better to capture the scope (increased memory consumption) than to precalculate the result (increased processor consumption).

2) Returning a Seq containing a result of an expensive computation:

return Seq.supply(() -> Seq.seq(computeExpensiveOptional(param)));

or several expensive computations:

return Seq.supply(() -> Seq.seq(computeExpensive1(a), computeExpensive2(b)));

However, I've just realised that in the second example you'd still unnecessarily precalculate the second element so Seq.supply is not a good choice in this case.

As for the first example, I now think it'd be much better to use a method like this:

static <T> Seq<T> seq(Supplier<Optional<? extends T>> s);

which we obviously can't add under this name because of type erasure (and because overloading methods taking lambdas is generally a bad idea).

But we could add it under a different name - maybe something like supplyOptional? (I know, I fixated on this supply prefix)

return Seq.supplyOptional(() -> computeExpensiveOptional(param));
lukaseder commented 7 years ago

(Just a short note, I haven't forgotten about the pending changes. Thanks for your patience :) )

lukaseder commented 4 years ago

The PR got closed by github due to a recent rename of the main branch.

I'm sorry, I don't want to re-iterate this