Closed tlinkowski closed 4 years ago
Ha! I can't believe it but I've just realized (thanks to @lukaseder's link to certain StackOverflow issue which directed me to this issue: http://stackoverflow.com/questions/29229373/why-filter-after-flatmap-is-not-completely-lazy-in-java-streams) that Stream.flatMap
is only a partially lazy operation.
This means that the implementation of Seq.lazy
provided by me is extremely poor, and needs to be improved.
Thanks again for your suggestion. I won't rename-and-close it this time, promised. :)
However, I'm still not convinced that this feature pulls its weight to be published in the public API. As we move on, improving jOOλ, we might add this feature as an internal optimisation, but perhaps, SeqBuffer
is already good enough?
Well, I'm quite convinced it's useful internally (as SeqUtils.lazy
, or under any other name) as I hope I have demonstrated with shuffle
, reverse
, removeAll
, etc. SeqBuffer
wouldn't be a good solution in these cases because it buffers the elements, and it's redundant there.
But well, maybe you're right that it's not important enough to be placed in the public API.
Relating to our discussion on SeqBuffer
, I agree with you that all Seq
operations should be lazy. However, Seq
construction is not strictly speaking an operation on Seq
because there is no Seq
yet. And Seq
construction is non-lazy in the following cases:
Seq.of(T)
and Seq.of(T...)
Seq.seq(Optional)
and Seq.seq(Optional...)
Seq.iterate
(with respect to the T seed
instead of Supplier<T> seedSupplier
)The lazy methods you added (Seq.seq(Supplier)
) are quite helpful but I'd simply like to be able to construct a Seq.of(1, 2, 3, 4, 5)
without preallocating the {1, 2, 3, 4, 5}
array until the first element is requested.
So my last shot - maybe the method should simply be named differently, e.g. Seq.supply
/Seq.provide
? If we chose any of these names, we could also create Seq.onEmptySupply
/Seq.onEmptyProvide
instead of Seq.onEmptySwitch
[#179], and we'd have a nice correspondence there :)
as SeqUtils.lazy, or under any other name
Yes, I agree. That would be much better for now. See, I don't oppose putting something really useful in the public API eventually, and I do agree that your suggested improvements of shuffle
, etc. will help here. I'm just reluctant of opening something up to the public, because we cannot remove it anymore, afterwards.
So my last shot - maybe the method should simply be named differently, e.g. Seq.supply/Seq.provide?
I'm still not convinced of the utility to the public API of the concept, regardless of the name. If we do add something like this to the public API, we'll also need tons of overloads and these need to be well-designed.
I do agree with your improvement for now, if we can keep it internal.
I want to say that I really appreciate your care for not adding anything to the public API without enough consideration :) To me, well-though API is one of the main strengths of jOOλ.
Yet, responding to your words that "we cannot remove it anymore", I hope you mean that we cannot remove it anytime soon, and that a policy like Guava's (deprecate in one major release, remove in the following major release) can be taken into consideration here :)
That said, I propose the following:
1) I'll remove the public Seq.lazy
from this branch so you could merge this as an internal improvement
2) I'll create an issue proposing Seq.supply
(I really like this name because it plays so nice with Supplier
) where we can wait for potential feedback from other users, and then discuss it thoroughly
Yet, responding to your words that "we cannot remove it anymore", I hope you mean that we cannot remove it anytime soon, and that a policy like Guava's (deprecate in one major release, remove in the following major release) can be taken into consideration here :)
Sure. Infinity is aproximately 2-3 years. :) But why deprecate/remove if we can simply not add.
- I'll remove the public Seq.lazy from this branch so you could merge this as an internal improvement
Yes.
- I'll create an issue proposing Seq.supply (I really like this name because it plays so nice with Supplier) where we can wait for potential feedback from other users, and then discuss it thoroughly
If you insist :)
Sure. Infinity is aproximately 2-3 years. :) But why deprecate/remove if we can simply not add.
Sure, deprecation and removal are the last resort - it's much better not to add if in doubt :)
I'll create an issue proposing
Seq.supply
[...]If you insist :)
Well, maybe I miss something but this non-laziness during constructing a Seq
strikes me as a serious drawback, and I kind of hope that there are people out there who share my concerns.
So I'll exploit your courtesy in this matter ;)
Well, maybe I miss something but this non-laziness during constructing a Seq strikes me as a serious drawback
I don't see this as being serious. Array allocation is relatively cheap (I'd even bet on escape analysis to kick in, perhaps), compared to the bad algorithmic complexity we still have in many other method implementations.
Besides, laziness has its "caveats". Consider this:
Seq<Integer> mymethod(int a, int b, int c) {
if (someCheck)
return Seq.supply(() -> Seq.of(a, b, c));
else
return Seq.empty();
}
Now, your lambda is capturing scope. This scope might be a heavy class with heavy state that is now referenced by this supply
method.
It's just an example. Perhaps, you can show me some really serious trouble when allocating this array...
Well, I wouldn't like to use it to defer array initialization (which, as you mentioned, is cheap).
Instead, I had two use cases in mind:
1) Performing stateful intermediate operations similar to sorted
, reverse
, etc., i.e. operations where you have to consume entire stream before returning the first element. In such case, I think it's better to capture the scope (increased memory consumption) than to precalculate the result (increased processor consumption).
2) Returning a Seq
containing a result of an expensive computation:
return Seq.supply(() -> Seq.seq(computeExpensiveOptional(param)));
or several expensive computations:
return Seq.supply(() -> Seq.seq(computeExpensive1(a), computeExpensive2(b)));
However, I've just realised that in the second example you'd still unnecessarily precalculate the second element so Seq.supply
is not a good choice in this case.
As for the first example, I now think it'd be much better to use a method like this:
static <T> Seq<T> seq(Supplier<Optional<? extends T>> s);
which we obviously can't add under this name because of type erasure (and because overloading methods taking lambdas is generally a bad idea).
But we could add it under a different name - maybe something like supplyOptional
? (I know, I fixated on this supply
prefix)
return Seq.supplyOptional(() -> computeExpensiveOptional(param));
(Just a short note, I haven't forgotten about the pending changes. Thanks for your patience :) )
The PR got closed by github due to a recent rename of the main branch.
I'm sorry, I don't want to re-iterate this
This is my second attempt to propose
Seq.lazy()
(see #302 for details).Here, I propose it together with the reimplementations of the following methods mentioned in #195 (in order to demonstrate the real usefulness of
Seq.lazy
):shuffle()
(provided simpler implementation)reverse()
removeAll()
retainAll()
window()
(this has to be adapted by @lukaseder injOOQ-tools
)I did not reimplement the following methods (although they eagerly call
iterator()
orspliterator()
, they do not try to iterate it immediately so they don't seem to be problematic in practice):zip()
,zipAll()
SeqUtils.transform()
grouped
partition
(returnsTuple
- could not useSeq.lazy
)duplicate()
(returnsTuple
- in this particular case I'd implement it usingSeqBuffer
proposed in #305)Reimplementation of the remaining methods mentioned in #195 (i.e. all kinds of
join
s) was proposed in #305.