Closed lukaseder closed 8 years ago
Regarding percentRankBy
- one can implement it just like rankBy
but with second long storing count (() -> new long[] { -1L, 0L }
and so on), or in terms of rankBy
using other stuff from jOOL:
public static <T, U> Collector<T, ?, Optional<Double>> percentRankBy(U value, Function<? super T, ? extends U> function, Comparator<? super U> comparator) {
return collectingAndThen(
Tuple.collectors(
rankBy(value, function, comparator),
count()),
t -> t.map((rank, count) -> rank.map(r -> (double) r / count))
);
}
Which approach do you prefer?
BTW: Why are you using one-element array everywhere, why not implement mutable lazy holder class (mutable optional?) which would take care of some repeating boilerplate code? (Genuine question - I wonder if you thought about that; using l[0] looks like the hard way to me.)
Regarding percentRankBy - one can implement it just like rankBy but with second long storing count (() -> new long[] { -1L, 0L } and so on)
Excellent.
Which approach do you prefer?
The Tuple.collectors()
approach. Nice thinking
BTW: Why are you using one-element array everywhere
I was inspired by the JDK, e.g.: https://github.com/dmlloyd/openjdk/blob/c9f3c7dda232dabcfe297486d8d2f341bcb1e901/jdk/src/java.base/share/classes/java/util/stream/Collectors.java#L622
Without having checked, this approach probably has the advantage of:
long
, and probably (didn't verify this) better access patternsI'm sure when Valhalla reaches GA, all of these arrays will be replaced by actual value types.
I'm also sure @amaembo knows more about the rationale behind using long[]
for storing intermediate data.
Using class like class LongHolder {long value;}
is not a big difference in performance compared to one-element array. Probably it's even more performant as arr[0] = xyz
requires array bounds-check which is unlikely to be eliminated here by JIT (warning: this is just an educated guess, I never actually measured this). However using separate class would require a little more memory (to hold class metainformation) and will increase the size of the library jar a little.
If you access the field directly, (i.e. holder.value = xyz
) stack depth is not increased. And in general the calls inside Stream API are usually too polymorphic to be inlined.
Very interesting, thanks for your feedback, @amaembo. I haven't thought of the array bounds-check. The separate class is probably a negligible cost, given that we already have cruft like ObjIntConsumer
:)
Will think about this. Perhaps there's indeed room for improvement here
I've just experimented replacing int[3]
holder used in IntStreamEx.minByInt
with explicit class with three fields. I observe statistical-significant time improvement, but it's only 0.4 ns per single processed stream element (about one CPU clock cycle). I think such improvement is too small, so I will not rewrite my code, at least for now.
@amaembo Thanks for the feedback. Very helpful
The current implementations of
Agg.rankBy()
andAgg.denseRankBy()
are very wasteful as they:There was a suggested pull request (https://github.com/jOOQ/jOOL/pull/250) to optimise the search for the index, but a much better approach was suggested by @amaembo (https://github.com/jOOQ/jOOL/pull/250#issuecomment-241474427): Just count the elements that are lower than the argument value