More optimal Agg.rankBy() and Agg.denseRankBy() implementations shouldn't keep buffers and sort them

lukaseder commented 8 years ago

The current implementations of Agg.rankBy() and Agg.denseRankBy() are very wasteful as they:

maintain an in-memory buffer of all the values
sort that buffer
run through that buffer to find the ranked index

There was a suggested pull request (https://github.com/jOOQ/jOOL/pull/250) to optimise the search for the index, but a much better approach was suggested by @amaembo (https://github.com/jOOQ/jOOL/pull/250#issuecomment-241474427): Just count the elements that are lower than the argument value

Xaerxess commented 8 years ago

Regarding percentRankBy - one can implement it just like rankBy but with second long storing count (() -> new long[] { -1L, 0L } and so on), or in terms of rankBy using other stuff from jOOL:

public static <T, U> Collector<T, ?, Optional<Double>> percentRankBy(U value, Function<? super T, ? extends U> function, Comparator<? super U> comparator) {
    return collectingAndThen(
            Tuple.collectors(
                    rankBy(value, function, comparator),
                    count()),
            t -> t.map((rank, count) -> rank.map(r -> (double) r / count))
    );
}

Which approach do you prefer?

BTW: Why are you using one-element array everywhere, why not implement mutable lazy holder class (mutable optional?) which would take care of some repeating boilerplate code? (Genuine question - I wonder if you thought about that; using l[0] looks like the hard way to me.)

lukaseder commented 8 years ago

Regarding percentRankBy - one can implement it just like rankBy but with second long storing count (() -> new long[] { -1L, 0L } and so on)

Excellent.

Which approach do you prefer?

The Tuple.collectors() approach. Nice thinking

BTW: Why are you using one-element array everywhere

I was inspired by the JDK, e.g.: https://github.com/dmlloyd/openjdk/blob/c9f3c7dda232dabcfe297486d8d2f341bcb1e901/jdk/src/java.base/share/classes/java/util/stream/Collectors.java#L622

Without having checked, this approach probably has the advantage of:

No indirection through abstraction, because by default, the JIT inlines only 9 method levels, if I remember correctly. With arrays, we're not wasting one level on something trivial.
Less object size overhead on pairs of long, and probably (didn't verify this) better access patterns

I'm sure when Valhalla reaches GA, all of these arrays will be replaced by actual value types.

I'm also sure @amaembo knows more about the rationale behind using long[] for storing intermediate data.

amaembo commented 8 years ago

Using class like class LongHolder {long value;} is not a big difference in performance compared to one-element array. Probably it's even more performant as arr[0] = xyz requires array bounds-check which is unlikely to be eliminated here by JIT (warning: this is just an educated guess, I never actually measured this). However using separate class would require a little more memory (to hold class metainformation) and will increase the size of the library jar a little.

If you access the field directly, (i.e. holder.value = xyz) stack depth is not increased. And in general the calls inside Stream API are usually too polymorphic to be inlined.

lukaseder commented 8 years ago

Very interesting, thanks for your feedback, @amaembo. I haven't thought of the array bounds-check. The separate class is probably a negligible cost, given that we already have cruft like ObjIntConsumer :)

Will think about this. Perhaps there's indeed room for improvement here

amaembo commented 8 years ago

I've just experimented replacing int[3] holder used in IntStreamEx.minByInt with explicit class with three fields. I observe statistical-significant time improvement, but it's only 0.4 ns per single processed stream element (about one CPU clock cycle). I think such improvement is too small, so I will not rewrite my code, at least for now.

lukaseder commented 8 years ago

@amaembo Thanks for the feedback. Very helpful

jOOQ / jOOL

More optimal Agg.rankBy() and Agg.denseRankBy() implementations shouldn't keep buffers and sort them #254