jOOQ / jOOL

jOOλ - The Missing Parts in Java 8 jOOλ improves the JDK libraries in areas where the Expert Group's focus was elsewhere. It adds tuple support, function support, and a lot of additional functionality around sequential Streams. The JDK 8's main efforts (default methods, lambdas, and the Stream API) were focused around maintaining backwards compatibility and implementing a functional API for parallelism.
http://www.jooq.org/products
Apache License 2.0
2.09k stars 168 forks source link

More optimal Agg.rankBy() and Agg.denseRankBy() implementations shouldn't keep buffers and sort them #254

Closed lukaseder closed 8 years ago

lukaseder commented 8 years ago

The current implementations of Agg.rankBy() and Agg.denseRankBy() are very wasteful as they:

There was a suggested pull request (https://github.com/jOOQ/jOOL/pull/250) to optimise the search for the index, but a much better approach was suggested by @amaembo (https://github.com/jOOQ/jOOL/pull/250#issuecomment-241474427): Just count the elements that are lower than the argument value

Xaerxess commented 8 years ago

Regarding percentRankBy - one can implement it just like rankBy but with second long storing count (() -> new long[] { -1L, 0L } and so on), or in terms of rankBy using other stuff from jOOL:

public static <T, U> Collector<T, ?, Optional<Double>> percentRankBy(U value, Function<? super T, ? extends U> function, Comparator<? super U> comparator) {
    return collectingAndThen(
            Tuple.collectors(
                    rankBy(value, function, comparator),
                    count()),
            t -> t.map((rank, count) -> rank.map(r -> (double) r / count))
    );
}

Which approach do you prefer?

BTW: Why are you using one-element array everywhere, why not implement mutable lazy holder class (mutable optional?) which would take care of some repeating boilerplate code? (Genuine question - I wonder if you thought about that; using l[0] looks like the hard way to me.)

lukaseder commented 8 years ago

Regarding percentRankBy - one can implement it just like rankBy but with second long storing count (() -> new long[] { -1L, 0L } and so on)

Excellent.

Which approach do you prefer?

The Tuple.collectors() approach. Nice thinking

BTW: Why are you using one-element array everywhere

I was inspired by the JDK, e.g.: https://github.com/dmlloyd/openjdk/blob/c9f3c7dda232dabcfe297486d8d2f341bcb1e901/jdk/src/java.base/share/classes/java/util/stream/Collectors.java#L622

Without having checked, this approach probably has the advantage of:

I'm sure when Valhalla reaches GA, all of these arrays will be replaced by actual value types.

I'm also sure @amaembo knows more about the rationale behind using long[] for storing intermediate data.

amaembo commented 8 years ago

Using class like class LongHolder {long value;} is not a big difference in performance compared to one-element array. Probably it's even more performant as arr[0] = xyz requires array bounds-check which is unlikely to be eliminated here by JIT (warning: this is just an educated guess, I never actually measured this). However using separate class would require a little more memory (to hold class metainformation) and will increase the size of the library jar a little.

If you access the field directly, (i.e. holder.value = xyz) stack depth is not increased. And in general the calls inside Stream API are usually too polymorphic to be inlined.

lukaseder commented 8 years ago

Very interesting, thanks for your feedback, @amaembo. I haven't thought of the array bounds-check. The separate class is probably a negligible cost, given that we already have cruft like ObjIntConsumer :)

Will think about this. Perhaps there's indeed room for improvement here

amaembo commented 8 years ago

I've just experimented replacing int[3] holder used in IntStreamEx.minByInt with explicit class with three fields. I observe statistical-significant time improvement, but it's only 0.4 ns per single processed stream element (about one CPU clock cycle). I think such improvement is too small, so I will not rewrite my code, at least for now.

lukaseder commented 8 years ago

@amaembo Thanks for the feedback. Very helpful