jOOQ / jOOL

jOOλ - The Missing Parts in Java 8. jOOλ improves the JDK libraries in areas where the Expert Group's focus was elsewhere. It adds tuple support, function support, and a lot of additional functionality around sequential Streams. The JDK 8's main efforts (default methods, lambdas, and the Stream API) were focused around maintaining backwards compatibility and implementing a functional API for parallelism.
http://www.jooq.org/products
Apache License 2.0

Tuple.collectors() should be able to share state #344

Open lukaseder opened 5 years ago

lukaseder commented 5 years ago

The current Tuple.collectors() implementation is a canonical one, where collectors are completely independent of one another. It would be better if they could share state in case they're related. For example, when calculating a set of percentiles:

var percentiles =
Stream.of(1, 2, 3, 4, 10, 9, 3, 3).collect(
  Tuple.collectors(
    Agg.<Integer>percentile(0.0),
    Agg.<Integer>percentile(0.25),
    Agg.<Integer>percentile(0.5),
    Agg.<Integer>percentile(0.75),
    Agg.<Integer>percentile(1.0)
  )
);

System.out.println(percentiles);

It would be great if the 5 collectors could somehow share their internal state, which is a sorted list of all values in the stream. Executing 1 sort instead of 5 is definitely preferable.
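To illustrate the cost argument with plain JDK types only (no jOOL calls), the following sketch collects the stream once, sorts once, and then reads several percentiles from the same sorted list. The class name `SharedSortSketch`, the helper `percentiles`, and the nearest-rank rule are assumptions made for illustration; they are not jOOL API and need not match what `Agg.percentile` does.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SharedSortSketch {

    // Hypothetical helper: one collector that sorts once and answers many percentiles.
    static <T extends Comparable<? super T>> Collector<T, ?, List<T>> percentiles(double... ps) {
        return Collectors.collectingAndThen(Collectors.<T>toList(), list -> {
            List<T> sorted = new ArrayList<>(list);
            sorted.sort(Comparator.naturalOrder()); // the only sort
            List<T> result = new ArrayList<>();
            if (sorted.isEmpty())
                return result;
            for (double p : ps) {
                // Nearest-rank rule (an assumption): rank = ceil(p * n), clamped to [1, n].
                int rank = (int) Math.ceil(p * sorted.size());
                int index = Math.min(Math.max(rank, 1), sorted.size()) - 1;
                result.add(sorted.get(index));
            }
            return result;
        });
    }

    public static void main(String[] args) {
        List<Integer> result = Stream.of(1, 2, 3, 4, 10, 9, 3, 3)
                .collect(percentiles(0.0, 0.25, 0.5, 0.75, 1.0));
        System.out.println(result); // prints [1, 2, 3, 4, 10]
    }
}
```

The trade-off is that the percentiles come back as one list from one collector, which is exactly the coupling that independent `Tuple.collectors()` arguments do not have today.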

This can be implemented in 2 ways:

rwperrott commented 3 years ago

I doubt that this can reliably be done by Tuple or Seq, because the source value must be the same (e.g. the same property in a container object, like a tuple), which is dubious, and any mapper Function and Comparator references must be the same too!

Any later compatible collectors would also need to allow injection of the dependent collector, and this would trigger do-nothing behaviour for everything except the finisher reference. The injected collector would need to provide an unmodifiable view of the sorted list of values.


I suggest that it would be much cheaper to provide a tuple of functions accepting the sorted (unmodifiable) list, resulting in a tuple of multiple results. Some of the results could even be non-percentile values, like sum, mean/average, count etc. Percentile functions could optionally be created with a PercentileFunction instance, which I suggested for #380, to resolve between values.

Sixteen static methods, like the Tuple.collectors methods, would need to be provided for Tuple1 to Tuple16, possibly by the Tuple class.

rwperrott commented 3 years ago

This seems to work, with T and Tuple<T,U> methods.

TupleUtils:

import org.jooq.lambda.Seq;
import org.jooq.lambda.Unchecked;
import org.jooq.lambda.tuple.Tuple;

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.reflect.Method;
import java.util.Comparator;
import java.util.Objects;

public class TupleUtils {
    private static final MethodHandle[] TUPLE_FUNCTIONS;
    static {
        final MethodHandles.Lookup lookup = MethodHandles.lookup();
        TUPLE_FUNCTIONS = Seq.of(Tuple.class.getMethods())
                             .filter(m -> "tuple".equals(m.getName()))
                             .sorted(Comparator.comparing(Method::getParameterCount))
                             .map(Unchecked.function(lookup::unreflect))
                             .toArray(MethodHandle[]::new);
    }

    // Created because no Varargs Tuple creator method in Tuple class, which is annoying!
    @SuppressWarnings("unchecked")
    @SafeVarargs
    public static <T,R extends Tuple> R toTuple(T... values) {
        Objects.requireNonNull(values, "values");
        return (R) Unchecked.supplier(() -> TUPLE_FUNCTIONS[values.length]
                .invokeWithArguments((Object[])values)).get();
    }
}

AggFor :

import org.jooq.lambda.tuple.*;

import java.util.*;
import java.util.function.BiConsumer;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;

@SuppressWarnings("unused")
public class AggFor {
    /**
     * Get a {@link Collector} that calculates the derived result of 1 summary function, given a specific
     * ordering.
     */
    @SuppressWarnings("unchecked")
    public static <T, R1> Collector<T, ?, Tuple1<R1>>
    collectFor(Comparator<? super T> comparator,
               Function<List<T>, ? extends R1> sf1) {
        return summariesForUnsafe(comparator, sf1);
    }

    /**
     * A re-implementation of percentileBy as a function, based on my 380 example code
     */
    public static <T> Function<List<T>, Optional<T>>
    percentileFor(double percentile, PercentileFunction<T, T> percentileFunction) {
        if (percentile < 0.0 || percentile > 1.0)
            throw new IllegalArgumentException("Percentile must be between 0.0 and 1.0");

        return l -> {
            final int size = l.size();
            if (0 == size)
                return Optional.empty();
            if (1 == size || percentile == 0d)
                return Optional.of(l.get(0));
            if (percentile == 1d)
                return Optional.of(l.get(size - 1));

            // Limit fraction size, to stop common errors for double percentile values e.g. 2E-16.
            // 0.5d is subtracted because the actual percentile value can lie between two values.
            final double dIndex = ((double) Math.round(size * percentile * 1.0E6d) * 1.0E-6d) - 0.5d;
            int index = (int) dIndex; // floor, for before or exact index
            if (index >= size)
                return Optional.of(l.get(size - 1));

            final T t0 = l.get(index); // 1st before or exact value
            final double indexFraction = dIndex - index;
            // If end or exact index, return t0 value.
            if (++index == size || indexFraction == 0d)
                return Optional.of(t0);

            final T t1 = l.get(index); // after value
            // Only call percentile function if t0 and t1 are different.
            return Optional.of((t0.equals(t1))
                               ? t0
                               : percentileFunction.apply(t0, t1, indexFraction, Function.identity()));
        };
    }

    /**
     * Get a {@link Collector} that calculates the derived 2 function results, given a specific ordering.
     */
    @SuppressWarnings("unchecked")
    public static <T, R1, R2> Collector<T, ?, Tuple2<R1, R2>>
    collectFor(Comparator<T> comparator,
               Function<List<T>, ? extends R1> sf1,
               Function<List<T>, ? extends R2> sf2) {
        return summariesForUnsafe(comparator, sf1, sf2);
    }

    /**
     * Get a {@link Collector} that calculates the derived 3 function results, given a specific ordering.
     */
    @SuppressWarnings("unchecked")
    public static <T, R1, R2, R3> Collector<T, ?, Tuple3<R1, R2, R3>>
    collectFor(Comparator<T> comparator,
               Function<List<T>, ? extends R1> sf1,
               Function<List<T>, ? extends R2> sf2,
               Function<List<T>, ? extends R3> sf3) {
        return summariesForUnsafe(comparator, sf1, sf2, sf3);
    }

    /**
     * Get a {@link Collector} that calculates the derived 4 function results given a specific ordering.
     */
    @SuppressWarnings("unchecked")
    public static <T, R1, R2, R3, R4> Collector<T, ?, Tuple4<R1, R2, R3, R4>>
    collectFor(Comparator<T> comparator,
               Function<List<T>, ? extends R1> sf1,
               Function<List<T>, ? extends R2> sf2,
               Function<List<T>, ? extends R3> sf3,
               Function<List<T>, ? extends R4> sf4) {
        return summariesForUnsafe(comparator, sf1, sf2, sf3, sf4);
    }

    /**
     * Get a {@link Collector} that calculates the derived 5 function results given a specific ordering.
     */
    @SuppressWarnings("unchecked")
    public static <T, R1, R2, R3, R4, R5> Collector<T, ?, Tuple5<R1, R2, R3, R4, R5>>
    collectFor(Comparator<T> comparator,
               Function<List<T>, ? extends R1> sf1,
               Function<List<T>, ? extends R2> sf2,
               Function<List<T>, ? extends R3> sf3,
               Function<List<T>, ? extends R4> sf4,
               Function<List<T>, ? extends R5> sf5) {
        return summariesForUnsafe(comparator, sf1, sf2, sf3, sf4, sf5);
    }

    /**
     * Get a {@link Collector} that calculates the derived collectFor given a specific ordering and 6 summary
     * functions.
     */
    @SuppressWarnings("unchecked")
    public static <T, R1, R2, R3, R4, R5, R6> Collector<T, ?, Tuple6<R1, R2, R3, R4, R5, R6>>
    collectFor(Comparator<T> comparator,
               Function<List<T>, ? extends R1> sf1,
               Function<List<T>, ? extends R2> sf2,
               Function<List<T>, ? extends R3> sf3,
               Function<List<T>, ? extends R4> sf4,
               Function<List<T>, ? extends R5> sf5,
               Function<List<T>, ? extends R6> sf6) {
        return summariesForUnsafe(comparator, sf1, sf2, sf3, sf4, sf5, sf6);
    }

    /**
     * Get a {@link Collector} that calculates the derived collectFor given a specific ordering and 7 summary
     * functions.
     */
    @SuppressWarnings("unchecked")
    public static <T, R1, R2, R3, R4, R5, R6, R7> Collector<T, ?, Tuple7<R1, R2, R3, R4, R5, R6, R7>>
    collectFor(Comparator<T> comparator,
               Function<List<T>, ? extends R1> sf1,
               Function<List<T>, ? extends R2> sf2,
               Function<List<T>, ? extends R3> sf3,
               Function<List<T>, ? extends R4> sf4,
               Function<List<T>, ? extends R5> sf5,
               Function<List<T>, ? extends R6> sf6,
               Function<List<T>, ? extends R7> sf7) {
        return summariesForUnsafe(comparator, sf1, sf2, sf3, sf4, sf5, sf6, sf7);
    }    // Methods up to Tuple16

    public static <T, A, R> Function<List<T>, R>
    collectFor(final Collector<T, A, R> collector) {
        final Supplier<A> supplier = collector.supplier();
        final BiConsumer<A, T> accumulator = collector.accumulator();
        final Function<A, R> finisher = collector.finisher();
        return l -> {
            A a = supplier.get();
            for (var t : l)
                accumulator.accept(a, t);
            return finisher.apply(a);
        };
    }

    @SuppressWarnings({"unchecked", "EnhancedSwitchMigration"})
    private static <T, V extends Tuple> Collector<T, ?, V>
    summariesForUnsafe(Comparator<? super T> comparator,
                       Function<List<T>, ?>... summaryFunctions) {

        { // Validate summaryFunctions
            final int width = Objects.requireNonNull(summaryFunctions).length;
            if (0 == width)
                throw new IllegalArgumentException("no summaryFunctions");
            for (int i = 0; i < width; i++)
                if (null == summaryFunctions[i])
                    throw new IllegalArgumentException(String.format("summaryFunctions[%d] is null", i));
        }

        return Collector.of(
                (Supplier<ArrayList<T>>) ArrayList::new,
                ArrayList::add,
                (l1, l2) -> {
                    l1.addAll(l2);
                    return l1;
                },
                l -> {
                    final int size = l.size();

                    final List<T> x;
                    switch (size) {
                        case 0:
                            x = Collections.emptyList();
                            break;
                        case 1:
                            x = Collections.singletonList(l.get(0));
                            break;
                        default:
                            l.sort(comparator);
                            x = Collections.unmodifiableList(l);
                            break;
                    }

                    // Collect results in a transient array.
                    final int width = summaryFunctions.length;
                    final Object[] r = new Object[width];
                    for (int i = 0; i < width; i++)
                        r[i] = summaryFunctions[i].apply(x);

                    // Convert array to Tuple
                    return TupleUtils.toTuple(r);
                });
    }
}

AggBy Tuple<T,U>:

import org.jooq.lambda.tuple.*;

import java.util.*;
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;

import static org.jooq.lambda.tuple.Tuple.tuple;

/**
 * My better version of Agg for sum/avg operators.
 */
@SuppressWarnings("unused")
public class AggBy {
    /**
     * Get a {@link Collector} that calculates the derived collectFor given a specific ordering and 1 summary
     * functions.
     */
    @SuppressWarnings("unchecked")
    public static <T, U, R1> Collector<T, ?, Tuple1<R1>>
    collectBy(Function<T, U> mapper,
              Comparator<? super U> comparator,
              BiFunction<List<Tuple2<T, U>>, Function<? super T, ? extends U>, ? extends R1> sf1) {
        return summariesByUnsafe(mapper, comparator, sf1);
    }

    /**
     * Get a {@link Collector} that calculates the derived collectFor given a specific ordering and 2 summary
     * functions.
     */
    @SuppressWarnings("unchecked")
    public static <T, U, R1, R2> Collector<T, ?, Tuple2<R1, R2>>
    collectBy(Function<T, U> mapper,
              Comparator<U> comparator,
              BiFunction<List<Tuple2<T, U>>, Function<? super T, ? extends U>, ? extends R1> sf1,
              BiFunction<List<Tuple2<T, U>>, Function<? super T, ? extends U>, ? extends R2> sf2) {
        return summariesByUnsafe(mapper, comparator, sf1, sf2);
    }

    /**
     * Get a {@link Collector} that calculates the derived collectFor given a specific ordering and 3 summary
     * functions.
     */
    @SuppressWarnings("unchecked")
    public static <T, U, R1, R2, R3> Collector<T, ?, Tuple3<R1, R2, R3>>
    collectBy(Function<T, U> mapper,
              Comparator<U> comparator,
              BiFunction<List<Tuple2<T, U>>, Function<? super T, ? extends U>, ? extends R1> sf1,
              BiFunction<List<Tuple2<T, U>>, Function<? super T, ? extends U>, ? extends R2> sf2,
              BiFunction<List<Tuple2<T, U>>, Function<? super T, ? extends U>, ? extends R3> sf3) {
        return summariesByUnsafe(mapper, comparator, sf1, sf2, sf3);
    }

    /**
     * Get a {@link Collector} that calculates the derived collectFor given a specific ordering and 4 summary
     * functions.
     */
    @SuppressWarnings("unchecked")
    public static <T, U, R1, R2, R3, R4> Collector<T, ?, Tuple4<R1, R2, R3, R4>>
    collectBy(Function<T, U> mapper,
              Comparator<U> comparator,
              BiFunction<List<Tuple2<T, U>>, Function<? super T, ? extends U>, ? extends R1> sf1,
              BiFunction<List<Tuple2<T, U>>, Function<? super T, ? extends U>, ? extends R2> sf2,
              BiFunction<List<Tuple2<T, U>>, Function<? super T, ? extends U>, ? extends R3> sf3,
              BiFunction<List<Tuple2<T, U>>, Function<? super T, ? extends U>, ? extends R4> sf4) {
        return summariesByUnsafe(mapper, comparator, sf1, sf2, sf3, sf4);
    }    // Methods up to Tuple16

    @SuppressWarnings({"unchecked", "EnhancedSwitchMigration"})
    private static <T, U, V extends Tuple> Collector<T, ?, V>
    summariesByUnsafe(Function<? super T, ? extends U> mapper,
                      Comparator<? super U> comparator,
                      BiFunction<List<Tuple2<T, U>>, Function<? super T, ? extends U>, ?>... summaryFunctions) {

        { // Validate summaryFunctions
            final int width = Objects.requireNonNull(summaryFunctions).length;
            if (0 == width)
                throw new IllegalArgumentException("no summaryFunctions");
            for (int i = 0; i < width; i++)
                if (null == summaryFunctions[i])
                    throw new IllegalArgumentException(String.format("summaryFunctions[%d] is null", i));
        }

        return Collector.of(
                (Supplier<ArrayList<Tuple2<T, U>>>) ArrayList::new,
                (l, v) -> l.add(tuple(v, mapper.apply(v))),
                (l1, l2) -> {
                    l1.addAll(l2);
                    return l1;
                },
                l -> {
                    final int size = l.size();

                    final List<Tuple2<T, U>> x;
                    switch (size) {
                        case 0:
                            x = Collections.emptyList();
                            break;
                        case 1:
                            x = Collections.singletonList(l.get(0));
                            break;
                        default:
                            l.sort(Comparator.comparing(t -> t.v2, comparator)); // Compare U
                            x = Collections.unmodifiableList(l);
                            break;
                    }

                    // Collect results in a transient array.
                    final int width = summaryFunctions.length;
                    final Object[] r = new Object[width];
                    for (int i = 0; i < width; i++)
                        r[i] = summaryFunctions[i].apply(x, mapper);

                    // Convert array to Tuple
                    return TupleUtils.toTuple(r);
                });
    }

    public static <T, U> BiFunction<List<Tuple2<T, U>>, Function<? super T, ? extends U>, Optional<T>>
    percentileBy(double percentile, PercentileFunction<T, U> percentileFunction) {
        if (percentile < 0.0 || percentile > 1.0)
            throw new IllegalArgumentException("Percentile must be between 0.0 and 1.0");

        return (l, mapper) -> {
            final int size = l.size();
            if (0 == size)
                return Optional.empty();
            if (1 == size || percentile == 0d)
                return Optional.of(l.get(0).v1);
            if (percentile == 1d)
                return Optional.of(l.get(size - 1).v1);

            // Limit fraction size, to stop common errors for double percentile values e.g. 2E-16.
            // 0.5d is subtracted because the actual percentile value can lie between two values.
            final double dIndex = ((double) Math.round(size * percentile * 1.0E6d) * 1.0E-6d) - 0.5d;
            int index = (int) dIndex; // floor, for before or exact index
            if (index >= size)
                return Optional.of(l.get(size - 1).v1);

            final Tuple2<T, U> t0 = l.get(index); // 1st before or exact value
            final double indexFraction = dIndex - index;
            // If end or exact index, return t0 value.
            if (++index == size || indexFraction == 0d)
                return Optional.of(t0.v1);

            final Tuple2<T, U> t1 = l.get(index); // after value
            // Only call percentile function if t*.v1 values are different.
            return Optional.of((t0.v1.equals(t1.v1))
                               ? t0.v1
                               : percentileFunction.apply(t0.v1, t1.v1, indexFraction, mapper));
        };
    }
}

Example use code:

import org.jooq.lambda.Agg;
import org.jooq.lambda.Seq;
import org.jooq.lambda.tuple.*;

import java.time.Duration;
import java.util.*;
import java.util.function.Function;
import java.util.stream.Collector;

import static org.jooq.lambda.tuple.Tuple.tuple;

public class SeqTest {
    static final Double[] values = {0d, 10d, 20d, 30d, 40d, 50d, 60d, 70d, 80d, 90d};

    static final PercentileFunction<Double, Double> floor = (t0, t1, f, m) -> (double) m.apply(t0);

    static final PercentileFunction<Double, Double> ceil = (t0, t1, f, m) -> (double) m.apply(t1);

    static final PercentileFunction<Double, Double> halfUp = (t0, t1, f, m) -> {
        double v0 = m.apply(t0);
        double v1 = m.apply(t1);
        return f < 0.5d ? v0 : v1;
    };

    static final PercentileFunction<Double, Double> interpolate = (t0, t1, f, m) -> {
        double v0 = m.apply(t0);
        double v1 = m.apply(t1);
        return v0 - (v0 * f) + (v1 * f); // Linear interpolation
    };

    private static void collectForTest() {
        class Count {
            long count;
        }
        final Collector<Double, HashMap<Double,Count>, Map<Double,Count>> distinct = Collector.of(
                HashMap::new,
                (a, t) -> a.computeIfAbsent(t, x -> new Count()).count++,
                (a1, a2) -> {
                    a2.forEach((k,v) -> a1.computeIfAbsent(k, x -> new Count()).count += v.count);
                    return a1;
                },
                a -> a);

        final String header = "percentile -> |  Agg | floor | halfUp | interpolate | ceil | #distinct";
        System.out.println(header);
        for (double p = 0d; p <= 1.00d; p += 0.05d) {
            final Tuple2<Optional<Double>, Tuple5<Optional<Double>,Optional<Double>,Optional<Double>,Optional<Double>,Map<Double,Count>>>
                    r = Seq.of(values)
                           .collect(
                                   Tuple.collectors(
                                           Agg.<Double>percentile(p, Comparator.naturalOrder()),
                                           AggFor.collectFor(Comparator.naturalOrder(),
                                                             AggFor.percentileFor(p, floor),
                                                             AggFor.percentileFor(p, halfUp),
                                                             AggFor.percentileFor(p, interpolate),
                                                             AggFor.percentileFor(p, ceil),
                                                             AggFor.collectFor(distinct))
                                                   ));
            final Tuple5<Optional<Double>,Optional<Double>,Optional<Double>,Optional<Double>,Map<Double,Count>> v2 = r.v2;
            System.out.printf("   %5.3f   -> | %4.1f |  %4.1f |   %4.1f |    %4.1f     | %4.1f | %d%n",
                              p,
                              r.v1.orElse(0d),
                              v2.v1.orElse(0d),
                              v2.v2.orElse(0d),
                              v2.v3.orElse(0d),
                              v2.v4.orElse(0d),
                              v2.v5.size());
        }
        System.out.println(header);
    }

    private static void collectByTest() {
        final String header = "percentile -> |  Agg | floor | halfUp | interpolate | ceil";
        System.out.println(header);
        for (double p = 0d; p <= 1.00d; p += 0.05d) {
            Tuple2<Optional<Double>, Tuple4<Optional<Double>, Optional<Double>, Optional<Double>, Optional<Double>>> r =
                    Seq.of(values)
                       .collect(Tuple.collectors(
                               Agg.<Double>percentile(p, Comparator.naturalOrder()),
                               AggBy.collectBy(Function.identity(),
                                               Comparator.naturalOrder(),
                                               AggBy.percentileBy(p, floor),
                                               AggBy.percentileBy(p, halfUp),
                                               AggBy.percentileBy(p, interpolate),
                                               AggBy.percentileBy(p, ceil))
                                                )
                               );
            Tuple4<Optional<Double>, Optional<Double>, Optional<Double>, Optional<Double>> v2 = r.v2;
            System.out.printf("   %5.3f   -> | %4.1f |  %4.1f |   %4.1f |    %4.1f     | %4.1f%n",
                              p,
                              r.v1.orElse(0d),
                              v2.v1.orElse(0d),
                              v2.v2.orElse(0d),
                              v2.v3.orElse(0d),
                              v2.v4.orElse(0d));
        }
        System.out.println(header);
    }

    public static void main(String[] args) {
        collectForTest();
        System.out.println();
        collectByTest();
    }
}

Result:

percentile -> |  Agg | floor | halfUp | interpolate | ceil | #distinct
   0.000   -> |  0.0 |   0.0 |    0.0 |     0.0     |  0.0 | 10
   0.050   -> |  0.0 |   0.0 |    0.0 |     0.0     |  0.0 | 10
   0.100   -> |  0.0 |   0.0 |   10.0 |     5.0     | 10.0 | 10
   0.150   -> | 10.0 |  10.0 |   10.0 |    10.0     | 10.0 | 10
   0.200   -> | 10.0 |  10.0 |   20.0 |    15.0     | 20.0 | 10
   0.250   -> | 20.0 |  20.0 |   20.0 |    20.0     | 20.0 | 10
   0.300   -> | 20.0 |  20.0 |   30.0 |    25.0     | 30.0 | 10
   0.350   -> | 30.0 |  30.0 |   30.0 |    30.0     | 30.0 | 10
   0.400   -> | 30.0 |  30.0 |   40.0 |    35.0     | 40.0 | 10
   0.450   -> | 40.0 |  40.0 |   40.0 |    40.0     | 40.0 | 10
   0.500   -> | 40.0 |  40.0 |   50.0 |    45.0     | 50.0 | 10
   0.550   -> | 50.0 |  50.0 |   50.0 |    50.0     | 50.0 | 10
   0.600   -> | 50.0 |  50.0 |   60.0 |    55.0     | 60.0 | 10
   0.650   -> | 60.0 |  60.0 |   60.0 |    60.0     | 60.0 | 10
   0.700   -> | 70.0 |  60.0 |   70.0 |    65.0     | 70.0 | 10
   0.750   -> | 70.0 |  70.0 |   70.0 |    70.0     | 70.0 | 10
   0.800   -> | 80.0 |  70.0 |   80.0 |    75.0     | 80.0 | 10
   0.850   -> | 80.0 |  80.0 |   80.0 |    80.0     | 80.0 | 10
   0.900   -> | 90.0 |  80.0 |   90.0 |    85.0     | 90.0 | 10
   0.950   -> | 90.0 |  90.0 |   90.0 |    90.0     | 90.0 | 10
percentile -> |  Agg | floor | halfUp | interpolate | ceil | #distinct

percentile -> |  Agg | floor | halfUp | interpolate | ceil
   0.000   -> |  0.0 |   0.0 |    0.0 |     0.0     |  0.0
   0.050   -> |  0.0 |   0.0 |    0.0 |     0.0     |  0.0
   0.100   -> |  0.0 |   0.0 |   10.0 |     5.0     | 10.0
   0.150   -> | 10.0 |  10.0 |   10.0 |    10.0     | 10.0
   0.200   -> | 10.0 |  10.0 |   20.0 |    15.0     | 20.0
   0.250   -> | 20.0 |  20.0 |   20.0 |    20.0     | 20.0
   0.300   -> | 20.0 |  20.0 |   30.0 |    25.0     | 30.0
   0.350   -> | 30.0 |  30.0 |   30.0 |    30.0     | 30.0
   0.400   -> | 30.0 |  30.0 |   40.0 |    35.0     | 40.0
   0.450   -> | 40.0 |  40.0 |   40.0 |    40.0     | 40.0
   0.500   -> | 40.0 |  40.0 |   50.0 |    45.0     | 50.0
   0.550   -> | 50.0 |  50.0 |   50.0 |    50.0     | 50.0
   0.600   -> | 50.0 |  50.0 |   60.0 |    55.0     | 60.0
   0.650   -> | 60.0 |  60.0 |   60.0 |    60.0     | 60.0
   0.700   -> | 70.0 |  60.0 |   70.0 |    65.0     | 70.0
   0.750   -> | 70.0 |  70.0 |   70.0 |    70.0     | 70.0
   0.800   -> | 80.0 |  70.0 |   80.0 |    75.0     | 80.0
   0.850   -> | 80.0 |  80.0 |   80.0 |    80.0     | 80.0
   0.900   -> | 90.0 |  80.0 |   90.0 |    85.0     | 90.0
   0.950   -> | 90.0 |  90.0 |   90.0 |    90.0     | 90.0
percentile -> |  Agg | floor | halfUp | interpolate | ceil
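
The interpolate column above follows from the index arithmetic in `percentileFor`. The sketch below repeats only that arithmetic for the ten sample values, so the step from a percentile to a position and an interpolation fraction can be checked by hand. The class and method names are hypothetical, and the helper assumes `size * percentile` is at least 0.5, which holds for the rows it is applied to here.

```java
class PercentileIndexSketch {

    // Same rounding as in percentileFor: round to 6 decimals, then shift by half a slot.
    static double position(int size, double percentile) {
        return (Math.round(size * percentile * 1.0E6d) * 1.0E-6d) - 0.5d;
    }

    // Linear interpolation between the two neighbours of the position.
    static double interpolate(double[] sorted, double percentile) {
        double dIndex = position(sorted.length, percentile);
        int index = (int) dIndex;                 // value at or before the position
        double fraction = dIndex - index;         // how far towards the next value
        if (index + 1 >= sorted.length || fraction == 0d)
            return sorted[Math.min(index, sorted.length - 1)];
        double v0 = sorted[index];
        double v1 = sorted[index + 1];
        return v0 - (v0 * fraction) + (v1 * fraction);
    }

    public static void main(String[] args) {
        double[] values = {0, 10, 20, 30, 40, 50, 60, 70, 80, 90};
        // p = 0.10: 10 * 0.10 - 0.5 = 0.5, i.e. halfway between values[0] and values[1].
        System.out.println(position(values.length, 0.10));
        System.out.println(interpolate(values, 0.10)); // matches the 5.0 in the 0.100 row
        // p = 0.25: 10 * 0.25 - 0.5 = 2.0, i.e. exactly values[2].
        System.out.println(interpolate(values, 0.25)); // matches the 20.0 in the 0.250 row
    }
}
```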

lukaseder commented 3 years ago

Thanks for your suggestions. There are a lot of ideas and code mixed together, and also a dependency on https://github.com/jOOQ/jOOL/issues/380. It's hard to assess at this point which ideas are worth pursuing and which ones aren't. In the past, I've regretted hastily integrating unstructured suggestions and PRs.

From that experience, I can tell, however, that a difficult-to-grasp method signature like this one:

    public static <T, U, R1, R2> Collector<T, ?, Tuple2<R1, R2>>
    collectBy(Function<T, U> mapper,
              Comparator<U> comparator,
              BiFunction<List<Tuple2<T, U>>, Function<? super T, ? extends U>, ? extends R1> sf1,
              BiFunction<List<Tuple2<T, U>>, Function<? super T, ? extends U>, ? extends R2> sf2) {
        return summariesByUnsafe(mapper, comparator, sf1, sf2);
    }

... is not going to make it into the public API of this library. It seems super specialised for a single task.

The idea that Tuple.collectors() should be able to share resources is a very complex one. The idea of adding support for PERCENTILE_CONT semantics (#380) is much less complex. I don't think I'll be able to allocate time to this task here very soon, but #380 is certainly more realistic.
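
For comparison, SQL's PERCENTILE_CONT interpolates linearly between the two neighbouring values of the sorted input. Below is a minimal sketch of that rule over a pre-sorted `double[]`, assuming the usual definition with a zero-based row position `p * (n - 1)`. The class and method names are made up for illustration and say nothing about how #380 would be implemented in jOOL.

```java
class PercentileContSketch {

    // sorted must be in ascending order and non-empty; p must be within [0, 1].
    static double percentileCont(double[] sorted, double p) {
        if (sorted.length == 0)
            throw new IllegalArgumentException("empty input");
        if (p < 0.0 || p > 1.0)
            throw new IllegalArgumentException("Percentile must be between 0.0 and 1.0");

        double position = p * (sorted.length - 1);   // zero-based fractional row
        int lower = (int) Math.floor(position);
        int upper = (int) Math.ceil(position);
        if (lower == upper)
            return sorted[lower];                    // position falls exactly on a row

        double fraction = position - lower;
        return sorted[lower] + fraction * (sorted[upper] - sorted[lower]);
    }

    public static void main(String[] args) {
        double[] values = {0, 10, 20, 30, 40, 50, 60, 70, 80, 90};
        System.out.println(percentileCont(values, 0.5));  // 45.0: halfway between 40 and 50
        System.out.println(percentileCont(values, 1.0));  // 90.0: the last row
    }
}
```

Unlike the shared-state idea, this needs nothing beyond a single sorted list and one percentile, which is in line with the lower complexity attributed to #380 above.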

rwperrott commented 3 years ago

BiFunction<List<Tuple2<T, U>>, Function<? super T, ? extends U>, ? extends R1> sf1

could be declared as a sub-interface of BiFunction, like:

@FunctionalInterface
public interface Tuple2ListMapping<T,U,R>
extends BiFunction<List<Tuple2<T, U>>, Function<? super T, ? extends U>, R> {}

allowing much shorter column declarations, like

Tuple2ListMapping<T,U,R> sf1

I don't think that column source sharing needs to be complex, because id binding is simpler and is probably safer than introspection guesses:

Something like this may work (not tested yet):

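import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;

public class IdCollector<T,A,R,RR> implements Collector<T,A,RR> {
    final Collector<T,A,R> baseCollector;
    final Function<R,RR> primaryFinisher;
    private R r; // baseCollector.finisher().apply(a) result.

    public IdCollector(final Collector<T, A, R> baseCollector, final Function<R, RR> primaryFinisher) {
        this.baseCollector = baseCollector;
        this.primaryFinisher = primaryFinisher;
    }

    // Assumes that created secondary collector finisher() will always be called after primary collector finisher()
    public Collector<T,A,RR> secondary(final Function<R,RR> secondaryFinisher) {
        return Collector.of(
                () -> null,       // no state of its own
                (a, t) -> {},     // ignores all elements
                (a1, a2) -> null,
                a -> secondaryFinisher.apply(r));
    }

    @Override
    public Supplier<A> supplier() {
        return baseCollector.supplier();
    }

    @Override
    public BiConsumer<A, T> accumulator() {
        return baseCollector.accumulator();
    }

    @Override
    public BinaryOperator<A> combiner() {
        return baseCollector.combiner();
    }

    @Override
    public Function<A, RR> finisher() {
        // Remember the base result so that secondary collectors can reuse it.
        return baseCollector.finisher()
                        .andThen(r -> primaryFinisher.apply(this.r = r));
    }

    @Override
    public Set<Characteristics> characteristics() {
        // finisher() must always run to capture r, so IDENTITY_FINISH is never reported.
        final Set<Characteristics> characteristics = EnumSet.noneOf(Characteristics.class);
        characteristics.addAll(baseCollector.characteristics());
        characteristics.remove(Characteristics.IDENTITY_FINISH);
        return Collections.unmodifiableSet(characteristics);
    }
}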

rwperrott commented 3 years ago

For *By columns this would be most efficient if the T to U keyExtractor function and the keyComparator are both reference-equal for all columns. That allows sharing a single collected, pre-sorted List, sorted by the same U values; otherwise the list would need to be copied and re-sorted for each column.

It looks more memory-efficient to base the *By functionality on a list of the T values, and then use the keyComparator with the keyExtractor, e.g. via this method in Comparator (Java 1.8+):

    public static <T, U> Comparator<T> comparing(
            Function<? super T, ? extends U> keyExtractor,
            Comparator<? super U> keyComparator)

Likewise, for columns sourced from the List-based collector, the Comparator must be reference-equal for all columns; otherwise the list would need to be copied and re-sorted for each column.
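
A minimal sketch of that approach using only the JDK: the list holds the original `T` values and is sorted through `Comparator.comparing(keyExtractor, keyComparator)`, so no `Tuple2<T, U>` wrappers are allocated. The class name, the `sortedBy` helper, and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

class SortByKeySketch {

    // Sorts a copy of the values by the extracted key; the U keys are computed on demand.
    static <T, U> List<T> sortedBy(List<T> values,
                                   Function<? super T, ? extends U> keyExtractor,
                                   Comparator<? super U> keyComparator) {
        List<T> copy = new ArrayList<>(values);
        copy.sort(Comparator.comparing(keyExtractor, keyComparator));
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("pear", "fig", "banana");
        // The key is the word length, compared in natural Integer order.
        List<String> byLength = sortedBy(words, String::length, Comparator.<Integer>naturalOrder());
        System.out.println(byLength); // prints [fig, pear, banana]
    }
}
```

The cost of saving the wrapper objects is that the key extractor runs again on every comparison, so this only pays off when the extractor is cheap.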

rwperrott commented 3 years ago

See my latest Sharable Collector code, with support for reference and mapped id based results sharing at https://github.com/rwperrott/sharable-collector

lukaseder commented 3 years ago

@rwperrott I appreciate your efforts, thank you very much. To make sure expectations are set right, I currently cannot invest much time in jOOL. This includes reviewing your various messages and code (as this is by no means a trivial task).

At some point in the future, I'll be able to allocate some dedicated time to see what's currently open and missing, but I cannot make any promises.

rwperrott commented 3 years ago

@lukaseder Thanks. I expected you were busy, given prior feedback. I also developed this speculative solution because it's a fun exercise and I think it'll be useful for my projects too.