mkodekar / guava-libraries

Automatically exported from code.google.com/p/guava-libraries
Apache License 2.0
0 stars 0 forks source link

Optimizations for sorting on a key that's expensive to compute #1873

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Internally, we recently added a class that is basically the counterpart to 
Ordering.leastOf/greatestOf when the data to be sorted isn't available as an 
Iterator/Iterable:

public final class TopKSelector<T> {
  public static <T extends Comparable<? super T>> TopKSelector<T> least(int k);
  public static <T extends Comparable<? super T>> TopKSelector<T> greatest(int k);
  public static <T> TopKSelector<T> least(int k, Comparator<? super T> comparator);
  public static <T> TopKSelector<T> greatest(int k, Comparator<? super T> comparator);
  public void offer(@Nullable T elem);
  public void offerAll(Iterable<? extends T> elements);
  public void offerAll(Iterator<? extends T> elements);
  public List<T> topK();
}

Part of our motivation for doing so was the existence of code like this:

MinMaxPriorityQueue<Pair<Key, Double>> values =
    MinMaxPriorityQueue.orderedBy(SECOND_VALUE)
        .maximumSize(numberToRank).create();
for (Key key : keys) {
  values.add(Pair.of(key, possiblyExpensiveComputation(key)));
}

ImmutableList.Builder<Key> result =
    ImmutableList.builder();
Pair<Key, Double> pair = values.poll();
while (pair != null) {
  result.add(pair.getFirst());
  pair = values.poll();
}
return result.build();

With TopKSelector, the code is simpler because its topK() allows iteration over 
the values in sorted order. MinMaxPriorityQueue, by comparison, requires a 
manual poll() loop. (TopKSelector has other advantages that I won't discuss 
here, including working in places where MinMaxPriorityQueue just can't.)

What I want to discuss here is what I see as "the real problem" with our 
example code: It needs to convert from a Key to a Pair<Key, Double> and then 
back. The caller ought to be able to write:

return Ordering.onResultOf(possiblyExpensiveFunction()).greatestOf(keys);

But that could invoke possiblyExpensiveFunction() many times per key. Ideally 
it would to do so only once. (Ideally this would be true not only of 
Ordering.leastOf/greatestOf but also the new TopKSelector.)

There are multiple ways to approach this problem:

1. Have Ordering.onResultOf return a special kind of Ordering (likely still 
package private, especially since things get tricky if onResultOf is chained 
with other calls) that greatestOf, etc. recognize.

2. Have an entirely separate API (maybe an extension to TopKSelector that takes 
a Function instead of or in addition to a Comparator, or maybe a separate 
class) that requires users to more explicitly choose this behavior.

Original issue reported on code.google.com by cpov...@google.com on 30 Oct 2014 at 3:22

GoogleCodeExporter commented 9 years ago
This issue has been migrated to GitHub.

It can be found at https://github.com/google/guava/issues/<issue id>

Original comment by cgdecker@google.com on 1 Nov 2014 at 4:08

GoogleCodeExporter commented 9 years ago

Original comment by cgdecker@google.com on 1 Nov 2014 at 4:17

GoogleCodeExporter commented 9 years ago

Original comment by cgdecker@google.com on 3 Nov 2014 at 9:07