Closed GoogleCodeExporter closed 9 years ago
Here are some static methods we can work on releasing:
public static <E> List<Entry<E>> sortedEntries(
Multiset<E> multiset, Order order) {}
public static <E> List<E> mostFrequent(
Multiset<E> multiset, int n) {}
public static <E> List<E> leastFrequent(
Multiset<E> multiset, int n) {}
Sounds like these would help you?
Next, I have often mused about the idea of a "leaderboard" type of Multiset
which is
just like any other except always iterates in (increasing/decreasing) count
order. I
think I can take this as a request for that?
These two should probably be split.
Thanks for the requests!
Original comment by kevinb@google.com
on 29 Apr 2010 at 11:32
Yes, I would definitely like to see both those static methods added, and the
leaderboard style multiset. I'm not
sure which I'd use for the particular case in front of me, but I could see both
cleaning up a lot of the boilerplate I
have.
I suppose I just get excitable as to choice-like to have my options open.
I agree, this request should probably be split into the two different options.
Original comment by blank...@gmail.com
on 29 Apr 2010 at 11:58
Instead of returning List<E> for the most/leastFrequent(n), what about
returning List<Multiset.Entry<E>>, as is done for sortedEntries? This would
provide both the elements and counts directly, without requiring a subsequent
call to count() to retrieve the count. Or is the thinking that most callers
would want just the elements, and the counts can easily be retrieved with the
additional call to count() ?
Original comment by theycall...@gmail.com
on 7 Aug 2010 at 5:57
I don't understand why the methods are static.
Original comment by dfran...@gmail.com
on 8 Aug 2010 at 3:45
That is, shouldn't they be methods of Multiset?
public <E> List<Entry<E>> sortedEntries(Order order)
public static <E> List<E> mostFrequent(int n)
public static <E> List<E> leastFrequent(int n)
Original comment by dfran...@gmail.com
on 8 Aug 2010 at 3:46
The methods as part of the general Multiset utility class would be static
obviously. In general, Multisets *aren't* sorted, so it seems inappropriate to
require interface implementers to make them so. And I don't see it as API
feature *everyone* using the class would want (though obviously, I think enough
people would want it to warrant inclusion in the utility class).
For a class like what Kevin mentions re a Multiset sorted by element frequency,
specific instance methods might be appropriate - but I'd rather they weren't
and such a class instead simply followed a NavigableSet style API, where the
"comparator" compares by element count.
Original comment by blank...@gmail.com
on 8 Aug 2010 at 6:17
I think this would be _very_ useful. I use the mostcommon() function from this
bag class for Python very often: http://code.activestate.com/recipes/259174/
Original comment by rottenwi...@gmail.com
on 24 Sep 2010 at 7:52
Original comment by fry@google.com
on 28 Jan 2011 at 4:06
I'm very very interested in such a LeaderboardMultiset. If such exists in some
svn or elsewhere, I'd be very keen to experiment and play with it and provide
suggestions...
I've got a custom impl. of something similar to the static methods; it is using
a TreeSet to copy/store the entries to provide sorted order... And now I'm
struggling with out of memory errors due to the large volume entries I have in
my multiset.
Would the leaderboard implementation provide sorted order of counts without
keeping two copies around?? I've tried to think how I would implement this,
but so far I've only come up with solutions that would involve keeping two
separate collections, one mapping to the count, and another keeping the count
in sorted order (along with a reference back to the element).
Original comment by hein.mel...@gmail.com
on 23 Feb 2011 at 10:53
Here's an alternative. If Multiset had a method
Map<E, Integer> asMap();
Then you could use Multimaps.invertFrom() to construct a new map sorted by
frequencies in the multiset.
Original comment by fin...@gmail.com
on 24 Feb 2011 at 8:51
Interesting idea. But I guess, my main issue is that keeping two copies of the
same collection data is difficult due to the large number of entries in my
multiset, and this approach does not help solve that root problem.
(invertFrom() makes a copy of every element in the provided Map).
Original comment by hein.mel...@gmail.com
on 24 Feb 2011 at 10:28
You effectively want to maintain the elements sorted in two different orders.
You *will* need two containers. (But hey, these are merely just pointer
indexes, not that big to copy).
Original comment by jim.andreou
on 25 Feb 2011 at 2:56
If you only want the top N values you could copy the entries to a
limited-capacity MinMaxPriorityQueue instead of a Multimap.
Original comment by fin...@gmail.com
on 25 Feb 2011 at 1:14
The Multisets.countFunction() from issue #419 would go a long way toward
solving this, though the methods seem useful enough to be worth having anyway.
List<E> topElements = Ordering.natural()
.onResultOf(Multisets.countFunction(multiset))
.greatestOf(multiset.elementSet(), k);
Original comment by cgdec...@gmail.com
on 14 Apr 2011 at 1:45
Original comment by kevinb@google.com
on 13 Jul 2011 at 6:18
Original comment by kevinb@google.com
on 16 Jul 2011 at 8:37
Done at head; see
http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/collect
/Multisets.html#copyHighestCountFirst%28com.google.common.collect.Multiset%29
Original comment by wasserman.louis
on 2 Oct 2011 at 10:04
I'm interested in naming suggestions for this method; I find it extraordinarily
hard to name well.
Original comment by kevinb@google.com
on 8 Oct 2011 at 3:01
Multisets.copyOrderedByFrequency() (shorter, but users might think it's
actually starting with least frequent elements...)
Multisets.copyOrderedByDecreasingFrequency() (longer, but clearer)
Multisets.immutableCopyOrderedByDecreasingFrequency() (too long IMO...)
I also like finn's asMap() idea, which lets you use the Map<E, Integer> to
build a Multimap<Integer, E> with the appropriate order.
Another idea would be to expose an "asFrequencyMultimap()" that could be
ordered from least frequent to most frequent.
Original comment by nev...@gmail.com
on 8 Oct 2011 at 4:03
"count" seems better than frequency - shorter and already used elsewhere in the
API. "rank" might also make sense, but it's perhaps muddled relative to the
formal mathematical definition of rank. So, using count:
Multisets.countSortedCopy()
I don't think there's a need for a ...Desc() version, since whatever the result
is should be traversable in reverse. That is - seems "straightforward" to
return something like a ImmutableTreeMultiset that offers a descending iterator.
That also provides a nice baseline for, say, .countSortedView(), if modifiable
view is someday desirable.
Original comment by blank...@gmail.com
on 8 Oct 2011 at 11:36
Good point about "count" being shorter than "frequency" and already used
elsewhere in the API.
I'd like to change my suggestion to:
Multisets.copyOrderedByCount()
Multisets.copyOrderedByDecreasingCount()
The "decreasing" part is there because, according to the method's javadoc, it
returns "an ImmutableMultiset whose iteration order is highest count first". I
thought it would be wise to make this explicit in the method name.
Of course, we could get rid of it and provide a method returning a Multiset
ordered by *ascending* count instead. This set could potentially be traversed
in reverse... But I think ascending count is the least useful order, and I'm
not sure privileging it is the right approach...
Original comment by nev...@gmail.com
on 9 Oct 2011 at 5:13
I'm still pretty attached to the current name, copyHighestCountFirst. Let me
restate the arguments for it here:
> Multisets.copyOrderedByFrequency() (shorter, but users might think it's
actually starting with least frequent elements...)
copyHighestCountFirst is completely unambiguous, while being actually slightly
shorter.
> Multisets.copyOrderedByCount()
I think this still has the problem that it's not clear which direction counts
are being ordered in.
Original comment by wasserman.louis
on 16 Oct 2011 at 11:35
> copyHighestCountFirst is completely unambiguous, while being actually
slightly shorter.
Well, the name only refers to the first element(s). I personally find
`copyHigh[est]ToLow[est]Counts` to be even more clear.
But then again, I'm unsure about the "copy" prefix and `highestToLowestCounts`
might do it as well.
Original comment by j...@nwsnet.de
on 17 Oct 2011 at 7:40
Hrrrrm. That's not how I read it -- I read it as "highest counts come first,
lower counts come later."
I'm not sure I like "copyHighestToLowestCounts" though, just because it doesn't
say what "highest to lowest" refers to.
How do you feel about "copyHigherCountsFirst"?
Original comment by wasserman.louis
on 17 Oct 2011 at 5:42
To take a completely different tack:
Why not have a "ImmutableMultiset<T> Multisets.sortedCopy(Multiset<T> src,
Comparator<T> order)" method, and a "Ordering<T>
Multisets.higherCountFirstOrdering(Class<T> cls)" method?
I think it's also reasonable to define that the "natural" ordering of a
multiset is by count (not the natural ordering of it's contents), such that a
method
ImmutableMultiset<T> Multisets.sortedCopy(Multiset<T> src)
would behave as
ImmutableMultiset<T> Multisets.sortedCopy(Multiset<T> src,
Multisets.higherCountFirstOrdering(T.class)).
Provides a lot more flex for only a little more code.
Original comment by blank...@gmail.com
on 18 Oct 2011 at 4:09
This is, I'm afraid, not a well-defined ordering at all. There is no
implementation for
Ordering<T> Multisets.higherCountFirstOrdering(Class<T> cls)
Original comment by wasserman.louis
on 18 Oct 2011 at 4:35
Ah, yes - that ordering must be in reference to some particular multiset - I
was thinking too hopefully that there might be someway to get away from that.
No problem: Ordering<T> Multisets.higherCountFirstOrdering(final Multiset<T>
ref), implemented as, essentially:
int compare(T left, T right) {
return ref.count(one) - ref.count(theOther);
}
Original comment by blank...@gmail.com
on 18 Oct 2011 at 5:31
Except, of course, this ordering isn't compatible with equals().
I've thought about this direction, I don't believe it can be made to work, I'm
afraid.
Original comment by wasserman.louis
on 18 Oct 2011 at 5:45
Can you elaborate on incompatibility with equals()? I also spaced on that
earlier, but:
left == right seems to obviously compare to 0, given the way equality works
with Multiset.
compare = 0 -> left == right seems to be resolvable by either the T being
already comparable, or by supplying a second comparator (perhaps a bit clunky).
i.e.,
ImmutableMultiset<T> Multisets.countSortedCopy(Multiset<T extends
Comparable<T>> src)
ImmutableMultiset<T> Multisets.countSortedCopy(Multiset<T> src, Comparator<T>
nonCountOrder)
and then for the count ordering
private final Comparator<T> ifZero = //...either natural or supplied
comparator/ordering
int compare(T left, T right) {
int comp = ref.count(left) - ref.count(right);
return comp !=0 ? comp : ifZero.compare(left, right);
}
Original comment by blank...@gmail.com
on 18 Oct 2011 at 6:21
or maybe those signatures need to be applied to the method that returns the
Ordering, but the same idea applies: require either a natural or supplied
comparator to handle the count(A) == count(B) && A != B case to maintain
compatibility with equals().
Original comment by blank...@gmail.com
on 18 Oct 2011 at 6:25
The point is, it's so awkward to make this approach work that it's no longer
worthwhile.
Original comment by wasserman.louis
on 18 Oct 2011 at 6:33
Has anything like this been considered?
Multisets.countOrderedCopy(multiset).highestToLowest()
Multisets.countOrderedCopy(multiset).lowestToHighest()
Good:
- separates "ordered by count" from "in which order?"
- reads fairly nicely
Bad:
- requires adding an interface to support it
- possibly confusing since "countOrderedCopy(multiset)" doesn't clearly
indicate that another call is required to get the final result
Original comment by cgdec...@gmail.com
on 18 Oct 2011 at 6:35
Hmmmm.
We considered extending the ImmutableMultiset builder to support an option like
this. I think we concluded, though, that it wasn't really ImmutableMultiset's
job to provide count-ordering.
That said, I like this...and I can't 100% recall why we rejected something
along these lines.
Original comment by wasserman.louis
on 18 Oct 2011 at 6:40
caveat: I haven't hooked these up to tests, this is basically just porting what
I did to solve the problem back when I proposed it into something like the
solution that was linked above. No compilation problems at least, right?
The attached is an implementation that I think is not so awkward as to be
untenable.
It provides:
descCountOrdering() - doc explicitly states it is inconsistent with equals and
indicates easy ways to make consistent with equals. I left in comments the
options to provide methods to do that (countOrderingWithEquals()) as an
alternative to exposing some autocast singletons.
I've got some commented out rough implementations for the API for the typical
use case coverage:
countOrderedCopy(Multiset) - just uses descCountOrdering
countOrderedCopy(Multiset, Comparator) - adds secondary ordering on T, easy to
use with Ordering.<T>natural() or any custom comparator
Original comment by blank...@gmail.com
on 20 Oct 2011 at 6:02
Attachments:
This is some pretty good code -- you made excellent use of the Ordering
features -- but I don't think this is the right approach.
First off, we don't want to expose to the user any comparators on multiset
entries. Specifically, we don't want to expose countOrdering(),
descCountOrdering(), or anything of the sort. These are just a means to an end
-- sorting multisets by element frequency -- and if we're already exposing
something like countOrderedCopy(Multiset), there's no reason to additionally
expose countOrdering() or descCountOrdering().
Secondly, we don't want to expose any new tools that take extra orderings on
the multiset elements. The only feature we're planning on offering is
something that lets you take a multiset and get back a copy with the highest
counts coming first, with ties broken by whichever elements came first in the
original multiset. (If users really want ties to be broken according to some
Comparator<E>, they can just do an intermediate call to
ImmutableSortedMultiset.copyOf(multiset, myComparator).)
For reference, please remember that sending us code doesn't make it more likely
that your feature will be accepted to Guava. Writing code is maybe 10-20% of
the work we do for each feature we accept: the real work is discussing,
arguing, and naming things, and even so we reject most feature requests. In
the future, please work out an API with us, and make sure everybody's
satisfied, before you spend time writing patches.
Original comment by wasserman.louis
on 20 Oct 2011 at 6:48
Ah, I suppose I'm still pining for the TreeMultiset that sorts on count per my
original request, (or LeaderboardMultiset like in comment #9), hence the
continuous pestering.
As for the rest, my own solution to the problem I faced a year ago did expose
those things (and they've found some use) but not exposing them has pros as
well (e.g., plenty of ways to misuse), and I find actually writing code a
better way to think and communicate about any API - not from a misunderstanding
about what gets features into Guava.
Anyway, thanks for the detailed rejection, and I look forward to the removal of
@Beta, however the method is finally named/signatured!
Original comment by blank...@gmail.com
on 20 Oct 2011 at 7:27
Original comment by fry@google.com
on 10 Dec 2011 at 3:45
Done as of release 11 with copyHighestCountFirst.
Original comment by wasserman.louis
on 28 Dec 2011 at 11:54
This issue has been migrated to GitHub.
It can be found at https://github.com/google/guava/issues/<id>
Original comment by cgdecker@google.com
on 1 Nov 2014 at 4:15
Original comment by cgdecker@google.com
on 3 Nov 2014 at 9:10
Original issue reported on code.google.com by
blank...@gmail.com
on 29 Apr 2010 at 3:28