mengdiwang / guava-libraries

Automatically exported from code.google.com/p/guava-libraries
Apache License 2.0
0 stars 0 forks source link

Feature request: Iterables.distinct/Iterables.unique #524

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I find that whenever I am polling information from several sources I get 
natural duplicates occurring, while that's not usually a problem, writing 
boiler-plate code to filter out duplicates gets tedious.

I propose something like this, although perhaps there is a more elegant 
solution available;

    /**
     * Returns an Iterable where all the elements are distinct, dismissing any repeats.
     * 
     * @param <E>
     * @param original
     * @return
     */
    public static <E> Iterable<E> distinct(Iterable<E> original) {
        List<E> distinct = Lists.newArrayList();
        for (E part : Iterables.filter(original, Predicates.not(Predicates.in(distinct)))) {
            distinct.add(part);
        }
        return ImmutableList.copyOf(distinct);
    }

I have no brilliant idea on how this can be implemented for some of the other 
classes like Iterators.

Original issue reported on code.google.com by Bjorn.So...@gmail.com on 21 Jan 2011 at 2:18

GoogleCodeExporter commented 9 years ago
Why not use the LinkedHashSet implementation? Only once each element, and in 
the same order. Do you really need an on-the-fly iterable?

Original comment by ogregoire on 21 Jan 2011 at 4:51

GoogleCodeExporter commented 9 years ago
I needed to do this the other day, and searched for it in Guava. It then 
occured to me that what I really wanted was a Set:

Set<E> distinctElements = ImmutableSet.copyOf(original);

This will return an ImmutableSet, that contains the distinct elements, in order.

I then thought: "what if I don't want to instantiate a whole Set and fill the 
memory? I just want to iterate through the original Iterable's distinct 
elements!"

I then realized that what I wanted to do was impossible: when you want to find 
distinct elements in an Iterable, you *need* to keep track of past elements. 
So, you would need a Set internally anyway. So why not simply use a set?

tl;dr:
use ImmutableSet.copyOf(elements)

Original comment by nev...@gmail.com on 21 Jan 2011 at 4:54

GoogleCodeExporter commented 9 years ago
Agreed; create and populate a Set.

Original comment by kevinb@google.com on 22 Jan 2011 at 7:00

GoogleCodeExporter commented 9 years ago
Wouldn't it be possible to do something like this then in Iterables?

    public static <E> Iterable<E> distinct(Iterable<E> original) {
        return ImmutableSet.copyOf(original);
    }

While this wouldn't make my personal code any smaller, at the very least this 
would make the code more intelligible. Going via Set is an elegant way to solve 
this, but it feels odd to do so in the middle of some business logic.

Anyway, thanks for the input.

Original comment by Bjorn.So...@gmail.com on 26 Jan 2011 at 9:28

GoogleCodeExporter commented 9 years ago
The value this gives is too small to justify adding it to the library, and it 
takes user code further away from standard Java coding practice.

Original comment by kevinb@google.com on 26 Jan 2011 at 10:25

GoogleCodeExporter commented 9 years ago
What if an iterable is of unlimited size? This LinkedListSet approach won't 
work at all

Original comment by egor@technoparkcorp.com on 9 Aug 2013 at 7:56

GoogleCodeExporter commented 9 years ago
See also issue 1464, which covers the case of a sorted input, which we can 
deduplicate in constant space.

For unsorted inputs, egor gave an implementation here that people could try out:
http://code.google.com/p/guava-libraries/issues/detail?id=1464#c7

I'll leave the issue as WontFix because the use case is a little more specific 
than the typical Guava method:

- The input must be infinite or at least so large that it's impractical to 
create a set containing all the elements.
- The user must want to process a small enough portion of the input that that 
portion can fit in memory.
- The user must not know exactly how many elements he wants up front, or else 
he could create the set then, populated with the appropriate number of items.

Original comment by cpov...@google.com on 9 Aug 2013 at 7:20

GoogleCodeExporter commented 9 years ago
This issue has been migrated to GitHub.

It can be found at https://github.com/google/guava/issues/<id>

Original comment by cgdecker@google.com on 1 Nov 2014 at 4:15

GoogleCodeExporter commented 9 years ago

Original comment by cgdecker@google.com on 3 Nov 2014 at 9:09