migrator / guava-libraries

Guava: Google Core Libraries for Java 1.6+
0 stars 0 forks source link

Document that Predicates.in uses the contains() method of the collection #26

Open migrator opened 9 years ago

migrator commented 9 years ago

Hi, There seems to be a performance bug in the "in" method from the Predicates class.

The problem is that we can generate an In Predicate that has an underlying collection with a slow containment implementation, for example an ArrayList.

The following code exposes the problem:


public static void testPredicateIn() {

    ArrayList<Integer> predicateCollection=new ArrayList<Integer>();

    for(int i=0;i<500000;++i)
        predicateCollection.add(new Integer(i));

    Predicate<Integer> inPred=Predicates.in(predicateCollection);

    for(int i=0;i<500000;++i) {
        inPred.apply(new Integer(i));
    }

}

__- Notice that we're creating an in Predicate over an ArrayList of 500000 integers. Applying that predicate to the integers between 1 and 500000 takes a few minutes.

I could think of a couple of ways of fixing it:

1) Put a disclaimer in the documentation that the user is advised to pass collections with fast containment methods.

2) Convert slow collections into HashSets, as in the following snippet:


private static <T> Predicate<T> optimizedIn(Collection<? extends T> target) {

    if(target instanceof Set)
        return Predicates.in(target);
    else
        return Predicates.in(new HashSet<T>(target));

}

Here are some experimental results with different sizes for the collection predicate for the original implementation and optimizedIn:

  Size      Original     Optimized                                                                                                       
    10      0m0.972s     0m0.996s                                                                                                        
   100      0m1.068s     0m1.004s                                                                                                        
  1000      0m1.572s     0m0.980s                                                                                                        
 10000      0m8.065s     0m1.016s                                                                                                        
100000      1m5.596s     0m1.052s                                                                                                        
500000      4m56.423s    0m1.200s               

Regards, Oswaldo.

relevance: 1

migrator commented 9 years ago

summary: Not Defined

Actually, I posted the root cause, rather than an actual performance bug in the library code.

Here's a less artificial example, where creating Filtered Collections exposes the performance issue in the slow predicate:


public static void testCollections2Filter() {

    ArrayList<Integer> filterCollection=new ArrayList<Integer>();
    ArrayList<Integer> toAddCollection=new ArrayList<Integer>();
    for(int i=0;i<500000;++i) {
        filterCollection.add(new Integer(i));
        toAddCollection.add(new Integer(i));
    }

    //      Predicate<Integer> inPred=optimizedIn(filterCollection);                                                                        
         Predicate<Integer> inPred=Predicates.in(filterCollection);

        Collection<Integer> filteredCollection=Collections2.filter(new ArrayList<Integer>(),inPred);

            filteredCollection.addAll(toAddCollection);

}

Here are the results I'm getting with the original and optimized versions of the "in" predicate:

// Size Original Optimized

   Size  Original  Optimized
     10  0m1.124s  0m1.096s                                                                                                               
    100  0m1.112s  0m1.088s                                                                                                               
   1000  0m1.164s  0m1.152s                                                                                                               
  10000  0m1.216s  0m1.160s                                                                                                               
 100000  0m9.421s  0m1.228s                                                                                                               
 200000  0m50.611s 0m1.252s                                                                                                               
 500000  > 10 min  0m1.324s    

Regards, Oswaldo.

status Not Defined creator: ozzy...@gmail.com created at: Oct 21, 2014

migrator commented 9 years ago

summary: Not Defined

There's a couple points here: