mkodekar / guava-libraries

Automatically exported from code.google.com/p/guava-libraries
Apache License 2.0
0 stars 0 forks source link

Document that Predicates.in uses the contains() method of the collection #1868

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Hi,
There seems to be a performance bug in the "in" method from the Predicates 
class.

The problem is that we can generate an In Predicate that has an underlying 
collection with a slow containment implementation, for example an ArrayList.

The following code exposes the problem:

_________________________________________
public static void testPredicateIn() {

        ArrayList<Integer> predicateCollection=new ArrayList<Integer>();

        for(int i=0;i<500000;++i)
            predicateCollection.add(new Integer(i));

        Predicate<Integer> inPred=Predicates.in(predicateCollection);

        for(int i=0;i<500000;++i) {
            inPred.apply(new Integer(i));
        }

    }

__________________________________________-
Notice that we're creating an in Predicate over an ArrayList of 500000 
integers. 
Applying that predicate to the integers between 1 and 500000 takes a few 
minutes.

I could think of a couple of ways of fixing it:

1) Put a disclaimer in the documentation that the user is advised to pass 
collections with fast containment methods.

2) Convert slow collections into HashSets, as in the following snippet:
_______________________________________________________
    private static <T> Predicate<T> optimizedIn(Collection<? extends T> target) {

        if(target instanceof Set)
            return Predicates.in(target);
        else
            return Predicates.in(new HashSet<T>(target));

    }
____________________________________________________

Here are some experimental results with different sizes for the collection 
predicate for the original implementation and optimizedIn:

      Size      Original     Optimized                                                                                                       
        10      0m0.972s     0m0.996s                                                                                                        
       100      0m1.068s     0m1.004s                                                                                                        
      1000      0m1.572s     0m0.980s                                                                                                        
     10000      0m8.065s     0m1.016s                                                                                                        
    100000      1m5.596s     0m1.052s                                                                                                        
    500000      4m56.423s    0m1.200s               

Regards,
 Oswaldo.

Original issue reported on code.google.com by ozzy...@gmail.com on 21 Oct 2014 at 4:27

GoogleCodeExporter commented 9 years ago
Actually, I posted the root cause, rather than an actual performance bug in the 
library code.

Here's a less artificial example, where creating Filtered Collections exposes 
the performance issue in the slow predicate:

______________________________________________
 public static void testCollections2Filter() {

        ArrayList<Integer> filterCollection=new ArrayList<Integer>();
        ArrayList<Integer> toAddCollection=new ArrayList<Integer>();
        for(int i=0;i<500000;++i) {
            filterCollection.add(new Integer(i));
            toAddCollection.add(new Integer(i));
        }

        //      Predicate<Integer> inPred=optimizedIn(filterCollection);                                                                        
             Predicate<Integer> inPred=Predicates.in(filterCollection);

            Collection<Integer> filteredCollection=Collections2.filter(new ArrayList<Integer>(),inPred);

                filteredCollection.addAll(toAddCollection);

    }
_______________

Here are the results I'm getting with the original and optimized versions of 
the "in" predicate:

 // Size    Original  Optimized                                                                                                              

       Size  Original  Optimized
         10  0m1.124s  0m1.096s                                                                                                               
        100  0m1.112s  0m1.088s                                                                                                               
       1000  0m1.164s  0m1.152s                                                                                                               
      10000  0m1.216s  0m1.160s                                                                                                               
     100000  0m9.421s  0m1.228s                                                                                                               
     200000  0m50.611s 0m1.252s                                                                                                               
     500000  > 10 min  0m1.324s    

Regards,
 Oswaldo.

Original comment by ozzy...@gmail.com on 21 Oct 2014 at 10:39

GoogleCodeExporter commented 9 years ago
There's a couple points here:

  - The documentation of Predicates.in specifies, "It does not defensively copy the collection passed in, so future changes to it will alter the behavior of the predicate."  That's a non-@Beta method contract that we would almost certainly not change, and there's not really any way to satisfy that contract and do anything like what you're proposing.
  - We could document better than the collection's own contains() method is used, and that the performance of the predicate will be equivalent to the performance of the collection's contains() method.
  - For very short lists, it is not always the case that converting to a HashSet will help performance.
  - It's possible that it might make sense as a warning or an error in error-prone (http://errorprone.info/) to pass a List to Predicates.in, but we'd have to investigate this -- and that'd be more appropriate as a bug filed against error-prone, rather than Guava.

Original comment by lowas...@google.com on 21 Oct 2014 at 4:23

GoogleCodeExporter commented 9 years ago
This issue has been migrated to GitHub.

It can be found at https://github.com/google/guava/issues/<issue id>

Original comment by cgdecker@google.com on 1 Nov 2014 at 4:08

GoogleCodeExporter commented 9 years ago

Original comment by cgdecker@google.com on 1 Nov 2014 at 4:17

GoogleCodeExporter commented 9 years ago

Original comment by cgdecker@google.com on 3 Nov 2014 at 9:07