Open migrator opened 10 years ago
summary: Not Defined
Actually, I posted the root cause, rather than an actual performance bug in the library code.
Here's a less artificial example, where creating Filtered Collections exposes the performance issue in the slow predicate:
public static void testCollections2Filter() {
ArrayList<Integer> filterCollection=new ArrayList<Integer>();
ArrayList<Integer> toAddCollection=new ArrayList<Integer>();
for(int i=0;i<500000;++i) {
filterCollection.add(new Integer(i));
toAddCollection.add(new Integer(i));
}
// Predicate<Integer> inPred=optimizedIn(filterCollection);
Predicate<Integer> inPred=Predicates.in(filterCollection);
Collection<Integer> filteredCollection=Collections2.filter(new ArrayList<Integer>(),inPred);
filteredCollection.addAll(toAddCollection);
}
Here are the results I'm getting with the original and optimized versions of the "in" predicate:
// Size Original Optimized
Size Original Optimized
10 0m1.124s 0m1.096s
100 0m1.112s 0m1.088s
1000 0m1.164s 0m1.152s
10000 0m1.216s 0m1.160s
100000 0m9.421s 0m1.228s
200000 0m50.611s 0m1.252s
500000 > 10 min 0m1.324s
Regards, Oswaldo.
status Not Defined creator: ozzy...@gmail.com created at: Oct 21, 2014
summary: Not Defined
There's a couple points here:
It's possible that it might make sense as a warning or an error in error-prone (http://errorprone.info/) to pass a List to Predicates.in, but we'd have to investigate this -- and that'd be more appropriate as a bug filed against error-prone, rather than Guava.
status Not Defined creator: lowas...@google.com created at: Oct 21 (6 days ago)
Hi, There seems to be a performance bug in the "in" method from the Predicates class.
The problem is that we can generate an In Predicate that has an underlying collection with a slow containment implementation, for example an ArrayList.
The following code exposes the problem:
public static void testPredicateIn() {
__- Notice that we're creating an in Predicate over an ArrayList of 500000 integers. Applying that predicate to the integers between 1 and 500000 takes a few minutes.
I could think of a couple of ways of fixing it:
1) Put a disclaimer in the documentation that the user is advised to pass collections with fast containment methods.
2) Convert slow collections into HashSets, as in the following snippet:
Here are some experimental results with different sizes for the collection predicate for the original implementation and optimizedIn:
Regards, Oswaldo.
relevance: 1