yf0994 / guava-libraries

Automatically exported from code.google.com/p/guava-libraries
Apache License 2.0

performance problem in HashBasedTable #1085

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Hi,

I am encountering a performance problem in HashBasedTable.  It appears
in Guava 12.0.1 and in the latest revision.  I have attached a test that
exposes the problem and a patch that fixes it.  For this test, the
patch provides a 1397x speedup.  Note that this test uses a very
small table (300x300); for medium-sized tables (e.g., 1000x1000),
the unpatched test does not finish even after 30 minutes (I killed the
process), while the patched version finishes in 368 milliseconds.

To run the test, just do:

$ java Test

The output for the unpatched version is:
Time is 65679

The output for the patched version is:
Time is 47

The attached test shows that, for a "HashBasedTable table" object, the
following operation is very slow:

table.values().containsAll(toContain);

"HashBasedTable.values()" returns a "StandardTable.Values" object,
which inherits containsAll() from "java.util.AbstractCollection",
which has quadratic complexity.
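
For reference, the containsAll() that the values view inherits is
essentially the following loop (paraphrased from AbstractCollection in
the JDK).  Because contains() on the values() view is itself a linear
scan, the overall check is quadratic:

    public boolean containsAll(Collection<?> c) {
        for (Object e : c) {
            if (!contains(e)) {   // contains() walks the whole values() view
                return false;
            }
        }
        return true;
    }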

I attached two patches.  Both patches override containsAll() and
implement a linear time algorithm.  patchSmall.diff populates a
HashSet eagerly, and patchFull.diff populates a HashSet lazily.
patchFull.diff is faster than patchSmall.diff when containsAll()
returns false after inspecting only a few elements, though in most
cases they are equally fast.  I am including patchSmall.diff just in
case you prefer smaller code.
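
To make the lazy idea concrete, here is a rough sketch of the approach
(not the attached patchFull.diff itself; the class name LazyContainsAll
and the free-standing helper signature are invented for illustration):

    import java.util.Collection;
    import java.util.HashSet;
    import java.util.Iterator;
    import java.util.Set;

    final class LazyContainsAll {
      // Copy elements of "values" into a HashSet only as far as needed for the
      // current target; later targets are first answered from the set in O(1).
      static boolean containsAll(Iterable<?> values, Collection<?> targets) {
        Set<Object> seen = new HashSet<Object>();
        Iterator<?> it = values.iterator();
        outer:
        for (Object target : targets) {
          if (seen.contains(target)) {
            continue;                     // already copied earlier
          }
          while (it.hasNext()) {
            seen.add(it.next());          // copy one more element
            if (seen.contains(target)) {  // did we just copy the target?
              continue outer;
            }
          }
          return false;                   // values exhausted without finding target
        }
        return true;
      }
    }

The eager variant (patchSmall.diff) corresponds to copying all the values
into the set up front before answering any queries.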

Note that this problem also exists for the other "Values" classes
(LocalCache.Values, ArrayTable.Values, ImmutableMultimap.Values,
MapMakerInternalMap.Values, AbstractFilteredMap.Values, Maps.Values,
FilteredMultimap.AsMap.Values, FilteredMultimap.Values,
Multimaps.Values, StandardTable.Column.Values).

Is this truly a performance problem, or am I misunderstanding the
intended behavior?  If it is a real problem, can you please confirm
that the patch is correct?

Thanks,

Adrian

Original issue reported on code.google.com by adi1756...@gmail.com on 25 Jul 2012 at 10:56

Attachments:

GoogleCodeExporter commented 9 years ago

Original comment by kevinb@google.com on 25 Jul 2012 at 11:19

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
I honestly think this goes too far, and I think the original behavior is 
appropriate.

As a rule, I _only_ expect containsAll to be linear time on a collection that 
is backed by a hash table of the elements being checked for containment 
-- e.g. HashSet or HashMultiset, not ArrayList or Map.values().  The JDK set 
that precedent, and I feel totally comfortable following it.

I also strongly suspect there exist classes used only as the values in a map or 
table that don't have a properly implemented hashCode() because they're never 
expected to be used as keys, which this change would break.

In any event, if any users really desire the linear-time behavior that 
constructs a HashSet, they can freely substitute the one-liner 
Sets.newHashSet(table.values()).containsAll(collection);

Original comment by wasserman.louis on 26 Jul 2012 at 9:35

GoogleCodeExporter commented 9 years ago
Hi Louis,

> I think the original behavior is appropriate.

There are two options here.  One, we can have a method that *blocks*
the program even for a 1000x1000 table, and tell programmers "I bet
you missed that, our method was supposed to block your program (and we
knew about it)".  Two, we can solve the problem, and make a slow
method fast.

If we can make a slow method fast, why not do it?  Especially when the
fix is 3 lines of code.

> they can freely substitute the one-liner
> Sets.newHashSet(table.values()).containsAll(collection);

Do you think a programmer considers the *complexity* of a method every
time he *types* a method call?  Coding like that would be practically
impossible.  We already have bugs because programmers miss null checks,
let alone complexity.  That's why we have libraries: to make
programmers' lives easier, not to test their powers of concentration.

Best,

Adrian

Original comment by adi1756...@gmail.com on 26 Jul 2012 at 3:13

GoogleCodeExporter commented 9 years ago
We have a long-standing internal bug to "create a table of our collection 
implementations and their operations showing the big-O performance of each."  
That might help here (though the JDK precedent is probably still more 
important), but it doesn't answer the question "Why not make it faster?"

Well, there's always the potential for problems in weird edge cases.  If anyone 
implements hashCode() wrong, as Louis points out, he would have problems.  He 
is arguably getting what he deserves, but if there's a chance that we're going 
to break his app, we'd better be able to demonstrate a clear improvement over 
what users can do on their own.

Most affected users, as you point out, can call "newHashSet" themselves, and 
their problem is gone.  If the performance impact is severe, they'll likely 
identify the problem and fix it quickly.  For some other users, the performance 
impact will be negligible.  For still others, the HashSet version may even 
perform worse.  (For example, my understanding is that object allocation on 
Android is still expensive in comparison to server JVMs.)

Should programmers have to worry about time complexity?  When we can hide it 
from them, we should, but I don't think this is one of those cases.

Original comment by cpov...@google.com on 26 Jul 2012 at 9:21

GoogleCodeExporter commented 9 years ago
Random thought, but should we try to document certain methods and classes with 
their big-O performance? Like

/**
 * This is my class. It does class things
 * 
 * @big-O O(n log n)
 */

At least in a few places this could work as a gentle warning to the developer.

Original comment by emily@soldal.org on 27 Jul 2012 at 6:21

GoogleCodeExporter commented 9 years ago
For "views" like the values() collections, it's much less obvious how to do 
this cleanly.

Original comment by wasserman.louis on 27 Jul 2012 at 3:30

GoogleCodeExporter commented 9 years ago
Hi,

I agree with most of the discussion in Comment 5 and would like to add
two points.

First, I agree it is easy to find performance problems after they
occur *during production*.  But finding performance problems *during
development and testing* is hard, even if fixing them requires only
one line of code (like in the containsAll() example).  To avoid this
problem and not have programmers worry about time complexity, like
Comment 5 says, we should hide the complexity in the library.

Second, we are missing many serious performance improvements because
we assume a potentially wrong hashCode().  Most performance improvements
would require some sort of fast searching, e.g., using a HashSet.  Not
using hashCode() is a huge blocker for such major performance
improvements (sure, we can make minor tweaks without a HashSet, and in
some particular cases there is already a HashSet in the code).  If
developers have a wrong hashCode(), they should fix their bugs.  Right
now, we are seriously limiting the opportunities for improvement
because some developers might have bugs in their code.

Personally, I feel Guava is better than the JDK because Guava evolves
more quickly than the JDK.  But limiting most (if not all) serious performance
improvements because of the possibility of having a corner case in
hashCode() looks like a bad trade-off.  It's almost like having a
written contract to never have any major performance improvements.

Best,

Adrian

Original comment by adi1756...@gmail.com on 27 Jul 2012 at 7:11

GoogleCodeExporter commented 9 years ago
I'm not fussed about what happens to people who implement hashCode wrong, or 
whose hashCode changes while in the collection. I'd say let's set that argument 
aside.

So we have a quadratic operation that can be made linear by allocating a bunch 
of temporary objects, is that right?  It's a huge win for your case, but it 
sounds like a loss for *most* usages.  It seems better to me to document the 
complexity of our stuff clearly, and let them do the newHashSet themselves when 
they need to.

Original comment by kevinb@google.com on 27 Jul 2012 at 7:24

GoogleCodeExporter commented 9 years ago
I guess I feel that it's just common sense that containsAll() should take time 
proportional to the cost of a single contains(), times the size of the target 
collection.

This also feels like an extreme edge case that people who care about 
asymptotics already expect to be a quadratic-time operation.  I certainly can't 
remember the last time, in any context, when I called values().containsAll().  
Did the OP encounter an actual, real-world case where this caused a performance 
bug?

I have to admit I don't consider this a "major performance improvement" at all, 
compared to e.g. a significant constant-factor improvement to add, remove, or 
contains in a "named" collection like HashMultiset.

Original comment by wasserman.louis on 27 Jul 2012 at 8:34

GoogleCodeExporter commented 9 years ago
Hi Kevin,

> It's a huge win for your case, but it sounds like a loss for *most*
> usages.

I agree with Comment 9 that we should also consider other cases.  The
quantitative data that compares the existing containsAll() code and
the proposed patch is below.

I wrote a test (attached TestWorstScenariosForProposedPatch.java) to
exercise the *best* cases for the existing containsAll(), which are
the worst cases for the proposed Full/Lazy patch.  These cases happen
when containsAll() returns after inspecting only *very small* parts of
table.values() and/or "collection".  I parametrized the test by the
number of elements inspected until containsAll() can return.
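
A rough sketch of how one such scenario can be set up (this is not the
attached TestWorstScenariosForProposedPatch.java; the class name, value
format, and single-shot timing below are simplified placeholders): fill
a 300x300 table, take N values known to be present, and append one value
that is not, so containsAll() can only fail after inspecting N elements
of "collection".

    import com.google.common.collect.HashBasedTable;
    import com.google.common.collect.Table;

    import java.util.ArrayList;
    import java.util.List;

    public class WorstCaseSketch {
      public static void main(String[] args) {
        Table<Integer, Integer, String> table = HashBasedTable.create();
        for (int row = 0; row < 300; row++) {
          for (int col = 0; col < 300; col++) {
            table.put(row, col, "value-" + row + "-" + col);
          }
        }

        int n = 60;                                  // fail only at element n
        List<String> collection = new ArrayList<String>();
        for (int i = 0; i < n; i++) {
          collection.add("value-0-" + i);            // present in the table
        }
        collection.add("missing");                   // absent: forces FALSE

        long start = System.nanoTime();
        boolean result = table.values().containsAll(collection);
        long elapsedMs = (System.nanoTime() - start) / 1000000;
        System.out.println(result + " in " + elapsedMs + " ms");
      }
    }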

These are the results on my machine (ORIGINAL is the existing code,
PROPOSED is the Full/Lazy patch, CALLER_RESPONSIBILITY is the one-line
workaround proposed in Comment 3):

All times are in milliseconds for table.values().containsAll(collection).

containsAll() returns FALSE at element N of "collection":

   N   ORIGINAL   PROPOSED   CALLER_RESPONSIBILITY
  60     48.936      4.422      4.296
  40     30.048      4.412      4.248
  30     21.704      4.364      4.254
  25     18.258      4.372      4.260
  20     14.578      4.384      4.266
  15     12.362      4.486      4.264
  10      8.144      4.358      4.260
   9      7.878      4.358      4.250
   8      7.726      4.374      4.262
   7      6.992      4.374      4.246
   6      6.938      4.360      4.272
   5      5.826      4.356      4.252
   4      5.312      4.368      4.236
   3      3.964      4.380      4.266
   2      3.474      4.376      4.264
   1      2.812      4.360      4.256
   0      1.534      4.366      4.364

containsAll() returns TRUE after inspecting element N of table.values():

   N   ORIGINAL   PROPOSED   CALLER_RESPONSIBILITY
  60      0.260      0.074      4.276
  40      0.182      0.074      4.284
  30      0.144      0.074      4.240
  25      0.112      0.074      4.288
  20      0.094      0.074      4.282
  15      0.070      0.076      4.278
  10      0.054      0.074      4.268
   9      0.058      0.070      4.312
   8      0.046      0.072      4.284
   7      0.040      0.068      4.282
   6      0.036      0.100      4.288
   5      0.034      0.072      4.316
   4      0.030      0.072      4.304
   3      0.024      0.068      4.276
   2      0.022      0.096      4.278
   1      0.018      0.092      4.284
   0      0.014      0.070      4.276

The above quantitative data shows the following:

(1) the proposed patch is faster than the existing containsAll() in
    the majority of cases when small parts of the collections are
    inspected, and significantly faster for the other cases.

(1.1) when containsAll() returns false, the proposed patch is
    significantly faster than the existing code.  In fact, the
    proposed patch is slower only when containsAll() returns after not
    finding an element with index between 0 and 3 from "collection".
    The very worst case is for index 0, when the patched code builds
    the set but never reuses it.  Even for the element with index 3
    the difference is small: the existing code takes 3.964
    milliseconds, while the proposed patch takes 4.380 milliseconds.
    If we need to guard against the pathological case when you have
    only small collections, we can opt not to build the set for very
    small "collection".  I put this in the patch (but not for the
    experiment) for size 4.

(1.2) when containsAll() returns true, the absolute times are
    negligible, e.g., 0.074 milliseconds for the proposed patch.  The
    proposed patch is slower only when containsAll() returns after
    finding all the "collection" elements in the first 15 elements of
    "table.values()".  Typically, the difference is very small, e.g.,
    0.072 vs. 0.046 milliseconds.  If we need to guard against the
    pathological case with only small "table.values()", we can opt not
    to build the set for very small "table.values()".  I put this in
    the patch (but not for the experiment) for size 16.

(2) the proposed patch is significantly faster than the one-line
    workaround in Comment 3 for half of the cases (when containsAll()
    returns true) and equally fast for the other half (when
    containsAll() returns false).  So, the one-line workaround not
    only requires the programmer to pay attention to time complexity
    but also results in code slower than necessary.  (The code could
    be made faster by implementing some method for lazy creation of
    sets, say, com.google.common.collect.Sets.newLazyHashSet, though
    this would still require the programmer to pay attention when this
    method needs to be used.)

I am running JDK 1.6.0_24 on Ubuntu 11.10.  If possible, I would like
to see the running times on your machines.  To run the above test,
just do "java TestWorstScenariosForProposedPatch".  Note that the
execution may take some time (3.5 minutes on my machine), because it
generates many data points.

I am also attaching patchFullImproved.java, which has several changes
from the original patchFull.java.

The above numbers are the worst cases for the patch and the best cases
for the existing code, so the common usage will be improved by the
patch, and certainly not slowed down.

Best,

Adrian

Original comment by adi1756...@gmail.com on 31 Jul 2012 at 2:23

Attachments:

GoogleCodeExporter commented 9 years ago
I still fear for GWT and Android performance, which are less convenient to 
measure, and I have vague reservations, but that's not much reason to close 
this.

Original comment by cpov...@google.com on 2 Aug 2012 at 10:54

GoogleCodeExporter commented 9 years ago
I am attaching the benchmark from Comment 11, rewritten with Caliper.  The
results are similar to those in Comment 11:

......................................................................
testReturnFalse            benchmark index      us linear runtime
           true             Original    60 49950.0 ==============================
           true             Original    40 30831.9 ==================
           true             Original    30 19968.1 ===========
           true             Original    25 16661.5 ==========
           true             Original    20 14361.2 ========
           true             Original    15 12968.3 =======
           true             Original    10  7568.6 ====
           true             Original     9  8066.8 ====
           true             Original     8  7886.0 ====
           true             Original     7  7359.7 ====
           true             Original     6  7212.8 ====
           true             Original     5  5966.7 ===
           true             Original     4  5392.1 ===
           true             Original     3  3984.6 ==
           true             Original     2  3641.4 ==
           true             Original     1  2990.8 =
           true             Original     0  1600.5 =
           true         ProposedLazy    60  4588.4 ==
           true         ProposedLazy    40  4371.8 ==
           true         ProposedLazy    30  4680.9 ==
           true         ProposedLazy    25  4385.7 ==
           true         ProposedLazy    20  4623.4 ==
           true         ProposedLazy    15  4593.7 ==
           true         ProposedLazy    10  4593.1 ==
           true         ProposedLazy     9  4594.7 ==
           true         ProposedLazy     8  4607.5 ==
           true         ProposedLazy     7  4456.8 ==
           true         ProposedLazy     6  4375.6 ==
           true         ProposedLazy     5  4428.9 ==
           true         ProposedLazy     4  4594.7 ==
           true         ProposedLazy     3  4469.4 ==
           true         ProposedLazy     2  4361.2 ==
           true         ProposedLazy     1  4601.2 ==
           true         ProposedLazy     0  4684.5 ==
           true CallerResponsibility    60  4341.3 ==
           true CallerResponsibility    40  4286.1 ==
           true CallerResponsibility    30  4295.8 ==
           true CallerResponsibility    25  4343.6 ==
           true CallerResponsibility    20  4340.1 ==
           true CallerResponsibility    15  4285.8 ==
           true CallerResponsibility    10  4316.3 ==
           true CallerResponsibility     9  4288.5 ==
           true CallerResponsibility     8  4324.6 ==
           true CallerResponsibility     7  4304.1 ==
           true CallerResponsibility     6  4283.4 ==
           true CallerResponsibility     5  4505.9 ==
           true CallerResponsibility     4  4278.6 ==
           true CallerResponsibility     3  4385.2 ==
           true CallerResponsibility     2  4386.2 ==
           true CallerResponsibility     1  4293.1 ==
           true CallerResponsibility     0  4303.8 ==
          false             Original    60   246.4 =
          false             Original    40   170.7 =
          false             Original    30   113.2 =
          false             Original    25   103.6 =
          false             Original    20    79.5 =
          false             Original    15    68.7 =
          false             Original    10    46.0 =
          false             Original     9    47.5 =
          false             Original     8    44.9 =
          false             Original     7    40.2 =
          false             Original     6    38.0 =
          false             Original     5    33.0 =
          false             Original     4    27.0 =
          false             Original     3    24.6 =
          false             Original     2    20.9 =
          false             Original     1    17.3 =
          false             Original     0    14.1 =
          false         ProposedLazy    60    70.6 =
          false         ProposedLazy    40    70.4 =
          false         ProposedLazy    30    70.2 =
          false         ProposedLazy    25    69.4 =
          false         ProposedLazy    20    69.4 =
          false         ProposedLazy    15    69.1 =
          false         ProposedLazy    10    68.8 =
          false         ProposedLazy     9    69.4 =
          false         ProposedLazy     8    68.1 =
          false         ProposedLazy     7    68.7 =
          false         ProposedLazy     6    69.3 =
          false         ProposedLazy     5    69.5 =
          false         ProposedLazy     4    69.2 =
          false         ProposedLazy     3    69.5 =
          false         ProposedLazy     2    68.8 =
          false         ProposedLazy     1    68.7 =
          false         ProposedLazy     0    69.6 =
          false CallerResponsibility    60  4539.6 ==
          false CallerResponsibility    40  4308.1 ==
          false CallerResponsibility    30  4332.4 ==
          false CallerResponsibility    25  4284.4 ==
          false CallerResponsibility    20  4352.9 ==
          false CallerResponsibility    15  4305.9 ==
          false CallerResponsibility    10  4344.9 ==
          false CallerResponsibility     9  4365.5 ==
          false CallerResponsibility     8  4266.1 ==
          false CallerResponsibility     7  4279.6 ==
          false CallerResponsibility     6  4282.6 ==
          false CallerResponsibility     5  4307.1 ==
          false CallerResponsibility     4  4309.7 ==
          false CallerResponsibility     3  4323.3 ==
          false CallerResponsibility     2  4284.7 ==
          false CallerResponsibility     1  4379.7 ==
          false CallerResponsibility     0  4337.0 ==
......................................................................

Original comment by adi1756...@gmail.com on 25 Sep 2012 at 10:23

Attachments:

GoogleCodeExporter commented 9 years ago
This issue has been migrated to GitHub.

It can be found at https://github.com/google/guava/issues/<id>

Original comment by cgdecker@google.com on 1 Nov 2014 at 4:13

GoogleCodeExporter commented 9 years ago

Original comment by cgdecker@google.com on 1 Nov 2014 at 4:18

GoogleCodeExporter commented 9 years ago

Original comment by cgdecker@google.com on 3 Nov 2014 at 9:08