yf0994 / guava-libraries

Automatically exported from code.google.com/p/guava-libraries
Apache License 2.0
0 stars 0 forks source link

Detect non-distinct hash codes in EqualsTester #1162

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Just saw your Google+ post on EqualsTester. Nice utility class for unit tests. 
Currently, the Tester does not raise alarm if each tested object returns the 
exact same hash code. This is not strictly necessary, but obviously recommended 
for better performance. 

I was wondering if it would be possible to add such an option to EqualsTester, 
e. g.

new EqualsTester()
   .distinctHashCode()
   .addEqualityGroup(...)

or via a constuctor parameter?

Original issue reported on code.google.com by toellr...@gmail.com on 6 Oct 2012 at 3:41

GoogleCodeExporter commented 9 years ago
Thanks.  We discussed this internally at one point, and there was some 
interest.  The main question was whether it would be opt-in or opt-out.  We 
were hoping to do some testing to see how many Google tests would fail if it 
were opt-out.  If it's not many, we might go with that.  Or we could go a step 
further and turn it on unconditionally, figuring that other users can just not 
use EqualsTester.  That seems a bit cruel, but it would be in service of a 
second goal -- removing testEquals() so that users need only call 
addEqualityGroup().  If we were to do that, then addEqualityGroup() (possibly 
renamed?) would perform the tests each time it is called, and we'd want to be 
sure that any configuration (like distinctHashCode()) was called before 
addEqualityGroup() was called.  But that's not a requirement that users would 
reasonably expect.  So maybe distinctHashCode would be a constructor parameter?

We'll probably try something sometime.  In the meantime, additional thoughts 
are welcome.

Original comment by cpov...@google.com on 8 Oct 2012 at 8:42

GoogleCodeExporter commented 9 years ago
This seems different than the goals of EqualsTester.

hashCode() collisions are perfectly valid.

It still might be an interesting utility to have though.

Original comment by kak@google.com on 22 Aug 2013 at 10:30

GoogleCodeExporter commented 9 years ago
OK, I added a check for distinct hashcodes, and I ran most of the EqualsTester 
tests in the Google repository. I got a few hundred failures. I looked at ~30. 
I did find a few bugs (equals() methods that used approximate equality) and 
some subpar hashCode() methods (clearly intentional in at least one case; less 
clear elsewhere). But the vast majority were legitimate collisions from fairly 
reasonable hash functions. It was actually kind of fascinating:

- a class with a nullable string field that hardcodes 0 for null and uses 
string.hashCode() otherwise... gettingn 0 for "", as well
- a few instances of testing incompatible classes with the same fields against 
one another (UserId=foo, GroupId=foo), where both have the obvious hashCode 
implementation
- the above, but one class is a subclass of the other
- a few people who are adding both foo and foo.hashCode() to the EqualsTester
set 1+4 2+3
- the above, but with toString() instead of hashCode()
- tests using Long values of 0 and -1, whose hash codes collide
- (double) pairs using longBitsForDouble, which seems likely to collide when 
using x and -x
- classes that use ^ on their inputs to generate hash codes in combination with 
inputs whose hash codes match (either x^x vs. y^y or x^y vs. y^x, sometimes 
from autogenerated ClassSanityTester proxies)
- the random collisions that are bound to occur

Given all that, I think there will be better ways to find bad 
equals()/hashCode() methods, e.g., 
https://code.google.com/p/error-prone/issues/detail?id=35

Original comment by cpov...@google.com on 23 Aug 2013 at 7:27

GoogleCodeExporter commented 9 years ago
I'd say that using plain xor in hash is an error most of the time, or at least 
causes problems later, see e.g.,
https://groups.google.com/forum/#!topic/guava-discuss/fQwn4GSwC0E
http://code.google.com/p/guava-libraries/issues/detail?id=458
both caused by the dumb specification of Map.Entry (using addition would be 
much better).

Most of the cases you mentioned are hash collisions caused by a hash collision 
of a single field. This can be solved by providing two non-colliding instances 
for the field (failing to do so could be an error, maybe).

X.create(a, b1, c).hashCode() != X.create(a, b2, c).hashCode()

whenever b1.hashCode() != b2.hashCode(). While there's no guarantee (there are 
even good functions sometimes failing the test e.g., 
MD5.put(a).put(b).put(c).hashCode().asInt()), the most probable cause is having 
forgotten the field. Agreed that error-prone could do it better.

Original comment by Maaarti...@gmail.com on 25 Aug 2013 at 10:18

GoogleCodeExporter commented 9 years ago
This issue has been migrated to GitHub.

It can be found at https://github.com/google/guava/issues/<id>

Original comment by cgdecker@google.com on 1 Nov 2014 at 4:13

GoogleCodeExporter commented 9 years ago

Original comment by cgdecker@google.com on 3 Nov 2014 at 9:08