Closed jbgi closed 9 years ago
27 is unfortunate. I has somehow thought that it was prime... Maybe it was planned to be 37, but a typo slipped in.
Nevertheless, I've always thought that + is much better for hash distribution than ^. Xor is used for performance reasons only.
Is there some source explaining origins of 1000003 value in google-auto?
Yes you are right, + is better than Xor. Did not have time to check out last version of adt4j yet. Does it cache hashcode? (that would be great). I found no explanation of the use of 1000003 as base by google-auto. Probably linked to their use of xor instead of +.
Optional hashCode caching should be great. It's not implemented, yet.
I think I suggested awhile ago maybe generating/calculating the hashcode on creation, since everything is immutable that should be fine?
But computing the hashcode may be expensive and it may not be used at all for many use cases.
But caching requires additional field(s), and thread synchronization, is there some established best practice? How immutables.org does it?
I like the way it is done in https://github.com/google/guava/blob/master/guava/src/com/google/common/collect/SingletonImmutableSet.java Simple, no sync, support serialization out.of-the-box .
By the way I would also prefer if the equality acceptor would be cached in the same manner (transient, initialized on first use).
I've released 1.3-beta1 version. I've added some hash code optimizations. hashCodeEvaluation
parameter has been added to @GenerateValueClassForVisitor
annotation, see RecordVisitor
and UserKeyVisitor
examples.
I'd like to get some feedback on API and generated code.
Looks nice, I see you also added comparable and serializable ( unless they were already there and I didn't spot them before ).
Generated code looks good here.
Just tried the CACHED mode. The generated code is:
if (this.hashCodeCachedValue == 0) {
this.hashCodeCachedValue = this.acceptor.myValueClassHashCode();
this.hashCodeCachedValue = ((this.hashCodeCachedValue!= 0)?this.hashCodeCachedValue:-2147483648);
}
return this.hashCodeCachedValue;
This could lead to a concurrency issue: 0 could been seen as the hascode by a concurrent thread. A version that does not have this problem would be:
int h = this.hashCodeCachedValue;
if (h == 0) {
h = this.acceptor.myValueClassHashCode();
h = ((h!= 0)?h:-2147483648);
this.hashCodeCachedValue = h;
}
return h;
Edit: oups sorry, got confused, there is no concurrency issue in the original code. I still think that the alternative implementation is better because it avoid unnecessary field instance access (which are a tiny bit slower than local variable access).
@jbgi good spotting, would making hashCodeCachedValue
a transient
be of use - similar to the SingletonImmutableSet
mentioned above?
In the SYNCRONIZED_CACHED mode cache, maybe synchronize on acceptor instead of dedicated lock, to save one object? But more importantly, maybe it should use double check locking to avoid synchronization on the happy path, ie like:
private volatile int hashCodeCachedValue = 0;
public final int hashCode() {
int h = this.hashCodeCachedValue;
if (h == 0) {
synchronized (acceptor) { // or hashCodeCachedValueLock
h = this.hashCodeCachedValue;
if (h==0) {
h = this.acceptor.myValueClassHashCode();
h = ((h!= 0)?h:-2147483648);
this.hashCodeCachedValue = h;
}
}
}
return h;
}
Also, when combined with isSerializable=true, the field hashCodeCacheValueLock is made transient so a nullpointer may ensue after deserialization, so synchronization on acceptor maybe necessary in that case.
Also, there is a small concern with ON_CONSTRUCTION + isSerializable=true. A change in the hashcode function might break code in a subtitle way, as a deserialized instance constructed with an older version of adt4j might have a different hashcode than an equal instance constructed with a newer version of adt4j. This can be resolved by making the field transient and resetting it in the readObject method. ie:
private transient int hashCodeCachedValue;
private void readObject(java.io.ObjectInputStream in) throws ClassNotFoundException, IOException {
in.defaultReadObject();
this.hashCodeCachedValue = acceptor.myValueClassHashCode();
}
@jbgi Thanks a lot for valuable suggestions. I'll try to incorporate everything into next beta/release.
@talios Comparables and Serializable were there since long ago :)
I'm still not sure about API: Are these 4 enumeration constants enough? Are names good and clear: MethodEvaluation
, ORDINARY
, ON_CONSTRUCTION
, CACHED
, SYNCHRONIZED_CACHED
?
I could suggest the following change that appears clearer to me: hashCodeEvaluation -> hashCodeCaching MethodEvaluation -> HashCodeCaching or HashCodeCachingStrategy ORDINARY -> NOT_CACHED or NONE (default) CACHED -> LAZY SYNCHRONIZED_CACHED -> SYNCHRONIZED_LAZY
I still think that "evaluation" is better: what is "lazy caching"? "Cached" seems better to me than "lazy" since only "synchronized lazy" is truely lazy :) "Cached" is more neutral term.
I can suggest:
hashCodeEvaluation -> hashCodeCaching ORDINARY -> NONE ON_CONSTRUCTION -> PRECOMPUTE CACHED -> SIMPLE SYNCHRONIZED_CACHED -> SYNCHRONIZED
Reflecting on the terms and meaning, you're right, evaluation is good. Maybe the only change would be for ORDINARY, to something more explicit, like ALWAYS.
closing as 1.3 has all features one could ever need regarding hashcode implementation strategy! Impressive work, once again!
Thank you!
Another fine release! Kudos.
The default value for valueClassHashCodeBase should probably be a prime number (currently 27). (I suggest using 1000005, since autovalue (from google) use 1000003 and adt4j is strictly superior ;) Also maybe ^ should be used instead of + for a more uniform distribution of hashcodes.