sviperll / adt4j

adt4j - Algebraic Data Types for Java
BSD 3-Clause "New" or "Revised" License
143 stars 8 forks source link

HashCode improvements #14

Closed jbgi closed 9 years ago

jbgi commented 9 years ago

The default value for valueClassHashCodeBase should probably be a prime number (currently 27). (I suggest using 1000005, since autovalue (from google) use 1000003 and adt4j is strictly superior ;) Also maybe ^ should be used instead of + for a more uniform distribution of hashcodes.

sviperll commented 9 years ago

27 is unfortunate. I has somehow thought that it was prime... Maybe it was planned to be 37, but a typo slipped in.

Nevertheless, I've always thought that + is much better for hash distribution than ^. Xor is used for performance reasons only.

Is there some source explaining origins of 1000003 value in google-auto?

jbgi commented 9 years ago

Yes you are right, + is better than Xor. Did not have time to check out last version of adt4j yet. Does it cache hashcode? (that would be great). I found no explanation of the use of 1000003 as base by google-auto. Probably linked to their use of xor instead of +.

sviperll commented 9 years ago

Optional hashCode caching should be great. It's not implemented, yet.

talios commented 9 years ago

I think I suggested awhile ago maybe generating/calculating the hashcode on creation, since everything is immutable that should be fine?

jbgi commented 9 years ago

But computing the hashcode may be expensive and it may not be used at all for many use cases.

sviperll commented 9 years ago

But caching requires additional field(s), and thread synchronization, is there some established best practice? How immutables.org does it?

jbgi commented 9 years ago

I like the way it is done in https://github.com/google/guava/blob/master/guava/src/com/google/common/collect/SingletonImmutableSet.java Simple, no sync, support serialization out.of-the-box .

jbgi commented 9 years ago

By the way I would also prefer if the equality acceptor would be cached in the same manner (transient, initialized on first use).

sviperll commented 9 years ago

I've released 1.3-beta1 version. I've added some hash code optimizations. hashCodeEvaluation parameter has been added to @GenerateValueClassForVisitor annotation, see RecordVisitor and UserKeyVisitor examples.

I'd like to get some feedback on API and generated code.

talios commented 9 years ago

Looks nice, I see you also added comparable and serializable ( unless they were already there and I didn't spot them before ).

Generated code looks good here.

jbgi commented 9 years ago

Just tried the CACHED mode. The generated code is:

        if (this.hashCodeCachedValue == 0) {
            this.hashCodeCachedValue = this.acceptor.myValueClassHashCode();
            this.hashCodeCachedValue = ((this.hashCodeCachedValue!= 0)?this.hashCodeCachedValue:-2147483648);
        }
        return this.hashCodeCachedValue;

This could lead to a concurrency issue: 0 could been seen as the hascode by a concurrent thread. A version that does not have this problem would be:

        int h = this.hashCodeCachedValue;
        if (h == 0) {
            h = this.acceptor.myValueClassHashCode();
            h =  ((h!= 0)?h:-2147483648);
            this.hashCodeCachedValue = h;
        }
        return h;

Edit: oups sorry, got confused, there is no concurrency issue in the original code. I still think that the alternative implementation is better because it avoid unnecessary field instance access (which are a tiny bit slower than local variable access).

talios commented 9 years ago

@jbgi good spotting, would making hashCodeCachedValue a transient be of use - similar to the SingletonImmutableSet mentioned above?

jbgi commented 9 years ago

In the SYNCRONIZED_CACHED mode cache, maybe synchronize on acceptor instead of dedicated lock, to save one object? But more importantly, maybe it should use double check locking to avoid synchronization on the happy path, ie like:

    private volatile int hashCodeCachedValue = 0;
    public final int hashCode() {
        int h = this.hashCodeCachedValue;
        if (h == 0) {
            synchronized (acceptor) {  // or hashCodeCachedValueLock
              h =  this.hashCodeCachedValue;
              if (h==0) {
                h = this.acceptor.myValueClassHashCode();
                h =  ((h!= 0)?h:-2147483648);
                this.hashCodeCachedValue = h;
             }
           } 
        }
        return h;
    }

Also, when combined with isSerializable=true, the field hashCodeCacheValueLock is made transient so a nullpointer may ensue after deserialization, so synchronization on acceptor maybe necessary in that case.

jbgi commented 9 years ago

Also, there is a small concern with ON_CONSTRUCTION + isSerializable=true. A change in the hashcode function might break code in a subtitle way, as a deserialized instance constructed with an older version of adt4j might have a different hashcode than an equal instance constructed with a newer version of adt4j. This can be resolved by making the field transient and resetting it in the readObject method. ie:

    private transient int hashCodeCachedValue;
    private void readObject(java.io.ObjectInputStream in) throws ClassNotFoundException, IOException {
        in.defaultReadObject();
        this.hashCodeCachedValue = acceptor.myValueClassHashCode();
    }
sviperll commented 9 years ago

@jbgi Thanks a lot for valuable suggestions. I'll try to incorporate everything into next beta/release.

sviperll commented 9 years ago

@talios Comparables and Serializable were there since long ago :)

sviperll commented 9 years ago

I'm still not sure about API: Are these 4 enumeration constants enough? Are names good and clear: MethodEvaluation, ORDINARY, ON_CONSTRUCTION, CACHED, SYNCHRONIZED_CACHED?

jbgi commented 9 years ago

I could suggest the following change that appears clearer to me: hashCodeEvaluation -> hashCodeCaching MethodEvaluation -> HashCodeCaching or HashCodeCachingStrategy ORDINARY -> NOT_CACHED or NONE (default) CACHED -> LAZY SYNCHRONIZED_CACHED -> SYNCHRONIZED_LAZY

sviperll commented 9 years ago

I still think that "evaluation" is better: what is "lazy caching"? "Cached" seems better to me than "lazy" since only "synchronized lazy" is truely lazy :) "Cached" is more neutral term.

I can suggest:

hashCodeEvaluation -> hashCodeCaching ORDINARY -> NONE ON_CONSTRUCTION -> PRECOMPUTE CACHED -> SIMPLE SYNCHRONIZED_CACHED -> SYNCHRONIZED

jbgi commented 9 years ago

Reflecting on the terms and meaning, you're right, evaluation is good. Maybe the only change would be for ORDINARY, to something more explicit, like ALWAYS.

jbgi commented 9 years ago

closing as 1.3 has all features one could ever need regarding hashcode implementation strategy! Impressive work, once again!

sviperll commented 9 years ago

Thank you!

talios commented 9 years ago

Another fine release! Kudos.