And.calcHashCode being performance Bottlneck

GoogleCodeExporter commented 9 years ago

We make a huge number of queries while processing an http request. These 
queries are unique in nature, that they do not appear again. Their sub queries 
may appear again.

Where does the calcHashCode() of the main AND query being used?

query = and(q1, q2, q3, ... );

We do roughly 50-500 of these for every request, and And.calcHashCode() takes 
up majority of our server time.

Do you have any recommendations regarding minimizing this overhead?

Original issue reported on code.google.com by anita.v...@inmobi.com on 28 Oct 2013 at 4:06

GoogleCodeExporter commented 9 years ago

Hi Atul,

Interesting. 

The original purpose of the eager calculation of hashCode (calcHashCode()) in 
the Query objects, is for StandingQueryIndex, with the expectation that query 
fragments would be reused a lot. All queries, once constructed, are immutable 
so my rationale was that it made sense to cache it for such reuse.

However this all hinges on whether queries will typically be reused, or not - 
and reused with StandingQueryIndex. By reused I mean will the exact query 
objects be cached and used again, and not equivalent query objects constructed 
again. I originally envisaged that queries in many applications would be cached 
and reused against changing datasets. But thinking about it now, I think in 
hindsight it might have been the wrong direction.  

I've accepted this as a bug, and applied a fix in trunk. The fix is to make 
hashCode non-final, nullable, and to calculate it on-demand and not eagerly. 
Once calculated, it will be reused to avoid recalculation. This should provide 
the best of both worlds.

I'll leave this issue open until the next release of CQEngine, which will 
include the fix. In the meantime you could svn checkout and mvn install 
locally. Hopefully this will fix the bottleneck.

Thanks!
Niall

Original comment by ni...@npgall.com on 30 Oct 2013 at 1:32

Changed state: Started

GoogleCodeExporter commented 9 years ago

Hi Niall,
  Thanks for the quick fix. Even though Java Memory model ensures assignment of References to be atomic, it is slightly more expensive to ensure atomicity. (Note that assignments to double/long are not atomic). I recommend, moving to "int" instead of Integer to keep hashCode.

  You can see hashCode() implementation of String.java to see how they use this guarantee of Java Memory Model, to prevent synchronization.

http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/jav
a/lang/String.java#String.hashCode%28%29

Original comment by anita.v...@inmobi.com on 30 Oct 2013 at 9:29

GoogleCodeExporter commented 9 years ago

Thanks Niall, this seem to have fixed our issue in production.

Original comment by anita.v...@inmobi.com on 30 Oct 2013 at 10:58

GoogleCodeExporter commented 9 years ago

Hi Atul,

Great that this fixed the bottleneck.

I made a few tradeoffs in selecting Integer here.

Regarding the java memory model, I don't think the example in 
java.lang.String.hashCode() is availing of atomicity or a guarantee provided by 
the JMM as a way to avoid synchronization. java.lang.String.hash is not 
volatile, which means that writes to it may be atomic, but there is no 
visibility guarantee. If thread A calls hashCode() and hash has not been 
computed, thread A will compute and store it. But if thread B or other threads 
then call hashCode() any time later, they are not guaranteed to see the hash 
value stored by thread A. They might all individually see 0 for an arbitrary 
period of time, and they may each recompute it. Worst case it would be like 
thread-local (actually CPU core-local). There is no coordination between 
threads at all.

I think Sun/Oracle were relying on the idempotence of the hash function here to 
allow multiple threads to recompute hashcode needlessly, rather than a 
guarantee provided by the JMM. I was relying on idempotence in a similar way in 
my use of a non-volatile Integer, knowing that it could be recomputed 
needlessly on multicore systems, but that the hash functions were idempotent. 
I've seen Sun/Oracle use idempotence to avoid volatile reads/writes in a few 
places, so I guess there is some merit in it. Number of recomputes must be rare 
in practice, and the worst case would be bounded by the number of cores in the 
system.

Also in java.lang.String.hashCode(), there is additional weirdness. They seem 
to use 0 as both a hashcode, and a special value to indicate that hashcode has 
not yet been computed. I'm sure they had good justification, but if a 
particular sequence of characters actually computes to hashcode 0, I think then 
java.lang.String will recompute the hashcode every time hashCode() is called on 
those strings. So some instances of java.lang.String might not benefit from 
caching at all. In CQEngine, applications might supply objects where hashcode 0 
is more common, so I can't use 0 as a special value.

I agree that a data type smaller than an object reference will be cheaper. 
Using Integer was lazy :) Before the next release I will add a boolean 
hashCodeComputed flag, and make hashCode an int. This will be cheaper and avoid 
object allocations for Integer. It may further improve performance a bit.

Best regards,
Niall

Original comment by ni...@npgall.com on 30 Oct 2013 at 1:11

GoogleCodeExporter commented 9 years ago

hashCode() in string doesn't make sure two threads up-date simultaneously, it 
only makes sure, that hash code is correct. and if possible avoid 
recomputation. Two threads may compete and end up computing the same value 
again. Also it is possible that recomputation happens whenever the value is 
zero. 

Imagine a 64 bit value in a 32 bit machine. Imagine ab is a long variable.

ab = (1 << 32 + 2)

it will write to location ab, 1 and 2 in two CPU cycles. So if another thread 
is trying to write:

ab = (3 << 32 + 4)

You might end up with:
ab = 1 << 32 + 4
or
ab = 3 << 32 + 2

But Java memory model promises this will NOT HAPPEN for refernces, but MAY 
HAPPEN for double and long. Even though References are 64-bit values.

This is ensured by additional mechanism which is expensive in a 32 bit machine.

Using a boolean and int will not work, because you will be subject to 
code-reordering issues. One will have to either "synchronize/lock", or 
introduce artificial memory barriers (or use volatiles)

http://docs.oracle.com/javase/specs/jls/se7/jls7.pdf

Original comment by anita.v...@inmobi.com on 30 Oct 2013 at 5:12

GoogleCodeExporter commented 9 years ago

[deleted comment]

GoogleCodeExporter commented 9 years ago

Hi Atul,

Yes I'm familiar with that stuff. I think we are on the same page.

You raise a good point regarding code reordering if I were to introduce a 
separate boolean. Both would then need to be volatile. 

So which do you prefer, to leave it as-is with a non-volatile reference, or to 
have two volatile primitives instead?

EDIT: actually, not sure if both primitives need to be volatile. Maybe only the 
boolean, will confirm.

Original comment by ni...@npgall.com on 30 Oct 2013 at 5:47

GoogleCodeExporter commented 9 years ago

[deleted comment]

GoogleCodeExporter commented 9 years ago

It won't be any better than non-volatile Integer. The fastest performance will 
be to use String like implementation of using just 'int'.

(non-volatile) int hash;

hashCode() {
  int h = this.hash;

  if (h == 0) {
     h = calcHashCode();
  }

  if (h == 0) {
    h = 1;  // Additional optimization..
  }

  hash = h;
  return h;
}

Original comment by anita.v...@inmobi.com on 30 Oct 2013 at 8:00

GoogleCodeExporter commented 9 years ago

Updated in trunk. I liked your idea of coalescing to a non-zero code, if zero 
is calculated...

    @Override
    public int hashCode() {
        // Lazy calculate and cache hashCode...
        int h = this.hashCode;
        if (h == 0) { // hashCode not cached
            h = calcHashCode();
            if (h == 0) { // 0 is normally a valid hashCode, coalesce with another...
                h = -976419167; // was randomly chosen
            }
            this.hashCode = h; // cache hashCode
        }
        return h;
    }

Original comment by ni...@npgall.com on 6 Nov 2013 at 10:24

GoogleCodeExporter commented 9 years ago

Fixed in release 1.2.4. Many thanks!

Original comment by ni...@npgall.com on 22 Nov 2013 at 10:02

Changed state: Fixed

moremore0812 / cqengine

And.calcHashCode being performance Bottlneck #25