maxmind / MaxMind-DB-Reader-java

Java reader for the MaxMind DB format
https://maxmind.github.io/MaxMind-DB-Reader-java/
Apache License 2.0
114 stars 43 forks source link

Remove unnecessary synchronized #69

Closed spkrka closed 4 years ago

spkrka commented 4 years ago

This synchronized call can cause contention. There is no other code that is synchronized on the BufferHolder The internal buffer never mutates its state (position, limit, and mark)

Thus it should be safe to remove this synchronized call.

coveralls commented 4 years ago

Coverage Status

Coverage remained the same at 90.879% when pulling b0b6a71697f238f1988f1d6adebb6db1ff5d2ec5 on spkrka:patch-1 into 3c3e12fd5414d26cf9c353d7907bd7cf594d96d1 on maxmind:master.

oschwald commented 4 years ago

This has been discussed in #65. As stated there, I am reluctant to remove the synchronize given that duplicate is not documented as thread safe.

Is this actually causing issues for you in real world usage? The synchronize is very limited in scope.

spkrka commented 4 years ago

Yes, we noticed significant blocks on this synchronized using JFR during heavy load.

It does not really matter if duplicate is thread-safe or not, since the state of the bytebuffer is effectively immutable.

spkrka commented 4 years ago

I also created some benchmarks using JMH to compare sync vs non-sync solution: I have both the code and the results in this commit: https://github.com/spkrka/MaxMind-DB-Reader-java/commit/76b3be0f26f4ea8cb3b9a092d294d2281b0b4b74

Benchmarks ran on a 32-core machine with Java 11

Summary of benchmark data:
* Performance is the same when threads <= 4
* Sync version throughput goes down as number of concurrent threads increase >= 32
* Non-sync version throughput stabilizes when increasing threads beyond cores.
* At 32 concurrent threads, non-sync version supports about 40% more throughput
* At 32 concurrent threads, the 95-percentile of latency is:
   60 microseconds per op for non-sync and
  580 microseconds per op for sync
oschwald commented 4 years ago

duplicate makes no guarantees that it will not mutate the state of the original buffer in some way as it runs. This is what I mean by not being documented thread-safe. That said, it seems hard to imagine a reason why any possible implementation would do this.

Given that you see significant lock contention at just 32 threads, perhaps the very small risk of some potential implementation introducing a data race is worthwhile. In the linked issue, the person was using 5000 threads.

spkrka commented 4 years ago

I suppose if you have a read-only-buffer, the java implementation must not be allowed to mutate the content of the buffer - but as you say, that would be a weird thing for the java implementation to do regardless of whether it is read-only or not.

oschwald commented 4 years ago

My concern isn't about the contents of the backing buffer, which will not change. It is about the internal state of the ByteBuffer itself. This state is mutable even on a read-only buffer.

spkrka commented 4 years ago

Yes, I agree on that. I was mostly trying to eliminate the buffer content as a variable in all this.

We also know that once the ByteBuffer is created in the BufferHolder class, its state is never mutated.

I suppose if you want to be extra sure the runtime itself won't mutate it, you could explicitly clean the duplicate like this:

duplicate = buffer.duplicate();
duplicate.clear();
return duplicate;

(I think that's unnecessary though)

What do you think? Do you think this can be merged as is or would you prefer something that explicltly clears it too? I make another PR for that

spkrka commented 4 years ago

See alternative PR: https://github.com/maxmind/MaxMind-DB-Reader-java/pull/72

liljencrantz commented 4 years ago

I like the other version, it should be safe even if a future version mutates the state in the ByteBuffer.

spkrka commented 4 years ago

Any chance one of these changes could be merged (and released, perhaps with patch version bump)?

That would help us avoid setting up an internal fork and I think other users could also benefit from it.

oschwald commented 4 years ago

We will likely merge this or some other change to avoid the lock, but it might be a little while before we merge and release. At the very least, I'd like to review the implementation for the various possible ByteBuffer types we could be holding (as it is an abstract class) to ensure that none of the current implementations do anything concerning.

spkrka commented 4 years ago

I looked through the code and found the following: If you're using memory mode, the ByteBuffer will be of type HeapByteBufferR If you're using memory mapped mode, the ByteBuffer will be of type DirectByteBufferR

The implementation of duplicate() in HeapByteBufferR:

    public ByteBuffer asReadOnlyBuffer() {
        return new HeapByteBufferR(hb,
                                     this.markValue(),
                                     this.position(),
                                     this.limit(),
                                     this.capacity(),
                                     offset);
    }

So all it does is create a new object, referencing the same byte array and the same offset and copying all those int fields (mark, position, limit, capacity). Nothing is mutating the original ByteBuffer

For DirectByteBufferR it is slightly different - duplicate is only defined in DirectByteBuffer, so it will return a DirectByteBuffer object instead.

It looks like this:

    public ByteBuffer duplicate() {
        return new DirectByteBuffer(this,
                                              this.markValue(),
                                              this.position(),
                                              this.limit(),
                                              this.capacity(),
                                              0);
    }

    // And the constructor
    DirectByteBuffer(DirectBuffer db,         // package-private
                               int mark, int pos, int lim, int cap,
                               int off)
    {
        super(mark, pos, lim, cap);
        address = db.address() + off;
        cleaner = null;
        att = db;
    }

There is nothing here that mutates the original ByteBuffer in any way.

oschwald commented 4 years ago

1.4.0 has been released with the synchronized removed. Thanks!

spkrka commented 4 years ago

Thank you!