tikv / client-java

TiKV Java Client
https://tikv.github.io/client-java/
Apache License 2.0
111 stars 109 forks source link

Inefficient protobuf ByteString comparisons #162

Open plevart opened 3 years ago

plevart commented 3 years ago

Hi, While making changes to allow compilation / running with JDK9+, I noticed there is a utility class FastByteComparisons with static methods to compare two byte[] arrays or segments of arrays. It attempts to use Unsafe to speed-up comparison by retrieving and comparing long (i.e. 8 byte) values at a time. If Unsafe is not available it falls back to a slower pure-java version which retrieves and compares individual byte values. This is all nice for comparing byte[] arrays when arrays are already the values kept by application. But the tikv java client also operates with protobuf ByteString instances and needs to compare such instances among themselves. In code this usually looks like this:

    List<ByteString> keys = ...

    if (!sorted) {
      keys.sort((k1, k2) -> FastByteComparisons.compareTo(k1.toByteArray(), k2.toByteArray()));
    }

The drawback of this approach is that ByteString.toByteArray() creates a copy of the ByteSrting data into a newly allocated byte[] array each time this method is called. This means that faster FastByteComparisons.compareTo is more than offset with the overhead of allocation/GC churn and memory copying. There are at least two alternative ways to compare ByteString instances lexicographically:

    ByteString bs1 = ...
    ByteString bs2 = ...

    // alternative 1
    int cmp1 = bs1.asReadOnlyByteBuffer().compareTo(bs2.asReadOnlyByteBuffer());

    // alternative 2
    int cmp2 = ByteString.unsignedLexicographicalComparator().compare(bs1, bs2);

The 1st alternative uses the ByteString underlying byte[] array wrapped with a read-only HeapByteBuffer which implements Comparable<ByteBuffer>. This performs semantically the same comparison without copying and allocation of new byte[] arrays. JDK JIT is smart enough to see that the resulting HeapByteBuffers do not escape the in-lined method and eliminates even the allocation of the HeapByteBuffer instances, resulting in allocation-free comparisons.

The 2nd alternative uses the ByteString's internal Comparator implementation which also performs allocation-free comparison with same semantics.

I created a JMH benchmark to compare the 3 methods here:

https://gist.github.com/plevart/0e7c4f7b7df6d1de8f54df735ca8bc2d

The results are as follows (when running with JDK8):

https://jmh.morethan.io/?source=https://gist.githubusercontent.com/plevart/08c006ff4ecb22afa0982c4a5cd6f7ab/raw/06093d0a6d7fc578c5d8c90a15bef024127535ba/jmh-result-jdk8.json

And as follows (when running with JDK11):

https://jmh.morethan.io/?source=https://gist.githubusercontent.com/plevart/fb1c21a69a01ab7de3a1e5fb1f1d5f40/raw/1d763b5eaa4209618e802d5d7fe6cf0856bd2083/jmh-result-jdk11.json

As can be seen, the unsignedLexicographicalComparator variant does not produce garbage, but is also very inefficient when comparing ByteString(s) with large common prefix as it uses an Iterator-based iteration over individual bytes. But the asReadOnlyByteBuffer variant looks very promissing. It does not produce garbage and is independent of the size of the ByteString suffix that differs. When comparing ByteStrings with long common prefix, the FastByteComparisons with toByteArray still has an edge, but only when running on JDK 8. When running on JDK 9+ (tested with JDK 11), the ByteBuffer.compareTo() was optimized and uses special CPU support so the asReadOnlyByteBuffer beats the FastByteComparisons/toByteArray in every situation by a factor of up to 30x.

So WDYT?

birdstorm commented 3 years ago

Hi, @plevart

Glad to receive your benchmark for FastByteComparison, which is very detailed! Yes, the extra allocation in the current implementation is seemingly creating an overhead when sorting the byte array. I think the asReadOnlyByteBuffer() is the alternative solution to this problem. Would you like to submit a Pull Request for this issue?