tarantool / cartridge-java

Tarantool Cartridge Java driver for Tarantool versions 1.10+ based on Netty framework
https://tarantool.io

Add benchmark for cluster API #453

Closed akudiyar closed 7 months ago

akudiyar commented 8 months ago

Before memory optimizations

Benchmark results on commit https://github.com/tarantool/cartridge-java/commit/a6199e8268577517167492191b0c5989fbcf99e8 (before optimizations):

Total time: 00:33:33

Benchmark                                            Mode  Cnt      Score      Error  Units
ClusterBenchmarkRunner.readDataUsingCallAPI         thrpt   10  13761.025 ± 2657.463  ops/s
ClusterBenchmarkRunner.readDataUsingCallAPI:·jfr    thrpt             NaN               ---
ClusterBenchmarkRunner.readDataUsingSpaceAPI        thrpt   10  12033.690 ± 2032.909  ops/s
ClusterBenchmarkRunner.readDataUsingSpaceAPI:·jfr   thrpt             NaN               ---
ClusterBenchmarkRunner.writeDataUsingCallAPI        thrpt   10   6175.897 ±  630.966  ops/s
ClusterBenchmarkRunner.writeDataUsingCallAPI:·jfr   thrpt             NaN               ---
ClusterBenchmarkRunner.writeDataUsingSpaceAPI       thrpt   10   7123.615 ±  894.854  ops/s
ClusterBenchmarkRunner.writeDataUsingSpaceAPI:·jfr  thrpt             NaN               ---

Allocations for ClusterBenchmarkRunner.writeDataUsingCallAPI (before optimizations):

[Image: Allocation_before]

TLAB allocations for ClusterBenchmarkRunner.writeDataUsingCallAPI (before optimizations):

[Image: TLAB_allocation_before]

After memory optimizations

Benchmark results on commit https://github.com/tarantool/cartridge-java/pull/439/commits/5f9e69c8d9d0e907e356bb5954c547364fa5207d (after optimizations):

Total time: 00:24:56

Benchmark                                            Mode  Cnt      Score      Error  Units
ClusterBenchmarkRunner.readDataUsingCallAPI         thrpt   10  19674.739 ± 2977.708  ops/s
ClusterBenchmarkRunner.readDataUsingCallAPI:·jfr    thrpt             NaN               ---
ClusterBenchmarkRunner.readDataUsingSpaceAPI        thrpt   10  17092.066 ± 5721.286  ops/s
ClusterBenchmarkRunner.readDataUsingSpaceAPI:·jfr   thrpt             NaN               ---
ClusterBenchmarkRunner.writeDataUsingCallAPI        thrpt   10   8012.627 ± 1120.023  ops/s
ClusterBenchmarkRunner.writeDataUsingCallAPI:·jfr   thrpt             NaN               ---
ClusterBenchmarkRunner.writeDataUsingSpaceAPI       thrpt   10   9157.278 ± 3705.178  ops/s
ClusterBenchmarkRunner.writeDataUsingSpaceAPI:·jfr  thrpt             NaN               ---
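
One way to read the two result tables together is to check whether the Score ± Error ranges of a benchmark overlap before and after the change. The sketch below does that for the four throughput rows. It assumes that the Error column is the half-width of the confidence interval JMH reports (99.9% by default, as far as I know); the class name `ScoreInterval` is made up for this illustration and is not part of the driver or of JMH.

```java
import java.util.Locale;

/** Hypothetical helper: a closed interval [score - error, score + error]. */
final class ScoreInterval {
    final double low;
    final double high;

    ScoreInterval(double score, double error) {
        this.low = score - error;
        this.high = score + error;
    }

    /** True if the two closed intervals share at least one point. */
    boolean overlaps(ScoreInterval other) {
        return this.low <= other.high && other.low <= this.high;
    }

    public static void main(String[] args) {
        String[] names = {
            "readDataUsingCallAPI", "readDataUsingSpaceAPI",
            "writeDataUsingCallAPI", "writeDataUsingSpaceAPI"
        };
        // {score, error} pairs copied from the tables in this issue (ops/s).
        double[][] before = {
            {13761.025, 2657.463}, {12033.690, 2032.909},
            {6175.897, 630.966}, {7123.615, 894.854}
        };
        double[][] after = {
            {19674.739, 2977.708}, {17092.066, 5721.286},
            {8012.627, 1120.023}, {9157.278, 3705.178}
        };
        for (int i = 0; i < names.length; i++) {
            ScoreInterval b = new ScoreInterval(before[i][0], before[i][1]);
            ScoreInterval a = new ScoreInterval(after[i][0], after[i][1]);
            System.out.printf(Locale.ROOT,
                "%-24s before=[%.0f, %.0f] after=[%.0f, %.0f] overlap=%b%n",
                names[i], b.low, b.high, a.low, a.high, b.overlaps(a));
        }
    }
}
```

With the numbers above, the before and after ranges do not overlap for the two Call API benchmarks, while they do overlap for the two Space API benchmarks, so the Space API gains are less clearly separated from run-to-run noise.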

Allocations for ClusterBenchmarkRunner.writeDataUsingCallAPI (after optimizations):

[Image: Allocation_after]

TLAB allocations for ClusterBenchmarkRunner.writeDataUsingCallAPI (after optimizations):

[Image: TLAB_allocation_after]

Summary

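The average throughput gain is about 40% on read operations (roughly 42-43%) and about 30% on write operations (roughly 29-30%), although the error margins are wide, especially for the Space API after the optimizations. The full benchmark run takes about 26% less time with the optimizations (00:24:56 versus 00:33:33). The most likely reason for the improvement is the >50% reduction in allocations.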
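
The percentages in the summary can be recomputed from the reported scores and total times. The following is a minimal sketch of that arithmetic, using only the figures from the two result tables; the class and method names (`ThroughputGain`, `gainPercent`, `parseTotalTime`) are made up for this illustration.

```java
import java.time.Duration;
import java.util.Locale;

/** Hypothetical helper that recomputes the gains quoted in the summary. */
final class ThroughputGain {

    /** Relative change of {@code after} versus {@code before}, in percent. */
    static double gainPercent(double before, double after) {
        return (after - before) / before * 100.0;
    }

    /** Parses an "HH:MM:SS" total time, as printed above, into a Duration. */
    static Duration parseTotalTime(String hhmmss) {
        String[] parts = hhmmss.split(":");
        return Duration.ofHours(Long.parseLong(parts[0]))
                .plusMinutes(Long.parseLong(parts[1]))
                .plusSeconds(Long.parseLong(parts[2]));
    }

    public static void main(String[] args) {
        String[] names = {
            "readDataUsingCallAPI", "readDataUsingSpaceAPI",
            "writeDataUsingCallAPI", "writeDataUsingSpaceAPI"
        };
        // Scores in ops/s, copied from the tables in this issue.
        double[] before = {13761.025, 12033.690, 6175.897, 7123.615};
        double[] after = {19674.739, 17092.066, 8012.627, 9157.278};

        for (int i = 0; i < names.length; i++) {
            System.out.printf(Locale.ROOT, "%-24s %+.1f%%%n",
                names[i], gainPercent(before[i], after[i]));
        }

        long secondsBefore = parseTotalTime("00:33:33").getSeconds();
        long secondsAfter = parseTotalTime("00:24:56").getSeconds();
        // A negative value means the run got shorter.
        System.out.printf(Locale.ROOT, "total run time %+.1f%%%n",
            gainPercent(secondsBefore, secondsAfter));
    }
}
```

This works out to roughly +43% and +42% for the two read benchmarks, roughly +30% and +29% for the two write benchmarks, and a total run time that is about 26% shorter.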

Benchmark runs were performed with JDK 17, G1 GC (default settings), using the following command:

mvn exec:exec -Pbenchmark -Dbenchmark="ClusterBenchmarkRunner" -DbenchmarkArgs="-prof=jfr"