valkey-io / valkey

A flexible distributed key-value datastore that supports both caching and beyond caching workloads.
https://valkey.io

Benchmark results for 8.0.0-rc2 vs 7.2.6 using single thread #1026

Open roshkhatri opened 2 weeks ago

roshkhatri commented 2 weeks ago

The Benchmarking Setup:

r6g.metal(ARM) Spec:

Command to start the server:

taskset -c 0-1 src/valkey-server --daemonize yes --maxmemory-policy allkeys-lru --appendonly no --cluster-enabled yes --logfile valkey_log_cluster_yes --save ''

Benchmark command:

src/valkey-benchmark -P 10 -r 10000000 -n 10000000 -d 16 -t RPUSH --csv

Method of Running:

I have implemented a script that runs all the command tests on the metal instance across every combination of pipelining, data size, command, cluster mode, and TLS mode.

I have created a public repo, valkey-benchmark-tools, which contains the script run_benchmark_tool.py; you can look there to see how it runs all of this.
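As a rough illustration of the kind of test matrix described above, here is a minimal sketch that enumerates the benchmark invocations for one server mode. It is not taken from run_benchmark_tool.py: the function names are hypothetical, and only the flags shown in the benchmark command above are used. The cluster and TLS dimensions are left out because the client flags for them are not shown in this post.

```python
from itertools import product

# Values taken from the result tables in this issue.
COMMANDS = ["SET", "GET", "RPUSH", "LPUSH", "LPOP", "SADD", "SPOP", "HSET"]
PIPELINES = [10, 1]
DATA_SIZES = [16, 128, 1024]


def benchmark_command(command, pipeline, data_size,
                      requests=10_000_000, keyspace=10_000_000):
    """Build one valkey-benchmark argv list (hypothetical helper)."""
    return [
        "src/valkey-benchmark",
        "-P", str(pipeline),
        "-r", str(keyspace),
        "-n", str(requests),
        "-d", str(data_size),
        "-t", command,
        "--csv",
    ]


def all_commands():
    """Enumerate every data size / pipeline / command combination."""
    return [
        benchmark_command(cmd, pipe, size)
        for size, pipe, cmd in product(DATA_SIZES, PIPELINES, COMMANDS)
    ]


matrix = all_commands()
print(len(matrix))          # number of runs per server mode
print(" ".join(matrix[2]))  # the RPUSH, -P 10, -d 16 run
```

Each argv list could then be passed to `subprocess.run` once per server mode; the count matches the number of rows in each table below.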

Results:

Valkey 7.2.6 vs 8.0.0-rc2 Benchmark Results Comparison

Standalone

Command | Pipeline | Data Size (bytes) | RPS 7.2.6 | RPS 8.0.0-rc2 | RPS Gain (%) | Latency 7.2.6 (ms) | Latency 8.0.0-rc2 (ms) | Latency Gain (%)
SET 10 16 520787.93 505968.72 -2.85 0.88 0.91 -3.41
GET 10 16 663597.02 634799.00 -4.34 0.67 0.71 -5.97
RPUSH 10 16 765091.63 718048.75 -6.15 0.58 0.62 -6.90
LPUSH 10 16 691086.21 646493.41 -6.45 0.65 0.70 -7.69
LPOP 10 16 626372.19 606277.10 -3.21 0.72 0.75 -4.17
SADD 10 16 557077.81 542960.29 -2.53 0.82 0.85 -3.66
SPOP 10 16 468979.92 457270.35 -2.50 0.99 1.02 -3.03
HSET 10 16 505983.73 498492.24 -1.48 0.91 0.93 -2.20
SET 1 16 102835.67 107983.24 5.01 0.26 0.25 3.85
GET 1 16 103339.23 107540.84 4.07 0.25 0.24 4.00
RPUSH 1 16 104093.64 108852.13 4.57 0.25 0.24 4.00
LPUSH 1 16 104945.78 109997.45 4.81 0.25 0.24 4.00
LPOP 1 16 104142.95 110328.87 5.94 0.25 0.24 4.00
SADD 1 16 105431.96 109726.26 4.07 0.25 0.24 4.00
SPOP 1 16 106727.63 110370.95 3.41 0.25 0.26 -4.00
HSET 1 16 104030.61 108340.23 4.14 0.25 0.24 4.00
SET 10 128 466743.26 457579.49 -1.96 0.99 1.01 -2.02
GET 10 128 615477.77 589124.37 -4.28 0.73 0.77 -5.48
RPUSH 10 128 641502.67 618200.71 -3.63 0.70 0.73 -4.29
LPUSH 10 128 598721.25 567333.81 -5.24 0.76 0.81 -6.58
LPOP 10 128 576679.02 564900.81 -2.04 0.79 0.81 -2.53
SADD 10 128 557887.27 538954.87 -3.39 0.82 0.85 -3.66
SPOP 10 128 469464.63 460238.05 -1.97 0.98 1.02 -4.08
HSET 10 128 466708.66 456335.81 -2.22 0.99 1.02 -3.03
SET 1 128 105515.09 111827.87 5.98 0.25 0.24 4.00
GET 1 128 102821.09 108336.24 5.36 0.25 0.24 4.00
RPUSH 1 128 105790.36 111397.16 5.30 0.25 0.24 4.00
LPUSH 1 128 106097.39 112364.65 5.91 0.25 0.24 4.00
LPOP 1 128 104224.42 109881.77 5.43 0.25 0.24 4.00
SADD 1 128 104524.53 108969.04 4.25 0.25 0.24 4.00
SPOP 1 128 106924.63 111674.24 4.44 0.25 0.26 -4.00
HSET 1 128 107253.25 111413.41 3.88 0.25 0.24 4.00
SET 10 1024 354122.77 351301.54 -0.80 1.31 1.33 -1.53
GET 10 1024 511288.96 492137.15 -3.75 0.88 0.93 -5.68
RPUSH 10 1024 413217.60 398544.29 -3.55 1.12 1.17 -4.46
LPUSH 10 1024 391538.75 377320.77 -3.63 1.18 1.24 -5.08
LPOP 10 1024 440260.07 426259.90 -3.18 1.03 1.07 -3.88
SADD 10 1024 554786.81 533950.96 -3.76 0.82 0.86 -4.88
SPOP 10 1024 472512.19 454272.61 -3.86 0.97 1.03 -6.19
HSET 10 1024 351127.63 336774.59 -4.09 1.33 1.39 -4.51
SET 1 1024 102363.58 107044.26 4.57 0.27 0.26 3.70
GET 1 1024 102148.19 106735.14 4.49 0.26 0.25 3.85
RPUSH 1 1024 103609.69 108210.33 4.44 0.26 0.25 3.85
LPUSH 1 1024 103381.88 108373.43 4.83 0.26 0.25 3.85
LPOP 1 1024 103630.05 107544.11 3.78 0.26 0.25 3.85
SADD 1 1024 104126.90 109349.15 5.02 0.25 0.24 4.00
SPOP 1 1024 106277.70 110131.67 3.63 0.25 0.26 -4.00
HSET 1 1024 104081.01 107663.43 3.44 0.26 0.26 0.00
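
The gain columns look like relative changes against 7.2.6, with the sign of the latency gain flipped so that a positive number means lower latency. That is inferred from the numbers in the table, not from the script, so treat the sketch below as an assumption about how the columns were derived.

```python
def rps_gain(old_rps, new_rps):
    """Percentage change in throughput; positive means the new version is faster."""
    return (new_rps - old_rps) / old_rps * 100.0


def latency_gain(old_ms, new_ms):
    """Percentage change in latency; positive means the new version has lower latency."""
    return (old_ms - new_ms) / old_ms * 100.0


# First Standalone row: SET, pipeline 10, 16-byte values.
print(f"{rps_gain(520787.93, 505968.72):.2f}")  # -2.85
print(f"{latency_gain(0.88, 0.91):.2f}")        # -3.41
```

Because the latencies are reported with only two decimals, a 0.01 ms change at around 0.25 ms already shows up as a 4% latency gain or loss.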

Standalone Mode with TLS

Command | Pipeline | Data Size (bytes) | RPS 7.2.6 | RPS 8.0.0-rc2 | RPS Gain (%) | Latency 7.2.6 (ms) | Latency 8.0.0-rc2 (ms) | Latency Gain (%)
SET 10 16 333844.14 335251.78 0.42 1.36 1.36 0.00
GET 10 16 409138.08 397820.49 -2.77 1.09 1.13 -3.67
RPUSH 10 16 440355.07 394037.11 -10.52 1.01 1.15 -13.86
LPUSH 10 16 412354.88 402618.35 -2.36 1.09 1.11 -1.83
LPOP 10 16 386682.09 385166.40 -0.39 1.17 1.17 0.00
SADD 10 16 340948.23 356998.13 4.71 1.34 1.27 5.22
SPOP 10 16 312310.34 324894.30 4.03 1.47 1.38 6.12
HSET 10 16 319343.49 332794.27 4.21 1.44 1.37 4.86
SET 1 16 59079.31 57298.75 -3.01 0.55 0.77 -40.00
GET 1 16 58463.73 60759.02 3.93 0.45 0.59 -31.11
RPUSH 1 16 61825.28 62804.14 1.58 0.46 0.64 -39.13
LPUSH 1 16 63947.52 63543.90 -0.63 0.44 0.63 -43.18
LPOP 1 16 64138.56 61423.29 -4.23 0.47 0.64 -36.17
SADD 1 16 61655.58 62954.17 2.11 0.45 0.61 -35.56
SPOP 1 16 59472.08 58491.92 -1.65 0.52 0.61 -17.31
HSET 1 16 61154.36 60526.71 -1.03 0.47 0.57 -21.28
SET 10 128 287785.22 303468.48 5.45 1.59 1.51 5.03
GET 10 128 361000.56 370197.89 2.55 1.24 1.22 1.61
RPUSH 10 128 375252.85 371363.04 -1.04 1.19 1.21 -1.68
LPUSH 10 128 361177.97 336036.13 -6.96 1.25 1.35 -8.00
LPOP 10 128 357551.20 336582.16 -5.86 1.25 1.35 -8.00
SADD 10 128 364685.44 340754.58 -6.56 1.24 1.34 -8.06
SPOP 10 128 329896.81 314924.60 -4.54 1.36 1.47 -8.09
HSET 10 128 311473.11 296406.08 -4.84 1.47 1.55 -5.44
SET 1 128 60695.59 58455.42 -3.69 0.60 0.58 3.33
GET 1 128 61753.61 62683.66 1.51 0.45 0.51 -13.33
RPUSH 1 128 61310.50 60089.05 -1.99 0.55 0.59 -7.27
LPUSH 1 128 59824.64 57784.35 -3.41 0.56 0.72 -28.57
LPOP 1 128 58004.33 57670.03 -0.58 0.65 0.73 -12.31
SADD 1 128 61096.64 58572.15 -4.13 0.48 0.72 -50.00
SPOP 1 128 60252.71 57100.17 -5.23 0.60 0.76 -26.67
HSET 1 128 59143.82 55151.14 -6.75 0.56 0.74 -32.14
SET 10 1024 224058.52 218616.27 -2.43 2.03 2.10 -3.45
GET 10 1024 312794.91 295876.14 -5.41 1.44 1.54 -6.94
RPUSH 10 1024 244957.64 230560.48 -5.88 1.85 1.98 -7.03
LPUSH 10 1024 239408.06 228121.59 -4.71 1.89 2.00 -5.82
LPOP 10 1024 268841.62 258804.65 -3.73 1.69 1.76 -4.14
SADD 10 1024 361663.26 367693.43 1.67 1.25 1.23 1.60
SPOP 10 1024 329380.45 333552.55 1.27 1.34 1.35 -0.75
HSET 10 1024 215813.31 219961.03 1.92 2.11 2.08 1.42
SET 1 1024 53677.15 54241.97 1.05 0.69 0.73 -5.80
GET 1 1024 57928.91 60441.84 4.34 0.49 0.57 -16.33
RPUSH 1 1024 54961.96 51547.94 -6.21 0.72 0.85 -18.06
LPUSH 1 1024 51790.09 55279.52 6.74 0.75 0.72 4.00
LPOP 1 1024 54129.14 54985.74 1.58 0.73 0.71 2.74
SADD 1 1024 60152.75 60731.44 0.96 0.55 0.67 -21.82
SPOP 1 1024 59925.80 57720.05 -3.68 0.61 0.65 -6.56
HSET 1 1024 52469.69 51656.11 -1.55 0.75 0.86 -14.67

Cluster Mode

Command | Pipeline | Data Size (bytes) | RPS 7.2.6 | RPS 8.0.0-rc2 | RPS Gain (%) | Latency 7.2.6 (ms) | Latency 8.0.0-rc2 (ms) | Latency Gain (%)
SET 10 16 408835.18 431790.02 5.61 1.14 1.08 5.26
GET 10 16 534478.75 530798.23 -0.69 0.86 0.87 -1.16
RPUSH 10 16 692426.92 651753.79 -5.87 0.65 0.70 -7.69
LPUSH 10 16 634147.25 604553.39 -4.67 0.71 0.76 -7.04
LPOP 10 16 584777.33 570103.02 -2.51 0.78 0.81 -3.85
SADD 10 16 517352.67 512354.43 -0.97 0.89 0.90 -1.12
SPOP 10 16 445323.84 440852.13 -1.00 1.05 1.07 -1.90
HSET 10 16 473851.59 470567.35 -0.69 0.98 0.99 -1.02
SET 1 16 104280.60 108150.46 3.71 0.26 0.25 3.85
GET 1 16 103894.17 109010.28 4.92 0.25 0.24 4.00
RPUSH 1 16 105272.08 109022.68 3.56 0.25 0.24 4.00
LPUSH 1 16 105750.83 110299.90 4.30 0.25 0.24 4.00
LPOP 1 16 105765.74 111251.40 5.19 0.25 0.24 4.00
SADD 1 16 105211.00 108728.85 3.34 0.25 0.24 4.00
SPOP 1 16 106999.10 111051.13 3.79 0.25 0.26 -4.00
HSET 1 16 104888.44 108528.15 3.47 0.25 0.25 0.00
SET 10 128 376152.08 388723.09 3.34 1.25 1.21 3.20
GET 10 128 500026.21 487824.22 -2.44 0.92 0.95 -3.26
RPUSH 10 128 602181.06 575038.98 -4.51 0.75 0.79 -5.33
LPUSH 10 128 550251.46 530409.64 -3.61 0.83 0.87 -4.82
LPOP 10 128 537709.00 531129.04 -1.22 0.85 0.87 -2.35
SADD 10 128 522845.88 511379.14 -2.19 0.88 0.90 -2.27
SPOP 10 128 447715.37 440627.86 -1.58 1.04 1.07 -2.88
HSET 10 128 440146.24 438948.59 -0.27 1.05 1.06 -0.95
SET 1 128 106931.69 112118.05 4.85 0.26 0.25 3.85
GET 1 128 104078.22 109585.00 5.29 0.25 0.24 4.00
RPUSH 1 128 106635.92 112023.98 5.05 0.24 0.24 0.00
LPUSH 1 128 106086.25 112176.05 5.74 0.25 0.24 4.00
LPOP 1 128 106214.66 111205.21 4.70 0.25 0.24 4.00
SADD 1 128 104913.48 109443.52 4.32 0.25 0.24 4.00
SPOP 1 128 106603.38 110791.49 3.93 0.26 0.26 0.00
HSET 1 128 107373.40 112215.01 4.51 0.25 0.24 4.00
SET 10 1024 297329.25 306335.17 3.03 1.58 1.54 2.53
GET 10 1024 426836.67 417672.55 -2.15 1.08 1.11 -2.78
RPUSH 10 1024 386338.19 378263.39 -2.09 1.20 1.23 -2.50
LPUSH 10 1024 370358.83 358255.69 -3.27 1.26 1.31 -3.97
LPOP 10 1024 417917.26 405956.53 -2.86 1.09 1.13 -3.67
SADD 10 1024 512245.84 508324.55 -0.77 0.90 0.91 -1.11
SPOP 10 1024 439193.28 433391.85 -1.32 1.07 1.08 -0.93
HSET 10 1024 330002.99 333253.50 0.98 1.42 1.41 0.70
SET 1 1024 102664.82 107183.35 4.40 0.30 0.29 3.33
GET 1 1024 104243.52 108489.50 4.07 0.25 0.25 0.00
RPUSH 1 1024 103781.94 108647.22 4.69 0.26 0.26 0.00
LPUSH 1 1024 103473.09 108378.70 4.74 0.26 0.26 0.00
LPOP 1 1024 103623.02 108513.67 4.72 0.26 0.25 3.85
SADD 1 1024 104585.17 109857.02 5.04 0.25 0.24 4.00
SPOP 1 1024 106798.79 110798.65 3.75 0.26 0.26 0.00
HSET 1 1024 104363.68 107995.98 3.48 0.27 0.27 0.00

Cluster Mode with TLS

Command | Pipeline | Data Size (bytes) | RPS 7.2.6 | RPS 8.0.0-rc2 | RPS Gain (%) | Latency 7.2.6 (ms) | Latency 8.0.0-rc2 (ms) | Latency Gain (%)
SET 10 16 298540.59 301242.02 0.90 1.54 1.52 1.30
GET 10 16 372522.05 356490.26 -4.30 1.21 1.27 -4.96
RPUSH 10 16 397066.28 392204.82 -1.22 1.14 1.15 -0.88
LPUSH 10 16 370714.62 374046.09 0.90 1.22 1.22 0.00
LPOP 10 16 347859.72 356990.71 2.62 1.31 1.27 3.05
SADD 10 16 311863.05 317904.60 1.94 1.47 1.44 2.04
SPOP 10 16 288423.31 293785.60 1.86 1.62 1.58 2.47
HSET 10 16 292815.22 298067.78 1.79 1.58 1.55 1.90
SET 1 16 59717.14 56623.07 -5.18 0.71 0.70 1.41
GET 1 16 62626.92 59644.94 -4.76 0.51 0.65 -27.45
RPUSH 1 16 59151.80 62134.16 5.04 0.66 0.48 27.27
LPUSH 1 16 61991.43 64168.68 3.51 0.53 0.57 -7.55
LPOP 1 16 61222.35 60737.96 -0.79 0.55 0.59 -7.27
SADD 1 16 58780.20 61128.12 3.99 0.54 0.61 -12.96
SPOP 1 16 57392.43 59753.32 4.11 0.57 0.61 -7.02
HSET 1 16 60282.71 59184.87 -1.82 0.55 0.63 -14.55
SET 10 128 252396.99 265220.58 5.08 1.83 1.75 4.37
GET 10 128 324276.79 320101.35 -1.29 1.41 1.43 -1.42
RPUSH 10 128 352979.32 336149.13 -4.77 1.28 1.35 -5.47
LPUSH 10 128 327548.32 324658.86 -0.88 1.38 1.41 -2.17
LPOP 10 128 321348.33 321286.22 -0.02 1.42 1.43 -0.70
SADD 10 128 324822.78 320048.62 -1.47 1.41 1.44 -2.13
SPOP 10 128 299217.00 293176.54 -2.02 1.55 1.58 -1.94
HSET 10 128 280954.32 277699.10 -1.16 1.64 1.67 -1.83
SET 1 128 59659.27 54734.93 -8.25 0.58 0.81 -39.66
GET 1 128 60586.05 59889.17 -1.15 0.44 0.68 -54.55
RPUSH 1 128 61115.82 59892.65 -2.00 0.64 0.56 12.50
LPUSH 1 128 59576.62 53981.26 -9.39 0.58 0.83 -43.10
LPOP 1 128 60947.10 52898.20 -13.21 0.66 0.85 -28.79
SADD 1 128 61202.07 59810.70 -2.27 0.60 0.59 1.67
SPOP 1 128 58764.94 58914.81 0.26 0.64 0.62 3.13
HSET 1 128 58038.58 58460.14 0.73 0.73 0.66 9.59
SET 10 1024 195139.27 195049.64 -0.05 2.36 2.37 -0.42
GET 10 1024 275468.32 258323.50 -6.22 1.66 1.78 -7.23
RPUSH 10 1024 223672.28 226150.63 1.11 2.05 2.02 1.46
LPUSH 10 1024 215653.87 217500.82 0.86 2.13 2.11 0.94
LPOP 10 1024 239075.47 241007.32 0.81 1.92 1.91 0.52
SADD 10 1024 316524.90 355055.98 12.17 1.45 1.28 11.72
SPOP 10 1024 289532.76 322474.84 11.38 1.60 1.42 11.25
HSET 10 1024 198350.52 211361.37 6.56 2.32 2.17 6.47
SET 1 1024 53453.91 56363.61 5.44 0.79 0.75 5.06
GET 1 1024 60033.77 57889.40 -3.57 0.57 0.57 0.00
RPUSH 1 1024 58063.99 53177.02 -8.42 0.68 0.84 -23.53
LPUSH 1 1024 57109.08 53156.68 -6.92 0.69 0.74 -7.25
LPOP 1 1024 55741.13 54037.78 -3.06 0.61 0.73 -19.67
SADD 1 1024 59636.39 59005.26 -1.06 0.55 0.62 -12.73
SPOP 1 1024 58915.08 60629.87 2.91 0.64 0.63 1.56
HSET 1 1024 52276.07 53743.56 2.81 0.84 0.78 7.14

Also, let me know if you find any gaps in the script mentioned above and if I might be missing something.

madolson commented 2 weeks ago

@valkey-io/core-team @valkey-io/contributors Worth reviewing to see if anything here seems suspicious in preparation for the 8.0 launch.

PingXie commented 2 weeks ago
  1. can we include a link to the r6g metal spec so the reader doesn't need to search it up?
  2. can we make it clear that this is single-thread in the issue title and at the top of the results? right now, it is a bit hidden.
  3. can we clarify that the test client also ran on the same box? (or not?)
  4. Is cluster mode a single shard/primary setup?
  5. the TLS results seem to come from unstable. is this expected? shouldn't it be 8.0 rc2 as well?
  6. RPUSH/LPUSH on standalone TLS (unstable) stand out quite a bit. Do we know why? Note that the cluster mode TLS numbers show a different pattern.
zuiderkwast commented 2 weeks ago

Are valkey-benchmark and valkey-server running on the same machine?

Does valkey-benchmark run without --threads? If yes, then valkey-benchmark may be the bottleneck itself.

When I run valkey-server and valkey-benchmark locally and check top while they are running, I see valkey-benchmark at 100% CPU but valkey-server at only 90% CPU. That means valkey-benchmark can't send enough traffic, so I run valkey-benchmark with --threads 2.
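
The rule of thumb in this comment can be written down as a small check. This is only a sketch: the function name and the 99% saturation threshold are assumptions for illustration, and the CPU percentages would come from whatever you read off top.

```python
def likely_bottleneck(server_cpu, client_cpu, saturated=99.0):
    """Guess which process limits throughput from per-process CPU percentages.

    `saturated` is an assumed threshold for "pegged at one core".
    """
    if client_cpu >= saturated and server_cpu < saturated:
        return "client"   # the benchmark tool cannot generate enough load
    if server_cpu >= saturated and client_cpu < saturated:
        return "server"   # the server is the limiting factor, as intended
    return "unclear"


# The situation described above: benchmark at 100%, server at 90%.
print(likely_bottleneck(server_cpu=90.0, client_cpu=100.0))
```

If the result is "client", the numbers measure the benchmark tool rather than the server, which is the case where adding --threads to valkey-benchmark helps.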

aiven-sal commented 2 weeks ago

Nice! I have 2 questions:
  1. Have you considered adding the median and percentiles (even just a 5%-95% range) to the report? It would make it easier to tell whether the differences we see are significant or just noise.
  2. Do you plan to also run benchmarks on x86? I think a lot of people are still using x86, and they may be looking specifically for benchmarks on that arch (even if the results end up being similar).
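
One possible way to report a median and a 5%-95% range is sketched below, assuming each configuration is run several times and the RPS of every run is kept. The sample values are placeholders for illustration, not measurements, and the helper names are hypothetical.

```python
import statistics


def summarize(samples):
    """Return the median and the 5th/95th percentiles of repeated runs."""
    # n=20 yields 19 cut points at 5%, 10%, ..., 95%.
    cuts = statistics.quantiles(samples, n=20, method="inclusive")
    return {"median": statistics.median(samples), "p5": cuts[0], "p95": cuts[-1]}


def ranges_overlap(a, b):
    """True if the 5%-95% ranges of two summaries overlap."""
    return a["p5"] <= b["p95"] and b["p5"] <= a["p95"]


# Placeholder RPS samples (in thousands) from repeated runs; not real data.
runs_old = [98.0, 99.0, 100.0, 101.0, 102.0]
runs_new = [100.0, 101.0, 102.0, 103.0, 104.0]

old, new = summarize(runs_old), summarize(runs_new)
print(old)
print(new)
print("ranges overlap:", ranges_overlap(old, new))
```

Overlapping ranges suggest that a difference of a few percent may be run-to-run noise; a proper significance test would need more runs than this sketch assumes.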

roshkhatri commented 2 weeks ago

@PingXie

the TLS results seem to come from unstable. is this expected? shouldn't it be 8.0 rc2 as well?

The tests are still running. I did not parallelize the setup, so it's just taking time. I will update the results as they are generated. EDIT: updated the top comment with Standalone Mode with TLS and Cluster Mode with TLS.

RPUSH/LPUSH on standalone TLS (unstable) stand out quite a bit. Do we know why? Note that the cluster mode TLS numbers show a different pattern.

Not yet.


@zuiderkwast

Are valkey-benchmark and valkey-server running on the same machine?

Yes

Does valkey-benchmark run without --threads? If yes, then valkey-benchmark may be the bottleneck itself.

I don't see that on my system, though. I have checked multiple times: valkey-server was mostly at 100% CPU, and valkey-benchmark stayed mostly around 80-97% but never exceeded valkey-server. [Screenshot 2024-09-13 at 9:30:29 AM]


@aiven-sal I will definitely look into adding the median and percentiles to the results.

do you plan to also run benchmarks for x86?

I have that setup as well. I can add those too.

hwware commented 2 weeks ago
  1. Agree with Ping. Could you please state explicitly that these results are based on a single thread?
  2. Could you tell us how many keys are in the test?
  3. In the config, you set maxmemory-policy allkeys-lru, but there is no maxmemory setting and no TTL on the keys. Do we need the maxmemory-policy parameter? Is there any key eviction during the test?
  4. Why do valkey-benchmark and valkey-server run on the same machine? Am I right to understand that the effect of network bandwidth and delay is not included in the results?