Benchmark results for 8.0.0-rc2 vs 7.2.6 using single thread

roshkhatri commented 2 weeks ago

The Benchmarking Setup:

Instance: r6g.metal(ARM)
Node: 1 (for clustermode: single shard setup)
Clients: 50
Pipelining: 1, 10
Requests: 10000000
Data sizes (bytes): 16, 128, 1024
Commands: SET, GET, RPUSH, LPUSH, LPOP, SADD, SPOP, HSET
TLS: enabled, disabled
Number of runs: 3
valkey-benchmark and valkey-server were run on the same machine.

`r6g.metal`(ARM) Spec:

vCPU: 64
Memory (GiB): 512
Network Bandwidth (Gbps): 25
More info can be found here: https://aws.amazon.com/ec2/instance-types/r6g/

Command to start the server:

taskset -c 0-1 src/valkey-server --daemonize yes --maxmemory-policy allkeys-lru --appendonly no --cluster-enabled yes --logfile valkey_log_cluster_yes --save ''

Benchmark command:

src/valkey-benchmark -P 10 -r 10000000 -n 10000000 -d 16 -t RPUSH --csv

Method of Running:

I wrote a script that runs every command test on the metal instance across all combinations of pipelining, data sizes, commands, cluster modes, and TLS modes.

The script, run_benchmark_tool.py, lives in a public repo, valkey-benchmark-tools, where you can see exactly how each run is launched.
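
For readers who want a feel for the test matrix without opening the repo, here is a minimal sketch of how such combinations could be enumerated. It is not the actual run_benchmark_tool.py: the function names `build_command` and `all_commands` are hypothetical, the parameter values are copied from the setup list and the result tables, and the use of the `--cluster` and `--tls` client flags (with certificate options omitted) is an assumption about how the modes are selected.

```python
import itertools

# Values taken from the setup list and the result tables in this issue.
PIPELINES = [1, 10]
DATA_SIZES = [16, 128, 1024]
COMMANDS = ["SET", "GET", "RPUSH", "LPUSH", "LPOP", "SADD", "SPOP", "HSET"]
CLUSTER_MODES = [False, True]
TLS_MODES = [False, True]
REQUESTS = 10_000_000


def build_command(command, pipeline, data_size, cluster, tls):
    """Return the argument list for one valkey-benchmark run (hypothetical helper)."""
    args = [
        "src/valkey-benchmark",
        "-P", str(pipeline),
        "-r", str(REQUESTS),
        "-n", str(REQUESTS),
        "-d", str(data_size),
        "-t", command,
        "--csv",
    ]
    if cluster:
        # Assumption: cluster runs use the benchmark client's cluster mode.
        args.append("--cluster")
    if tls:
        # Assumption: TLS runs add --tls; certificate and key options are omitted here.
        args.append("--tls")
    return args


def all_commands():
    """Yield one argument list per combination of mode, size, pipeline and command."""
    for cluster, tls, size, pipeline, command in itertools.product(
        CLUSTER_MODES, TLS_MODES, DATA_SIZES, PIPELINES, COMMANDS
    ):
        yield build_command(command, pipeline, size, cluster, tls)
```

The product has 2 x 2 x 3 x 2 x 8 = 192 combinations, which matches the four result tables of 48 rows each. Each argument list could be passed to `subprocess.run` once per run.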

Results:

Valkey 7.2.6 vs 8.0.0-rc2 Benchmark Results Comparison


Standalone

Command	Pipeline	Data Size	RPS - 7.2.6	RPS - 8.0.0-rc2	RPS Gain (%)	7.2.6 Latency (ms)	8.0.0-rc2 Latency (ms)	Latency Gain (%)
SET	10	16	520787.93	505968.72	-2.85	0.88	0.91	-3.41
GET	10	16	663597.02	634799.00	-4.34	0.67	0.71	-5.97
RPUSH	10	16	765091.63	718048.75	-6.15	0.58	0.62	-6.90
LPUSH	10	16	691086.21	646493.41	-6.45	0.65	0.70	-7.69
LPOP	10	16	626372.19	606277.10	-3.21	0.72	0.75	-4.17
SADD	10	16	557077.81	542960.29	-2.53	0.82	0.85	-3.66
SPOP	10	16	468979.92	457270.35	-2.50	0.99	1.02	-3.03
HSET	10	16	505983.73	498492.24	-1.48	0.91	0.93	-2.20
SET	1	16	102835.67	107983.24	5.01	0.26	0.25	3.85
GET	1	16	103339.23	107540.84	4.07	0.25	0.24	4.00
RPUSH	1	16	104093.64	108852.13	4.57	0.25	0.24	4.00
LPUSH	1	16	104945.78	109997.45	4.81	0.25	0.24	4.00
LPOP	1	16	104142.95	110328.87	5.94	0.25	0.24	4.00
SADD	1	16	105431.96	109726.26	4.07	0.25	0.24	4.00
SPOP	1	16	106727.63	110370.95	3.41	0.25	0.26	-4.00
HSET	1	16	104030.61	108340.23	4.14	0.25	0.24	4.00
SET	10	128	466743.26	457579.49	-1.96	0.99	1.01	-2.02
GET	10	128	615477.77	589124.37	-4.28	0.73	0.77	-5.48
RPUSH	10	128	641502.67	618200.71	-3.63	0.70	0.73	-4.29
LPUSH	10	128	598721.25	567333.81	-5.24	0.76	0.81	-6.58
LPOP	10	128	576679.02	564900.81	-2.04	0.79	0.81	-2.53
SADD	10	128	557887.27	538954.87	-3.39	0.82	0.85	-3.66
SPOP	10	128	469464.63	460238.05	-1.97	0.98	1.02	-4.08
HSET	10	128	466708.66	456335.81	-2.22	0.99	1.02	-3.03
SET	1	128	105515.09	111827.87	5.98	0.25	0.24	4.00
GET	1	128	102821.09	108336.24	5.36	0.25	0.24	4.00
RPUSH	1	128	105790.36	111397.16	5.30	0.25	0.24	4.00
LPUSH	1	128	106097.39	112364.65	5.91	0.25	0.24	4.00
LPOP	1	128	104224.42	109881.77	5.43	0.25	0.24	4.00
SADD	1	128	104524.53	108969.04	4.25	0.25	0.24	4.00
SPOP	1	128	106924.63	111674.24	4.44	0.25	0.26	-4.00
HSET	1	128	107253.25	111413.41	3.88	0.25	0.24	4.00
SET	10	1024	354122.77	351301.54	-0.80	1.31	1.33	-1.53
GET	10	1024	511288.96	492137.15	-3.75	0.88	0.93	-5.68
RPUSH	10	1024	413217.60	398544.29	-3.55	1.12	1.17	-4.46
LPUSH	10	1024	391538.75	377320.77	-3.63	1.18	1.24	-5.08
LPOP	10	1024	440260.07	426259.90	-3.18	1.03	1.07	-3.88
SADD	10	1024	554786.81	533950.96	-3.76	0.82	0.86	-4.88
SPOP	10	1024	472512.19	454272.61	-3.86	0.97	1.03	-6.19
HSET	10	1024	351127.63	336774.59	-4.09	1.33	1.39	-4.51
SET	1	1024	102363.58	107044.26	4.57	0.27	0.26	3.70
GET	1	1024	102148.19	106735.14	4.49	0.26	0.25	3.85
RPUSH	1	1024	103609.69	108210.33	4.44	0.26	0.25	3.85
LPUSH	1	1024	103381.88	108373.43	4.83	0.26	0.25	3.85
LPOP	1	1024	103630.05	107544.11	3.78	0.26	0.25	3.85
SADD	1	1024	104126.90	109349.15	5.02	0.25	0.24	4.00
SPOP	1	1024	106277.70	110131.67	3.63	0.25	0.26	-4.00
HSET	1	1024	104081.01	107663.43	3.44	0.26	0.26	0.00
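
The two gain columns are not defined in the post, but the reported values are consistent with the following definitions, shown here as a small sketch (the function names are hypothetical). A positive value means 8.0.0-rc2 is better in both columns, so the sign of the latency difference is flipped.

```python
def rps_gain(old_rps, new_rps):
    """Percentage change in requests per second; positive means the new version is faster."""
    return (new_rps - old_rps) / old_rps * 100.0


def latency_gain(old_ms, new_ms):
    """Percentage reduction in latency; positive means the new version has lower latency."""
    return (old_ms - new_ms) / old_ms * 100.0


# First row of the Standalone table: SET, pipeline 10, 16-byte values.
print(round(rps_gain(520787.93, 505968.72), 2))  # -2.85
print(round(latency_gain(0.88, 0.91), 2))        # -3.41
```

Because latencies are reported to two decimal places, the latency gain is coarse at low values: a change from 0.25 ms to 0.24 ms already shows up as 4.00%.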

Standalone Mode with TLS

Command	Pipeline	Data Size	RPS - 7.2.6	RPS - 8.0.0-rc2	RPS Gain (%)	7.2.6 Latency (ms)	8.0.0-rc2 Latency (ms)	Latency Gain (%)
SET	10	16	333844.14	335251.78	0.42	1.36	1.36	0.00
GET	10	16	409138.08	397820.49	-2.77	1.09	1.13	-3.67
RPUSH	10	16	440355.07	394037.11	-10.52	1.01	1.15	-13.86
LPUSH	10	16	412354.88	402618.35	-2.36	1.09	1.11	-1.83
LPOP	10	16	386682.09	385166.40	-0.39	1.17	1.17	0.00
SADD	10	16	340948.23	356998.13	4.71	1.34	1.27	5.22
SPOP	10	16	312310.34	324894.30	4.03	1.47	1.38	6.12
HSET	10	16	319343.49	332794.27	4.21	1.44	1.37	4.86
SET	1	16	59079.31	57298.75	-3.01	0.55	0.77	-40.00
GET	1	16	58463.73	60759.02	3.93	0.45	0.59	-31.11
RPUSH	1	16	61825.28	62804.14	1.58	0.46	0.64	-39.13
LPUSH	1	16	63947.52	63543.90	-0.63	0.44	0.63	-43.18
LPOP	1	16	64138.56	61423.29	-4.23	0.47	0.64	-36.17
SADD	1	16	61655.58	62954.17	2.11	0.45	0.61	-35.56
SPOP	1	16	59472.08	58491.92	-1.65	0.52	0.61	-17.31
HSET	1	16	61154.36	60526.71	-1.03	0.47	0.57	-21.28
SET	10	128	287785.22	303468.48	5.45	1.59	1.51	5.03
GET	10	128	361000.56	370197.89	2.55	1.24	1.22	1.61
RPUSH	10	128	375252.85	371363.04	-1.04	1.19	1.21	-1.68
LPUSH	10	128	361177.97	336036.13	-6.96	1.25	1.35	-8.00
LPOP	10	128	357551.20	336582.16	-5.86	1.25	1.35	-8.00
SADD	10	128	364685.44	340754.58	-6.56	1.24	1.34	-8.06
SPOP	10	128	329896.81	314924.60	-4.54	1.36	1.47	-8.09
HSET	10	128	311473.11	296406.08	-4.84	1.47	1.55	-5.44
SET	1	128	60695.59	58455.42	-3.69	0.60	0.58	3.33
GET	1	128	61753.61	62683.66	1.51	0.45	0.51	-13.33
RPUSH	1	128	61310.50	60089.05	-1.99	0.55	0.59	-7.27
LPUSH	1	128	59824.64	57784.35	-3.41	0.56	0.72	-28.57
LPOP	1	128	58004.33	57670.03	-0.58	0.65	0.73	-12.31
SADD	1	128	61096.64	58572.15	-4.13	0.48	0.72	-50.00
SPOP	1	128	60252.71	57100.17	-5.23	0.60	0.76	-26.67
HSET	1	128	59143.82	55151.14	-6.75	0.56	0.74	-32.14
SET	10	1024	224058.52	218616.27	-2.43	2.03	2.10	-3.45
GET	10	1024	312794.91	295876.14	-5.41	1.44	1.54	-6.94
RPUSH	10	1024	244957.64	230560.48	-5.88	1.85	1.98	-7.03
LPUSH	10	1024	239408.06	228121.59	-4.71	1.89	2.00	-5.82
LPOP	10	1024	268841.62	258804.65	-3.73	1.69	1.76	-4.14
SADD	10	1024	361663.26	367693.43	1.67	1.25	1.23	1.60
SPOP	10	1024	329380.45	333552.55	1.27	1.34	1.35	-0.75
HSET	10	1024	215813.31	219961.03	1.92	2.11	2.08	1.42
SET	1	1024	53677.15	54241.97	1.05	0.69	0.73	-5.80
GET	1	1024	57928.91	60441.84	4.34	0.49	0.57	-16.33
RPUSH	1	1024	54961.96	51547.94	-6.21	0.72	0.85	-18.06
LPUSH	1	1024	51790.09	55279.52	6.74	0.75	0.72	4.00
LPOP	1	1024	54129.14	54985.74	1.58	0.73	0.71	2.74
SADD	1	1024	60152.75	60731.44	0.96	0.55	0.67	-21.82
SPOP	1	1024	59925.80	57720.05	-3.68	0.61	0.65	-6.56
HSET	1	1024	52469.69	51656.11	-1.55	0.75	0.86	-14.67

Cluster Mode

Command	Pipeline	Data Size	RPS - 7.2.6	RPS - 8.0.0-rc2	RPS Gain (%)	7.2.6 Latency (ms)	8.0.0-rc2 Latency (ms)	Latency Gain (%)
SET	10	16	408835.18	431790.02	5.61	1.14	1.08	5.26
GET	10	16	534478.75	530798.23	-0.69	0.86	0.87	-1.16
RPUSH	10	16	692426.92	651753.79	-5.87	0.65	0.70	-7.69
LPUSH	10	16	634147.25	604553.39	-4.67	0.71	0.76	-7.04
LPOP	10	16	584777.33	570103.02	-2.51	0.78	0.81	-3.85
SADD	10	16	517352.67	512354.43	-0.97	0.89	0.90	-1.12
SPOP	10	16	445323.84	440852.13	-1.00	1.05	1.07	-1.90
HSET	10	16	473851.59	470567.35	-0.69	0.98	0.99	-1.02
SET	1	16	104280.60	108150.46	3.71	0.26	0.25	3.85
GET	1	16	103894.17	109010.28	4.92	0.25	0.24	4.00
RPUSH	1	16	105272.08	109022.68	3.56	0.25	0.24	4.00
LPUSH	1	16	105750.83	110299.90	4.30	0.25	0.24	4.00
LPOP	1	16	105765.74	111251.40	5.19	0.25	0.24	4.00
SADD	1	16	105211.00	108728.85	3.34	0.25	0.24	4.00
SPOP	1	16	106999.10	111051.13	3.79	0.25	0.26	-4.00
HSET	1	16	104888.44	108528.15	3.47	0.25	0.25	0.00
SET	10	128	376152.08	388723.09	3.34	1.25	1.21	3.20
GET	10	128	500026.21	487824.22	-2.44	0.92	0.95	-3.26
RPUSH	10	128	602181.06	575038.98	-4.51	0.75	0.79	-5.33
LPUSH	10	128	550251.46	530409.64	-3.61	0.83	0.87	-4.82
LPOP	10	128	537709.00	531129.04	-1.22	0.85	0.87	-2.35
SADD	10	128	522845.88	511379.14	-2.19	0.88	0.90	-2.27
SPOP	10	128	447715.37	440627.86	-1.58	1.04	1.07	-2.88
HSET	10	128	440146.24	438948.59	-0.27	1.05	1.06	-0.95
SET	1	128	106931.69	112118.05	4.85	0.26	0.25	3.85
GET	1	128	104078.22	109585.00	5.29	0.25	0.24	4.00
RPUSH	1	128	106635.92	112023.98	5.05	0.24	0.24	0.00
LPUSH	1	128	106086.25	112176.05	5.74	0.25	0.24	4.00
LPOP	1	128	106214.66	111205.21	4.70	0.25	0.24	4.00
SADD	1	128	104913.48	109443.52	4.32	0.25	0.24	4.00
SPOP	1	128	106603.38	110791.49	3.93	0.26	0.26	0.00
HSET	1	128	107373.40	112215.01	4.51	0.25	0.24	4.00
SET	10	1024	297329.25	306335.17	3.03	1.58	1.54	2.53
GET	10	1024	426836.67	417672.55	-2.15	1.08	1.11	-2.78
RPUSH	10	1024	386338.19	378263.39	-2.09	1.20	1.23	-2.50
LPUSH	10	1024	370358.83	358255.69	-3.27	1.26	1.31	-3.97
LPOP	10	1024	417917.26	405956.53	-2.86	1.09	1.13	-3.67
SADD	10	1024	512245.84	508324.55	-0.77	0.90	0.91	-1.11
SPOP	10	1024	439193.28	433391.85	-1.32	1.07	1.08	-0.93
HSET	10	1024	330002.99	333253.50	0.98	1.42	1.41	0.70
SET	1	1024	102664.82	107183.35	4.40	0.30	0.29	3.33
GET	1	1024	104243.52	108489.50	4.07	0.25	0.25	0.00
RPUSH	1	1024	103781.94	108647.22	4.69	0.26	0.26	0.00
LPUSH	1	1024	103473.09	108378.70	4.74	0.26	0.26	0.00
LPOP	1	1024	103623.02	108513.67	4.72	0.26	0.25	3.85
SADD	1	1024	104585.17	109857.02	5.04	0.25	0.24	4.00
SPOP	1	1024	106798.79	110798.65	3.75	0.26	0.26	0.00
HSET	1	1024	104363.68	107995.98	3.48	0.27	0.27	0.00

Cluster Mode with TLS

Command	Pipeline	Data Size	RPS - 7.2.6	RPS - 8.0.0-rc2	RPS Gain (%)	7.2.6 Latency (ms)	8.0.0-rc2 Latency (ms)	Latency Gain (%)
SET	10	16	298540.59	301242.02	0.90	1.54	1.52	1.30
GET	10	16	372522.05	356490.26	-4.30	1.21	1.27	-4.96
RPUSH	10	16	397066.28	392204.82	-1.22	1.14	1.15	-0.88
LPUSH	10	16	370714.62	374046.09	0.90	1.22	1.22	0.00
LPOP	10	16	347859.72	356990.71	2.62	1.31	1.27	3.05
SADD	10	16	311863.05	317904.60	1.94	1.47	1.44	2.04
SPOP	10	16	288423.31	293785.60	1.86	1.62	1.58	2.47
HSET	10	16	292815.22	298067.78	1.79	1.58	1.55	1.90
SET	1	16	59717.14	56623.07	-5.18	0.71	0.70	1.41
GET	1	16	62626.92	59644.94	-4.76	0.51	0.65	-27.45
RPUSH	1	16	59151.80	62134.16	5.04	0.66	0.48	27.27
LPUSH	1	16	61991.43	64168.68	3.51	0.53	0.57	-7.55
LPOP	1	16	61222.35	60737.96	-0.79	0.55	0.59	-7.27
SADD	1	16	58780.20	61128.12	3.99	0.54	0.61	-12.96
SPOP	1	16	57392.43	59753.32	4.11	0.57	0.61	-7.02
HSET	1	16	60282.71	59184.87	-1.82	0.55	0.63	-14.55
SET	10	128	252396.99	265220.58	5.08	1.83	1.75	4.37
GET	10	128	324276.79	320101.35	-1.29	1.41	1.43	-1.42
RPUSH	10	128	352979.32	336149.13	-4.77	1.28	1.35	-5.47
LPUSH	10	128	327548.32	324658.86	-0.88	1.38	1.41	-2.17
LPOP	10	128	321348.33	321286.22	-0.02	1.42	1.43	-0.70
SADD	10	128	324822.78	320048.62	-1.47	1.41	1.44	-2.13
SPOP	10	128	299217.00	293176.54	-2.02	1.55	1.58	-1.94
HSET	10	128	280954.32	277699.10	-1.16	1.64	1.67	-1.83
SET	1	128	59659.27	54734.93	-8.25	0.58	0.81	-39.66
GET	1	128	60586.05	59889.17	-1.15	0.44	0.68	-54.55
RPUSH	1	128	61115.82	59892.65	-2.00	0.64	0.56	12.50
LPUSH	1	128	59576.62	53981.26	-9.39	0.58	0.83	-43.10
LPOP	1	128	60947.10	52898.20	-13.21	0.66	0.85	-28.79
SADD	1	128	61202.07	59810.70	-2.27	0.60	0.59	1.67
SPOP	1	128	58764.94	58914.81	0.26	0.64	0.62	3.13
HSET	1	128	58038.58	58460.14	0.73	0.73	0.66	9.59
SET	10	1024	195139.27	195049.64	-0.05	2.36	2.37	-0.42
GET	10	1024	275468.32	258323.50	-6.22	1.66	1.78	-7.23
RPUSH	10	1024	223672.28	226150.63	1.11	2.05	2.02	1.46
LPUSH	10	1024	215653.87	217500.82	0.86	2.13	2.11	0.94
LPOP	10	1024	239075.47	241007.32	0.81	1.92	1.91	0.52
SADD	10	1024	316524.90	355055.98	12.17	1.45	1.28	11.72
SPOP	10	1024	289532.76	322474.84	11.38	1.60	1.42	11.25
HSET	10	1024	198350.52	211361.37	6.56	2.32	2.17	6.47
SET	1	1024	53453.91	56363.61	5.44	0.79	0.75	5.06
GET	1	1024	60033.77	57889.40	-3.57	0.57	0.57	0.00
RPUSH	1	1024	58063.99	53177.02	-8.42	0.68	0.84	-23.53
LPUSH	1	1024	57109.08	53156.68	-6.92	0.69	0.74	-7.25
LPOP	1	1024	55741.13	54037.78	-3.06	0.61	0.73	-19.67
SADD	1	1024	59636.39	59005.26	-1.06	0.55	0.62	-12.73
SPOP	1	1024	58915.08	60629.87	2.91	0.64	0.63	1.56
HSET	1	1024	52276.07	53743.56	2.81	0.84	0.78	7.14

Also, let me know if you find any gaps in the script mentioned above or anything I might be missing.

madolson commented 2 weeks ago

@valkey-io/core-team @valkey-io/contributors Worth reviewing to see if anything here seems suspicious in preparation for the 8.0 launch.

PingXie commented 2 weeks ago

can we include a link to the r6g metal spec so the reader doesn't need to search it up?
can we make it clear that this is single-thread in the issue title and at the top of the results? right now, it is a bit hidden.
can we clarify that the test client also ran on the same box? (or not?)
Is cluster mode a single shard/primary setup?
the TLS results seem to come from unstable. is this expected? shouldn't it be 8.0 rc2 as well?
RPUSH/LPUSH on standalone TLS (unstable) stand out quite a bit. Do we know why? Note that the cluster mode TLS numbers show a different pattern.

zuiderkwast commented 2 weeks ago

Are valkey-benchmark and valkey-server running on the same machine?

Does valkey-benchmark run without --threads? If yes, then valkey-benchmark may be the bottleneck itself.

When I run valkey-server and valkey-benchmark locally and check top while they are running, then I see valkey-benchmark is at 100% CPU but valkey-server is only at 90% CPU. It means valkey-benchmark can't send enough traffic. So I run valkey-benchmark with --threads 2.

aiven-sal commented 2 weeks ago

Nice! I have 2 questions:

1) Have you considered adding median and percentiles (even just a 5%-95% range) to the report? It would make it easier to tell whether the differences we see are significant or just noise.
2) Do you plan to also run benchmarks on x86? I think a lot of people are still using x86 and may be looking specifically for benchmarks on that arch (even if in the end the results are similar).
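
As an illustration of the first question, a median and a 5%-95% range can be computed with Python's standard `statistics` module. This is only a sketch: the `summarize` helper is hypothetical, and the input would have to be a list of per-run (or per-request) measurements that the current report does not include.

```python
import statistics


def summarize(samples):
    """Return the median and the 5th/95th percentiles of a list of measurements."""
    # n=20 splits the data into 20 equal-probability groups, so the first and
    # last of the 19 cut points are the 5th and 95th percentiles.
    cuts = statistics.quantiles(samples, n=20, method="inclusive")
    return {
        "median": statistics.median(samples),
        "p5": cuts[0],
        "p95": cuts[-1],
    }
```

With only 3 runs per configuration, as in the setup above, a percentile range carries little information; the summary becomes more useful with many more runs or with per-request latency samples.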

roshkhatri commented 2 weeks ago

@PingXie

the TLS results seem to come from unstable. is this expected? shouldn't it be 8.0 rc2 as well?

The tests are still running; I did not parallelize the setup, so it's just taking time. I will update the results as they are generated. EDIT: updated the top comment with Standalone Mode with TLS and Cluster Mode with TLS.

RPUSH/LPUSH on standalone TLS (unstable) stand out quite a bit. Do we know why? Note that the cluster mode TLS numbers show a different pattern.

Not yet.

@zuiderkwast

Are valkey-benchmark and valkey-server running on the same machine?

Yes

Does valkey-benchmark run without --threads? If yes, then valkey-benchmark may be the bottleneck itself.

I don't see that on my system, though. I have checked multiple times: valkey-server was mostly at 100% CPU and valkey-benchmark stayed mostly around 80-97%, never exceeding valkey-server.

@aiven-sal I will look into adding median and percentiles to the results.

do you plan to also run benchmarks for x86?

I have that setup as well. I can add those too.

hwware commented 2 weeks ago

Agree with Ping. Could you please state explicitly that these results are based on a single thread?
Could you tell us how many keys are in the test?
In the config, you set maxmemory-policy allkeys-lru, but there is no maxmemory setting and no TTL on the keys. Do we need the maxmemory-policy parameter? Is there any key eviction during the test?
Why do valkey-benchmark and valkey-server run on the same machine? Do I understand correctly that the effect of network bandwidth and delay is not included in the results?
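
On the maxmemory question: an eviction policy only takes effect once a memory limit is configured, and a maxmemory of 0 (the documented default on 64-bit builds) means no limit. The sketch below checks a server command line for this situation. The helper name `eviction_possible` is hypothetical and the option parsing is deliberately simplistic (it only handles `--name value` pairs, and it ignores any config file).

```python
import shlex


def eviction_possible(server_cmd):
    """Return True if the command line sets a non-zero maxmemory limit."""
    tokens = shlex.split(server_cmd)
    options = {}
    for i, token in enumerate(tokens):
        # Treat "--name value" as an option pair; a following "--other" is not a value.
        if token.startswith("--") and i + 1 < len(tokens):
            value = tokens[i + 1]
            if not value.startswith("--"):
                options[token[2:]] = value
    # Assumption: an absent maxmemory option means the default of 0 (no limit).
    return options.get("maxmemory", "0") != "0"


# The server command from the top of this issue.
cmd = (
    "taskset -c 0-1 src/valkey-server --daemonize yes "
    "--maxmemory-policy allkeys-lru --appendonly no --cluster-enabled yes "
    "--logfile valkey_log_cluster_yes --save ''"
)
print(eviction_possible(cmd))  # False
```

Under these assumptions the posted command never evicts keys, so the allkeys-lru setting should have no effect on the measurements.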
