
[NEW] Benchmark CLI tool rewrite #900

Open vitarb opened 3 months ago

vitarb commented 3 months ago

The existing benchmark tool is dated: the way it's implemented makes it difficult to introduce new data types or custom workloads, and its support for cluster mode is very limited. I suggest we collect requirements and consider rewriting it from scratch.

Some use cases that I would like to be supported in the future:

This issue is an attempt to gauge interest in this effort from the Valkey community and to gather additional use cases that a hypothetical rewrite should address.

bbarani commented 3 months ago

Some additional requirements to consider:

poiuj commented 3 months ago

Great idea. Recently I needed to benchmark some specific cases and ended up with an ad hoc Python script that did what was missing for me in the benchmark CLI tool. Specifically, I wanted to fill a node with data, run a "warm up" to randomize the memory layout, then introduce fragmentation, and only then run the specific benchmark with the benchmark CLI tool.
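
A minimal sketch of that kind of prep script, assuming a redis-py-compatible client (valkey-py exposes the same API); all key names, sizes, and counts here are made-up illustrative parameters:

```python
import random
import redis  # valkey-py exposes the same API ("import valkey")

r = redis.Redis(host="localhost", port=6379)
N = 1_000_000

# 1. Fill the node with data (pipelined for speed).
pipe = r.pipeline(transaction=False)
for i in range(N):
    pipe.set(f"key:{i}", b"x" * 128)
    if i % 10_000 == 0:
        pipe.execute()
pipe.execute()

# 2. "Warm up": rewrite a random subset to shuffle the memory layout.
for i in random.sample(range(N), N // 5):
    r.set(f"key:{i}", b"y" * 128)

# 3. Introduce fragmentation: delete a random subset, leaving holes.
for i in random.sample(range(N), N // 3):
    r.delete(f"key:{i}")

# 4. Only now run the targeted benchmark with the CLI tool.
```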

Other missing features to consider:

  1. Supporting different value sizes. Real-life workloads rarely use values of exactly the same size, so the benchmark tool should support a value-size distribution.
  2. Supporting different distributions over the key range. Today the benchmark CLI supports only a uniform distribution, which is pretty artificial; to emulate hot keys we want something like a Zipfian distribution (both cases are sketched after this list).

Both of these cases could be solved by running several benchmark tools in parallel, but that's rather a workaround.
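
To illustrate the two requested features, here is a sketch assuming numpy is available; the skew, key-space size, and size bounds are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng()
KEYSPACE = 1_000_000

def next_key() -> str:
    # Zipf produces small indices with high probability -> a few hot keys.
    i = rng.zipf(1.2)
    return f"key:{(i - 1) % KEYSPACE}"

def next_value() -> bytes:
    # Log-normal sizes: mostly small values, with a long tail of big ones.
    size = int(min(max(rng.lognormal(mean=5.0, sigma=1.0), 16), 64 * 1024))
    return b"x" * size
```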

vitarb commented 3 months ago

could be solved by running several benchmark tools in parallel, but that's rather a workaround

I agree, having more flexibility in the ability to define workloads that consist of more than one command configuration would be helpful. This would allow mixed workloads such as read/write/delete combinations. Combining results from multiple parallel runs can be a little tricky too.
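
For illustration, such a mixed workload could be described as a weighted set of command templates and sampled per request; this is just a sketch of one possible shape, not a proposed config format:

```python
import random

# Hypothetical workload definition: weighted mix of command templates.
WORKLOAD = [
    (70, ("GET", "key:{k}")),
    (25, ("SET", "key:{k}", "value")),
    (5,  ("DEL", "key:{k}")),
]

def next_command(keyspace: int = 1_000_000):
    _, template = random.choices(WORKLOAD, weights=[w for w, _ in WORKLOAD])[0]
    k = random.randrange(keyspace)
    return tuple(part.format(k=k) for part in template)
```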

hpatro commented 3 months ago

Also, since the benchmarking tool is single-threaded, sometimes one benchmark process is not enough to saturate the server, so we end up running multiple benchmark processes. That makes it tricky to collate the metrics across the processes.
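
Part of why collating is tricky: per-process percentiles can't simply be averaged; the underlying samples (or histograms) have to be merged first. A rough sketch, assuming each process dumps its raw latency samples to a file (the file naming is hypothetical):

```python
import glob
import numpy as np

# Merge raw latency samples from N benchmark processes, then compute
# percentiles over the combined distribution. Averaging each process's
# own p99 would generally give a wrong answer.
samples = np.concatenate(
    [np.loadtxt(path) for path in glob.glob("latencies-*.txt")]
)
for p in (50, 99, 99.9):
    print(f"p{p}: {np.percentile(samples, p):.3f} ms")
```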

Along with the tool, we also need to determine the scenarios/actual workloads to benchmark against.

madolson commented 3 months ago

Also, since the benchmarking tool is single-threaded, sometimes one benchmark process is not enough to saturate the server, so we end up running multiple benchmark processes. That makes it tricky to collate the metrics across the processes.

The benchmark supports multi-threading today with --threads.
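
For example, something like `valkey-benchmark --threads 4 -c 200 -n 1000000 -t set,get` drives the server from 4 load-generating threads (check `valkey-benchmark --help` for the exact flags; the numbers here are just illustrative).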

Ralphbow commented 3 months ago

Hi 👋

suxb201 commented 3 months ago

Please take a look at https://github.com/tair-opensource/resp-benchmark, which we used for our Continuous Benchmarking and to generate our Performance White Paper.

  1. Command templates like `ZADD {key sequence 1000} {rand 70000} {key sequence 10007}` make it easy to generate commands.
  2. It has Python bindings that make it easy to write test scripts for monitoring memory usage and other metrics.
  3. The tool also supports connections=0, which automatically picks a suitable number of connections to avoid the high latency caused by opening too many. The implementation might not be perfect, but it works well.

suxb201 commented 3 months ago

Cluster testing is not a must:

  1. Redis cluster performance is necessarily n times that of a single node, so there is no need to test it.
  2. Testing a cluster requires more resources to generate enough pressure, and usually the network and CPU of an EC2 instance become the bottleneck.

vitarb commented 3 months ago

Redis cluster performance is necessarily n times that of a single node, so there is no need to test it.

Unfortunately it's not that simple: if you want to test realistic scenarios, you also need to load the cluster with some write traffic, which causes replication and degrades read throughput on the replicas.

Testing a cluster requires more resources to generate enough pressure, and usually the network and CPU of an EC2 instance become the bottleneck.

On a typical instance you have a 10 Gigabit network (see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-network-bandwidth.html), which is 1.25 GB/s of data (bigger instances allow more). Assuming 100 bytes per request/response pair, a single node should be able to handle about 12.5M requests per second from a pure bandwidth perspective. That should be enough to saturate a mid-size cluster.
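
The back-of-envelope math, using the same assumptions as above:

```python
# 10 Gbit/s link, ~100 bytes on the wire per request/response pair.
link_bytes_per_sec = 10e9 / 8        # 10 Gbit/s -> 1.25e9 bytes/s
bytes_per_pair = 100                 # assumed request + response size
print(link_bytes_per_sec / bytes_per_pair)  # -> 12,500,000 requests/s
```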