opensearch-project / opensearch-benchmark

OpenSearch Benchmark - a community driven, open source project to run performance tests for OpenSearch
https://opensearch.org/docs/latest/benchmark/
Apache License 2.0

[Scaling Investigation] Validate Client Simulation Accuracy #557

Open IanHoang opened 4 months ago

IanHoang commented 4 months ago

Experiment 1:

This is related to the scale testing RFC. For more details, see the RFC here.

To see other experiments in this analysis, see the META issue.

In this experiment, we want to address the following question: can a single load generation host simulating N clients produce the same performance results as N independent instances that each run a single client?

During a test, the Worker Coordinator Actor provisions and coordinates a number of Worker Actors that are responsible for driving requests to the system under test (SUT). These Worker Actors are allocated a number of clients to perform steps (also known as tasks or operations in a workload). It’s worth mentioning that the number of Worker Actors is determined by the number of CPU cores or vCPUs on the host running OSB.

The two tables listed below (Table 1: Autoscaling Group with OpenSearch Benchmark, where each EC2 instance runs a single client, and Table 2: Load Generation Host with OpenSearch Benchmark) describe two series of experiments to determine whether a single load generation host can simulate the same performance as a set of instances that each act as a single independent client.

To reduce discrepancies, we ensure that the experiments in Table 2: Load Generation Host with OpenSearch Benchmark have no more than one client assigned per worker actor. This can be seen in how the number of clients is always less than or equal to the number of vCPUs. This matches how each instance in the ASG in Table 1: Autoscaling Group with OpenSearch Benchmark will only use one vCPU for its single client (even though each instance has two vCPUs).

Table 1: Determine Performance of an Autoscaling Group of N instances of OpenSearch Benchmark where search_clients = 1

| Autoscaling Group with OpenSearch Benchmark | Clients | Instance Type | Instance Count | vCPUs | Memory (GB) |
| --- | --- | --- | --- | --- | --- |
| Round 1 | 8 | c5.large | 8 | 16 | 32 |
| Round 2 | 16 | c5.large | 16 | 32 | 64 |
| Round 3 | 32 | c5.large | 32 | 64 | 128 |

In the table above, the gradual increase in instance count of the same instance type corresponds to a gradual increase in the total number of search clients. Each instance runs OSB with one search client. Once all the instances have finished running OSB, we can use a script to aggregate the service time results across all instances in the ASG, as sketched below.
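A minimal aggregation sketch in Python (assumptions: each instance exports its raw per-request service times in milliseconds as a JSON list, and the `service-times-*.json` file layout is hypothetical; OSB's own result files summarize percentiles per test execution, so raw samples would need to be exported separately):

    import glob
    import json
    import statistics

    # Collect raw service-time samples (ms) exported by each ASG instance.
    # The file layout is hypothetical -- adapt to wherever results land.
    samples = []
    for path in glob.glob("service-times-*.json"):
        with open(path) as f:
            samples.extend(json.load(f))

    # Percentiles must be computed over the pooled samples; averaging
    # per-instance percentiles would skew the aggregate.
    samples.sort()
    def pct(p):
        return samples[min(len(samples) - 1, int(p / 100 * len(samples)))]

    print(f"n={len(samples)} mean={statistics.mean(samples):.2f} ms")
    print(f"p50={pct(50):.2f} ms p90={pct(90):.2f} ms p99={pct(99):.2f} ms")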

Table 2: Determine Performance of a Single Load Generation Host with OpenSearch Benchmark where search_clients = N

| LG Hosts with OpenSearch Benchmark | Simulated Clients (search_clients: N) | Instance Type | Instance Count | vCPUs | Memory (GB) |
| --- | --- | --- | --- | --- | --- |
| Round 1 | 8 | c5.2xlarge | 1 | 8 | 16 |
| Round 2 | 16 | c5.4xlarge | 1 | 16 | 32 |
| Round 3 | 32 | c5.9xlarge | 1 | 36 | 72 |

In the table above, there will only be a single load generation host.

After running the experiments from Tables 1 and 2, we should compare the results, as in the sketch below.
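A minimal comparison sketch (the numbers below are placeholders, not measured results):

    def pct_diff(asg_value: float, lg_value: float) -> float:
        """Percent difference of the LG host metric relative to the ASG baseline."""
        return (lg_value - asg_value) / asg_value * 100.0

    # Placeholder values -- substitute the aggregated p90 service times
    # from the matching rounds of Table 1 and Table 2.
    asg_p90_ms, lg_p90_ms = 12.4, 13.1
    print(f"p90 service time difference: {pct_diff(asg_p90_ms, lg_p90_ms):+.1f}%")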

Table 3: Load Generation Host with OpenSearch Benchmark where search_clients = N & More Clients Per Worker

| LG Hosts with OpenSearch Benchmark | Simulated Clients (search_clients: N) | Instance Type | Instance Count | vCPUs | Memory (GB) | Clients Per Worker Actor |
| --- | --- | --- | --- | --- | --- | --- |
| Round 1 | 8 | c5.large | 1 | 2 | 4 | 4 |
| Round 2 | 16 | c5.large | 1 | 2 | 4 | 8 |
| Round 3 | 32 | c5.large | 1 | 2 | 4 | 16 |

Knowing that worker actors can be allocated more than one client, we should also rerun the load generation host with OpenSearch Benchmark in a configuration where more clients are allocated to each worker actor, as seen in Table 3: Load Generation Host with OpenSearch Benchmark and More Clients Per Worker. This will confirm whether adding more clients to a worker (running on a smaller instance type with fewer CPU cores) can simulate the same performance as assigning one client per worker. In Round 1 above, we should expect to see two workers (since there are two vCPUs) with 4 clients each; in Round 2, two workers with 8 clients each; and in Round 3, two workers with 16 clients each, as sketched below. We can compare these results with those from Table 2 (where we tested the same client counts but kept one client per worker). If we see no degradation here, scaling investigation 2 should stress the load generation host further and help us determine the maximum number of clients allowed per worker.
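A sketch of the expected distribution (illustrative arithmetic only, not OSB's actual allocation code):

    def distribute_clients(num_workers: int, num_clients: int) -> list[int]:
        """Spread clients across workers as evenly as possible."""
        base, extra = divmod(num_clients, num_workers)
        return [base + 1 if i < extra else base for i in range(num_workers)]

    # Table 3 rounds on a c5.large: 2 vCPUs -> 2 worker actors (assumed).
    for clients in (8, 16, 32):
        print(f"{clients} clients -> {distribute_clients(2, clients)}")
    # 8 clients  -> [4, 4]
    # 16 clients -> [8, 8]
    # 32 clients -> [16, 16]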

Term Query

    {
        "name": "term",
        "operation-type": "search",
        "index": "{{index_name | default('big5')}}",
        "request-timeout": 7200,
        "body": {
            "query": {
                "term": {
                    "log.file.path": {
                        "value": "/var/log/messages/fuschiashoulder"
                    }
                }
            }
        }
    },

The term query above is considered a fast query in the Big5 workload and can be used for our experiment.
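As a sanity check outside of OSB, the same query can be issued directly with the opensearch-py client (the endpoint and index name below are placeholders):

    import time
    from opensearchpy import OpenSearch  # pip install opensearch-py

    client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])  # placeholder endpoint

    body = {
        "query": {
            "term": {
                "log.file.path": {"value": "/var/log/messages/fuschiashoulder"}
            }
        }
    }

    start = time.perf_counter()
    resp = client.search(index="big5", body=body)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"hits={resp['hits']['total']['value']} took={resp['took']} ms "
          f"round trip={elapsed_ms:.1f} ms")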

Metrics to Analyze

With each round of tests, we’ll compare metrics such as query throughput and service time between the ASG clients and the load generation host. We’ll also monitor resource utilization on the ASG instances, the load generation host, and the system under test. If the system under test shows signs of resource bottlenecks, we will scale it out and rerun the tests to ensure that the results are not skewed.

Why are we not using latency?

OSB’s definition of latency differs slightly from the colloquial definition. In OSB, when a user specifies a target throughput with the target-throughput parameter, latency is the service time plus the time the request spends waiting in the queue. When target-throughput is not set, service time and latency are equivalent. The parameter is designed for users who want to achieve a specific throughput, for example to simulate the throughput seen in their production clusters. For these reasons, we will not set target-throughput in these experiments, and the clients (in the ASG and on the load generation host) will send queries as fast as possible. We will therefore focus primarily on service time, since it is equivalent to latency in this setup. For more information, see this article from OSB’s documentation.
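To make the distinction concrete, a small illustration with made-up numbers:

    # Made-up numbers illustrating OSB's latency definition.
    # With target-throughput set, a request can be scheduled before the
    # client is free, so it waits in a queue before being sent:
    service_time_ms = 180.0  # processing + transport time for the request
    queue_wait_ms = 45.0     # time the request sat waiting to be issued

    latency_ms = queue_wait_ms + service_time_ms  # what OSB reports as latency
    print(latency_ms)  # 225.0

    # Without target-throughput (as in these experiments), queue_wait_ms
    # is ~0, so latency equals service time and we can compare service
    # time directly.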

IanHoang commented 3 months ago

Set Up Experiment Prerequisites

IanHoang commented 1 month ago

The scaling investigation scripts were created a few weeks back and can be found here: https://github.com/IanHoang/scaling-investigation