opensearch-project / opensearch-benchmark

OpenSearch Benchmark - a community driven, open source project to run performance tests for OpenSearch
https://opensearch.org/docs/latest/benchmark/
Apache License 2.0
104 stars 74 forks source link

[RFC]: Scaling-Up Improvements #555

Open IanHoang opened 2 months ago

IanHoang commented 2 months ago

Synopsis

This RFC is intended to address OpenSearch Benchmark’s (OSB) limitations pertaining to operation at large scale. Several users have reported that OSB performance tests do not scale when a large number of client threads is specified. Overcoming this limitation is crucial for the OpenSearch community as it will unblock these users and other stakeholders and potentially lead to the development of new features within OpenSearch. This RFC proposes an investigation into the scaling limitations and subsequent options to overcome them, which may include modifications to OSB’s client architecture.

Motivation

By specifying certain workload parameters, OSB users can alter characteristics of a benchmark. OSB has a clients parameter that allows users to specify the number of parallel threads to perform a task or operation. This parameter can be specified by setting bulk_indexing_clients and search_clients; these simulate the number of clients that issue indexing and search requests respectively. By default, workloads packaged with OSB have bulk_indexing_clients set to 8 and search_clients set to 1 unless the user specifies otherwise. These clients all run in parallel on the load generation host where OSB is installed and invoked. OSB achieves this by leveraging the Thespianpy library, an actor model framework.

When a user wants to increase the intensity of load imposed on the target system-under-test (an OpenSearch cluster or a Serverless implementation), the natural technique is to increase the number of clients by utilizing one of the parameters above. By doing this, users can simulate the traffic patterns seen in their production environments and better understand their cluster’s limitations.

However, many users have reported that OSB encounters scaling limitations when the number of clients is increased beyond a certain level. Such bottlenecks may result from OSB’s client architecture design, such asthis one that details how OSB cannot scale out the number of clients or is unable to achieve certain throughput levels due to design constraints. Users have claimed they can scale up to 16 clients successfully but OSB performance begins to degrade once they go beyond 32 clients. There have also been reports that the current client architecture might not use the load generation host’s resources effectively. To combat these pain points, some users have found makeshift ways to get around these limitations and discovered that such workarounds can lead to better resource utilization and workload performance.

Evidently, even a highly efficient application running on a single load generation host will cease to scale beyond a certain point, when the resources available are all consumed by the workload. At that point, it will become necessary to scale-up by using a beefier instance, or to scale-out by adding additional load-generation hosts that operate in parallel. With regard to the latter, Distributed Workload Generation (DWG) is a feature that comes with OSB and coordinates a group of load generation hosts to drive load to the OpenSearch cluster. This feature, that uses a scaled out number of hosts, is intended for use in the scenario described above. However, the feature has not been thoroughly tested and the exact scenarios of when it should be used is not well understood.

This RFC was inspired by these user pain points and focuses on understanding which OSB components are involved in simulating clients, what the limitations are with these components, and which changes can be made to remove such limitations, thereby making the use of a single load generation host’s resources more efficient.

There will be a separate RFC to address DWG. This RFC proposes that there will be two phases related to scale testing. The first phase will focus on making OSB scale as well as possible and the second focusing on DWG, which will have its own RFC. The first phase will consist of identifying limitations, verifying workarounds, tracking down bottlenecks or causes of the limitations, overcoming bottlenecks, and publishing info on discoveries and actions taken.

This RFC also provides opportunity in determining if OSB should support other language clients for OpenSearch. Since OSB is primarily based in Python, there have been questions on if the Python GIL, which is known to prevent parallelism, or Python’s Async IO library limits OSB’s scalability in search clients. If Python is a limitation, OSB might need to be rearchitected to become more modular and be able to use other OpenSearch language clients (such as Go, Rust, and Java).

Areas of Interest

Since this RFC is focused enhancing OSB performance at scale, our areas of interest will be on the Worker Coordinator Actor and the Worker Actor(s) since they are primarily involved in scaling out clients which consume the load generation host’s resources. Specifically, we’re interested in analyzing the components’ code, stress testing them, and seeing how they perform under various conditions to identify any shortcomings. For more information on why we chose these areas of interest, see the Appendix.

Stakeholders

Proposed Priorities

This RFC proposes a separation of associated activities into two sequential investigations:

Community engagement is invited and will help with both phases. Scale testing OSB’s client architecture will need to be thorough to ensure we are covering enough scope and feedback on this front will be helpful. More details for the first phase are explored in the following section.

Requirements

Investigating and improving OSB’s client architecture can be broken down into several steps:

  1. Identify current limits: The OSB community is aware that there are limitations in terms of scaling clients within OSB, but is unsure of what those exact limitations are. A majority of the time, OSB is used as a single load generation host to emulate the performance of a fleet of nodes. Therefore, a performance comparison between a cluster of nodes, each with OSB set to a single client, and a single node with OSB set to several clients will help uncover what those exact limitations are.
  2. Identify workarounds if possible: After understanding the limitations, we will determine if there are any quick workarounds that users can resort to to alleviate scaling limitations, while work progresses on long-term solutions.
  3. Investigate bottlenecks (or causes of limitations): For the limitations discovered in step 1, we will need to investigate the bottlenecks in more depth and identify causes.
  4. Overcome bottlenecks: Identify and implement appropriate solutions on how to resolve bottlenecks and remove limitations
  5. Publish info on discoveries and actions taken: After all the work has been done, we should summarize our findings and solutions and ensure that OSB has been appropriately updated to handle scaling better.

Subsequent issues will be created to address these requirements and elaborate on implementation details.

Use Cases

How Can You Help?

Next Steps

We will incorporate feedback and add more details on design, implementation and prototypes as they become available.

Appendix

Benchmarking Process Under the Hood

OSB uses a group of actors that are based on the thespian.actors from Thespianpy library, an actor model framework available in Python. These actors coordinate with one another and can be viewed as the components that make up OSB’s benchmarking process.

Each actor has its own responsibility. For example, the Benchmark Actor starts the overall benchmarking process and calls upon the Builder Actor to determine if there is a provisioned OpenSearch cluster. The Worker Coordinator Actor is called upon to prepare the benchmark by communicating with the Workload Preparation Actor. To supply load to the OpenSearch cluster, the Worker Coordinator Actor provisions a number of Worker Actors based on the number of CPU cores in the load generation host. Based on the number of clients (such as bulk_indexing_clients and search_clients) set in the workload or specified by the user, Worker Actors will be allocated a number of clients and steps (also known as tasks or operations in workloads) to execute. These workers will split up the work and each simulate N number of clients. Once the workers and their respective clients have finished executing a step, they will reconvene at a joinpoint before the Worker Coordinator Actor informs them to proceed to the next step.

gkamat commented 1 month ago

This is a timely RFC. Understanding how well OSB scales is essential to carrying out accurate and meaningful performance tests. A few comments:

IanHoang commented 1 month ago

I have updated the RFC to be more high-level and concise based on the feedback received.