OVI3D0 opened 2 months ago
This will be a great addition to OpenSearch Benchmark, as it addresses several pain points that users have had for years. It will also diversify OSB's capabilities and open up new development opportunities.
To add to the second proposed priority: when validating whether the comparison can be performed, the compare feature should also determine whether the two test executions' test procedures (or scenarios) differ. Something to also consider:
Comparison between <baseline-id> and <contender-id> uses the same workload but different test procedures.
Would you still like OSB to compare? [y/n]:
Overall, great RFC and am excited to see what comes out of this!
Synopsis
OpenSearch Benchmark (OSB) is a performance testing tool for OpenSearch, a community-driven, open source search and analytics suite. It allows users to benchmark various aspects of OpenSearch, such as indexing and querying, under different configurations and workloads. The Compare API is a feature in OSB that allows users to analyze and compare the performance differences between two benchmark test executions. While valuable, the current implementation has certain limitations. This RFC proposes enhancements to the Compare API that will improve how OSB analyzes and presents benchmark results, making OSB a more versatile tool for users in the OpenSearch community.
Motivation
Upon executing a test, OSB assigns a unique ID to each test execution result. The current implementation of the Compare API allows users to compare and analyze the results of two benchmark test executions by providing the ID of a test execution to be used as the baseline, as well as the ID of a contender to be compared against it. Users can obtain these test execution IDs with the opensearch-benchmark list test-executions command.
The following is an example of how the Compare API is invoked.
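A minimal sketch of the invocation, assuming the compare subcommand's --baseline and --contender options:

```
# List completed test executions to obtain their IDs
opensearch-benchmark list test-executions

# Compare a contender test execution against a baseline
opensearch-benchmark compare --baseline=<baseline-test-execution-id> --contender=<contender-test-execution-id>
```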
The comparison output shows each metric and the percent difference between the two tests. This is particularly useful when evaluating performance differences across test runs, OpenSearch versions, and configurations. The Compare API comes with additional command-line options, such as including specific percentiles in the comparison, exporting the comparison to different output formats, and appending the comparison to the results file.
However, the Compare API has limitations.
In performance testing, it is common practice to run the same test multiple times to account for variability and ensure more consistent results. This variability can arise from differences in the test environment as well as random run-to-run fluctuations. By aggregating the results, users can obtain a more reliable and representative measure of performance, reducing the impact of outliers and random variation.
Requirements
To address the limitations of the Compare API and to enhance the overall data processing experience in OSB, the following capabilities should be added.
Proposed Solutions:
For example, suppose we have three test executions, each with a median indexing throughput value and an iteration count. The weighted average for median indexing throughput would be calculated as follows:
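As an illustrative sketch (the numbers below are hypothetical, not actual benchmark results), each run's metric is weighted by its iteration count:

```
weighted_avg = (t1*n1 + t2*n2 + t3*n3) / (n1 + n2 + n3)

Hypothetical values:
  median indexing throughput (docs/s): t1 = 1000, t2 = 1100, t3 = 1050
  iteration counts:                    n1 = 5,    n2 = 3,    n3 = 2

  weighted_avg = (1000*5 + 1100*3 + 1050*2) / (5 + 3 + 2)
               = 10400 / 10
               = 1040 docs/s
```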
Extend the execute-test command to support running multiple iterations of a test and automatically aggregating the results. New flags include:
--test-iterations: Specify the number of test iterations
--aggregate: Control result aggregation
--sleep-timer: Set a sleep timer between iterations
--cancel-on-error: Choose whether to cancel on error
Example usage:
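A hypothetical invocation with the proposed flags might look like the following (the exact syntax and values are assumptions to be settled in the follow-up issues):

```
# Run the workload three times, aggregate the results, and sleep 5 seconds between iterations
opensearch-benchmark execute-test --workload=<workload-name> --test-iterations=3 --aggregate=true --sleep-timer=5 --cancel-on-error
```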
Subsequent issues will be created to address these requirements further and elaborate on implementation details.
Stakeholders
Use Cases
How Can You Help?
Open Questions
Next Steps
We will incorporate feedback and add more details on design, implementation and prototypes as they become available.