Closed finnroblin closed 3 weeks ago
For instance, a user might specify "clients_list": [1, 5] in their parameters. Then OSB will schedule search tasks with 1, 5, 10, and 12 clients.
Why would the OSB schedule search tasks with 10 and 12 clients if the parameters only specified 1 and 5?
Description
Finding the maximum search throughput for an OpenSearch cluster is an important benchmarking scenario in vector search. Currently, the only way to find the maximum search throughput is to perform multiple benchmark runs with different search client settings (e.g. search_clients = 3, search_clients = 5, ...). Waiting for a run to conclude, changing the config, and rerunning OSB, with its associated startup time, is tedious. Ideally, the maximum throughput could be found automatically.
This PR allows users to provide a clients_list in an operation, which runs the operation with each client setting. For instance, a user might specify "clients_list": [1, 5] in their parameters. Then OSB will schedule search tasks with 1, 5, 10, and 12 clients. The final benchmark results will look something like the following:
Issues Resolved
Closes #613
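As a rough illustration of the clients_list behaviour described in the Description, the sketch below expands one task into one task per client count. It is not the code in loader.py: the helper name expand_clients_list, the dict layout of the task, and the generated task names are all hypothetical and only assume that each scheduled task carries a single clients value.

```python
import copy


def expand_clients_list(task):
    """Hypothetical helper: return one task per entry in task["clients_list"].

    Tasks without a clients_list are returned unchanged.
    """
    clients_list = task.get("clients_list")
    if not clients_list:
        return [task]

    expanded = []
    for clients in clients_list:
        variant = copy.deepcopy(task)
        # Replace the list with a single client count for this variant.
        del variant["clients_list"]
        variant["clients"] = clients
        # Assumed naming scheme so each variant is reported separately.
        variant["name"] = f"{task['name']}_{clients}_clients"
        expanded.append(variant)
    return expanded


if __name__ == "__main__":
    task = {"name": "search", "operation": "search", "clients_list": [1, 5, 10, 12]}
    for variant in expand_clients_list(task):
        print(variant["name"], variant["clients"])
```

Giving each variant its own name is one way to keep the per-client throughput numbers separate in the published results; the actual naming used by the PR may differ.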
Testing
Unit tested loader.py changes and verified with multiple OSB runs that the result publisher output looks good.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.