Extend Compare Subcommand Capabilities

IanHoang commented 4 months ago

Is your feature request related to a problem? Please describe.

At the moment, the compare subcommand compares two test executions and displays the differences to the user. We can extend this feature to aggregate results across a series of tests and to export them in CSV format.

Describe the solution you'd like

We could use an approach similar to what's in these two scripts.
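
To make the CSV idea concrete, here is a minimal sketch. It assumes each test execution's results can be loaded as a flat mapping of metric name to numeric value; that shape, the metric names, and the `comparison_to_csv` helper are hypothetical and not part of OSB's current API.

```python
import csv
import io


def comparison_to_csv(baseline, contender):
    """Render a baseline/contender comparison as CSV text.

    Both arguments are assumed to be dicts of metric name -> number
    (a hypothetical shape, not OSB's actual results format).
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["metric", "baseline", "contender", "diff"])
    # Only metrics present in both executions can be compared.
    for metric in sorted(baseline.keys() & contender.keys()):
        diff = round(contender[metric] - baseline[metric], 4)
        writer.writerow([metric, baseline[metric], contender[metric], diff])
    return buffer.getvalue()


if __name__ == "__main__":
    baseline = {"latency_p50_ms": 10.0, "throughput_ops_s": 100.0}
    contender = {"latency_p50_ms": 12.5, "throughput_ops_s": 90.0}
    print(comparison_to_csv(baseline, contender))
```

Metrics that appear in only one of the two executions are skipped here; a real implementation might prefer to report them with an empty cell instead.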

OVI3D0 commented 1 month ago

Hey @IanHoang I can take this issue

IanHoang commented 1 month ago

Some folks used the scripts I linked and asked for their features to eventually be incorporated into OSB. At the time, the compare subcommand seemed like the most appropriate place for these features because we were using those scripts to aggregate results across several test records and then compare them with other aggregated results. But on second thought, doing this might couple “aggregating” to “comparing”. We might be better off creating a new subcommand called aggregate. This would keep the “aggregate” and “compare” abilities separate and make the tool more flexible.

For example, if users are ultimately comparing OS 2.3 with OS 2.4, they might be running several rounds of tests for each version. After running all the rounds of tests, they might do the following:

# Aggregate all rounds of tests related to OS 2.3
opensearch-benchmark aggregate --ids=id1,id2,id3 --output-name=os23 # Outputs to a new id called aggregate-os23
# Aggregate all rounds of tests related to OS 2.4
opensearch-benchmark aggregate --ids=id4,id5,id6 --output-name=os24 # Outputs to a new id called aggregate-os24

# Compares the aggregated results
opensearch-benchmark compare --baseline=aggregate-os23 --contender=aggregate-os24
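
Under the hood, the aggregation step could look roughly like the sketch below. It assumes each test execution is available as a dict of metric name to numeric value and combines them with a plain arithmetic mean; the data shape, the metric names, and the `aggregate_results` helper are all hypothetical, and the mean is only one possible way to combine rounds.

```python
from statistics import mean


def aggregate_results(executions):
    """Average each metric across several test executions.

    `executions` is assumed to be a list of dicts mapping metric name
    to a number (a hypothetical shape, not OSB's stored format).
    Only metrics present in every execution are kept.
    """
    if not executions:
        raise ValueError("at least one test execution is required")
    shared = set(executions[0])
    for results in executions[1:]:
        shared &= set(results)
    return {
        metric: mean(results[metric] for results in executions)
        for metric in sorted(shared)
    }


if __name__ == "__main__":
    rounds = [
        {"throughput_ops_s": 100.0, "latency_p50_ms": 10.0},
        {"throughput_ops_s": 110.0, "latency_p50_ms": 14.0},
        {"throughput_ops_s": 120.0, "latency_p50_ms": 12.0},
    ]
    print(aggregate_results(rounds))
```

One caveat: averaging per-round percentiles is an approximation, since the mean of each round's p50 is not the p50 of the pooled samples. Whether that is acceptable is one of the design questions for the aggregate subcommand.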

gkamat commented 1 month ago

An aggregate command is a good idea. There are several scenarios where something like this would be useful, such as carrying out runs using the same release over several days, or combining results from a set of loadgen hosts, similar to DWG. It would help to gather ideas to incorporate into this command as folks comment on this issue.

opensearch-project / opensearch-benchmark

Extend Compare Subcommand Capabilities #532