grafana / k6-jslib-summary


Compare the results of two different test runs #2

Open dgzlopes opened 3 years ago

dgzlopes commented 3 years ago

Is your feature request related to a problem? Please describe.
There is no native way to compare the results of two different test runs.

I would love to know if my latest change is going to worsen the performance of my service, or if it's going to break something.

Also, when I run k6 on CI, I would like to comment back on each PR with the diff between the PR and master, the same way I do with coverage tools.

Describe the solution you'd like
I would like to have a compare command in k6. Nothing fancy! Just compare two JSON summaries, and beautifully print the diff between them.

Some things should be customizable, like the percentage change above which a metric change counts as "important" (we don't want to pollute the output with 0.1% changes).

Another option, now that we have the handleSummary() callback, is to do this at the script level. This way you can compare the current run with a past run (that you can load from a file).

Describe alternatives you've considered
I wrote a small script that does something "similar".

```python
# diff.py
# requirements: Python 3.8.3 and deepdiff==5.2.2
from deepdiff import DeepDiff
from json import load

TOP_PERCENT_THRESHOLD = 20
BOTTOM_PERCENT_THRESHOLD = -20
BASELINE_FILE = "baseline.json"
RUN_FILE = "run.json"

print(f"Comparing {BASELINE_FILE} with {RUN_FILE}")
print(f"Percent Threshold: from {TOP_PERCENT_THRESHOLD} to {BOTTOM_PERCENT_THRESHOLD}")
print("-" * 60)

with open(BASELINE_FILE) as baseline, open(RUN_FILE) as run:
    data_baseline = load(baseline)
    data_run = load(run)

    result_diff = DeepDiff(data_baseline, data_run, ignore_order=True)

    # DeepDiff omits the "values_changed" key when nothing changed, hence .get()
    for metric_name, change in result_diff.get("values_changed", {}).items():
        new_value = change["new_value"]
        old_value = change["old_value"]
        if old_value == 0:  # avoid division by zero for metrics that were 0
            continue
        percent_change = 100 * (new_value - old_value) / old_value
        if percent_change > TOP_PERCENT_THRESHOLD:
            print(f"{metric_name} -> {round(old_value, 6)} ++ {round(new_value, 6)}")
        elif percent_change < BOTTOM_PERCENT_THRESHOLD:
            print(f"{metric_name} -> {round(old_value, 6)} -- {round(new_value, 6)}")
```

Results:

```
➜  python diff.py
Comparing baseline.json with run.json
Percent Threshold: from 20 to -20
------------------------------------------------------------
root['metrics']['http_req_receiving']['min'] -> 0.086011 -- 0.04834
root['metrics']['http_req_receiving']['p(95)'] -> 0.759464 ++ 0.917917
root['metrics']['http_req_blocked']['max'] -> 130.380219 ++ 162.900105
root['metrics']['http_req_blocked']['p(95)'] -> 121.592829 ++ 157.314519
root['metrics']['http_req_blocked']['p(90)'] -> 118.183842 ++ 152.276601
root['metrics']['http_req_blocked']['avg'] -> 13.628404 ++ 17.557079
```
na-- commented 3 years ago

I don't think a k6 compare subcommand should be a part of k6, sorry. While k6 aims to be a somewhat batteries-included kind of tool, this is a few steps too far... As you've demonstrated, it can easily be done as an external script, and as you've also mentioned:

> Another option, now that we have the handleSummary() callback, is to do this at the script level. This way you can compare the current run with a past run (that you can load from a file).

So what's the need for a new sub-command when you can just JSON.parse(open('previous-summary-export.json')) and then post-process the data however you want?

So, why should we add an inflexible sub-command to k6 when the desired behavior can already be achieved with some scripting and far greater flexibility? At best, this could be a blog/docs article with a few examples, and maybe a helper script or two in jslib.k6.io...
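
To make that concrete, here is a minimal sketch of what such a jslib helper could look like. It assumes the handleSummary() data shape where trend values live under data.metrics[&lt;name&gt;].values; diffSummaries, the 20% threshold, and the file names are illustrative, not an existing jslib API:

```javascript
// Illustrative sketch only -- diffSummaries is not an existing jslib function.
// Assumes the handleSummary() data shape: data.metrics[<name>].values.
const THRESHOLD_PERCENT = 20; // ignore changes smaller than this

function diffSummaries(baseline, current) {
    const lines = [];
    for (const [name, metric] of Object.entries(current.metrics)) {
        const base = baseline.metrics[name];
        if (!base || !base.values || !metric.values) continue;
        for (const [stat, newValue] of Object.entries(metric.values)) {
            const oldValue = base.values[stat];
            if (typeof oldValue !== 'number' || oldValue === 0) continue;
            const change = (100 * (newValue - oldValue)) / oldValue;
            if (Math.abs(change) >= THRESHOLD_PERCENT) {
                lines.push(`${name}.${stat}: ${oldValue} -> ${newValue} (${change.toFixed(1)}%)`);
            }
        }
    }
    return lines.length > 0 ? lines.join('\n') : 'No significant changes';
}

// The baseline is loaded in the init context, where open() is available.
const baseline = JSON.parse(open('baseline_summary.json'));

export function handleSummary(data) {
    return {
        stdout: diffSummaries(baseline, data) + '\n',
        'current_summary.json': JSON.stringify(data), // can become the next baseline
    };
}
```

Running k6 with a script like this would print only the metrics that moved more than 20% against the saved baseline.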

dgzlopes commented 3 years ago

After reading your reasoning, I agree with you @na--. This feature request is too opinionated and inflexible.

I was seeing it through the lens of other CI tools, but here we have more flexibility.

Either way, I see some benefit in documenting this topic. It's something users can easily do in k6 Cloud (with baselines), but there is no clear way to do it with k6 OSS.

na-- commented 3 years ago

This topic definitely deserves a few example scripts and maybe a blog post or two, so we should move this to the k6 docs repo when both are in the same organization again (docs were recently moved to https://github.com/k6io/docs, and k6 will be moved a bit later)

sniku commented 3 years ago

I'll chime in on the last point.

handleSummary should allow users to do semi-native performance regression tests. It's possible to implement regression testing in a way similar to what I'm proposing below. Everything aside from setting the exit code is currently supported.

```javascript
// textSummary is provided by the k6-summary jslib
import { textSummary } from 'https://jslib.k6.io/k6-summary/0.0.1/index.js';

// This file comes from the previous "good" run; it's loaded in the init
// context, since open() is only available there.
let baseline_performance_data = JSON.parse(open('baseline_performance_run.json'));

export function handleSummary(data) {
    // compare `data` to `baseline_performance_data`
    let is_performance_within_range = compare_test_runs(data, baseline_performance_data); // function imported from jslib
    if (!is_performance_within_range) {
        set_exit_code(99); // this API does not exist in k6 yet
    }

    return {
        'stdout': textSummary(data, { indent: ' ', enableColors: true }),
        'this_run_data.json': JSON.stringify(data),
    };
}
```

na-- commented 3 years ago

Moving this to jslib, now that the two repos are in the same organization...

na-- commented 2 years ago

I'm now moving this to https://github.com/grafana/k6-jslib-summary, since that's where we'll develop end-of-test summary helpers

mostafa commented 2 years ago

I expanded a tiny bit on the idea that @dgzlopes wrote in Python, but using vanilla JS. I also have a different idea that I'll work on as a k6 extension. https://github.com/mostafa/k6-test-diff

na-- commented 2 years ago

@mostafa, you don't need xk6-file to write files from handleSummary(), see https://k6.io/docs/results-visualization/end-of-test-summary#customize-with-handlesummary

mostafa commented 2 years ago

@na-- Does it mean that I can write as many files as I want? Or is it just one?

na-- commented 2 years ago

The result of handleSummary() is a JS object where the keys are file paths (or stdout / stderr, as special cases) and the values are the file contents. You can return as many entries in that object as you want :man_shrugging:
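
For example, a sketch of a handleSummary() returning several files at once (the file names here are arbitrary; textSummary comes from the k6-summary jslib):

```javascript
import { textSummary } from 'https://jslib.k6.io/k6-summary/0.0.1/index.js';

export function handleSummary(data) {
    return {
        // special key: printed to the terminal
        stdout: textSummary(data, { indent: ' ', enableColors: true }),
        // every other key is a file path, its value the file contents
        'summary.json': JSON.stringify(data),
        'summary.txt': textSummary(data, { indent: ' ' }),
    };
}
```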

mostafa commented 2 years ago

@na-- Then I definitely don't need the xk6-file extension. 😄