grafana / k6

A modern load testing tool, using Go and JavaScript - https://k6.io
GNU Affero General Public License v3.0

External Threshold #3198

Open mstoykov opened 1 year ago

mstoykov commented 1 year ago

Preface:

This is more of a discussion-type issue, and a reference I can point users and contributors to.

I also didn't link every issue or PR related to this, but will probably do so over time.

For example, https://github.com/grafana/k6/pull/2816 implements distributed execution with working thresholds. But that covers only part of what is discussed below.

What:

External threshold is the ability of k6 to have thresholds that are evaluated outside of (external to) a k6 instance or even a k6 distributed system.

Think PromQL-compatible queries, InfluxDB queries, or just a normal HTTP request to something that tells you the status of a threshold.

For example, whether the moon is in the correct/wrong phase, if our test is affected by that.
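To make this a bit more concrete, here is a minimal sketch (not a proposed API) of what such a check could look like with nothing but an HTTP request, assuming a Prometheus-compatible server on localhost:9090; the metric name in the query is made up for illustration:

import http from "k6/http";

export function checkExternalThreshold() {
    // Standard Prometheus instant query; the metric name is hypothetical.
    const query =
        'histogram_quantile(0.95, sum(rate(http_req_duration_bucket[1m])) by (le))';
    const res = http.get(
        `http://localhost:9090/api/v1/query?query=${encodeURIComponent(query)}`,
    );
    // Prometheus returns the sample as ["<timestamp>", "<value as string>"].
    const value = parseFloat(res.json("data.result.0.value.1"));
    return !Number.isNaN(value) && value < 0.25; // e.g. p(95) < 250ms
}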

Why:

Outside data:

A common occurrence is that people want to abort a test based not on things k6 can observe, but on things observed by a different system.

Examples:

Expressiveness:

k6 thresholds are not particularly expressive or powerful compared to most other systems dealing with metrics.

Issues:

Without distributed execution baked into k6 (and even with it), one of the main parts that currently does not work is thresholds.

The primary reason for that is that we have to have all the metrics in one place to evaluate thresholds, which gets harder once you have many instances... except that a user likely wants all those metrics anyway, so they are putting them somewhere else to visualize later.

In a lot of cases, that will be a time series database of some kind, and those have APIs to query them. After all, that is more or less the whole point of collecting metrics.

Additionally, such a system uses a syntax and a feature set that a user is (likely) already familiar with. If a user is using Prometheus and Cortex (for example), they likely have a good understanding of PromQL. Making them learn another way to define thresholds, instead of letting them just query their system directly, seems like bad UX. We could probably integrate PromQL, but then users of other systems would need to learn it instead of something else. And as history has shown, the most beloved system and query language changes over time.

Additionally, k6 would need to grow all the features needed to run all kinds of queries, then the optimizations to make them fast, and then we would have to teach them to users, instead of users just relying on what they already know from their own system. A system that they likely need to set up anyway, in order to save and graph all their metrics, not just the ones with thresholds.
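To illustrate the gap, compare a built-in k6 threshold with the kind of windowed query a Prometheus user already writes every day (the PromQL is shown in a comment, since k6 cannot evaluate it; the metric name is again hypothetical):

// What k6 thresholds can express today: a fixed set of aggregations,
// evaluated over the metrics of the whole test run.
export const options = {
    thresholds: {
        http_req_duration: ["p(95) < 250"],
    },
};

// What a Prometheus user might actually want: time windows, label
// matching, arbitrary arithmetic, none of it expressible as a k6 threshold:
//
//   histogram_quantile(0.95,
//       sum(rate(http_req_duration_bucket{scenario="main"}[1m])) by (le)
//   ) < 0.25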

Caveats:

This definitely isn't a feature targeting small users, especially ones who don't want any kind of output and just run k6 locally and look at the summary.

The current thresholds will still need to be supported, probably forever, even if this takes off and we get really good integrations/libraries.

This also still doesn't remove the need for at least some improvements on the internal thresholds' functionality.

Alternatives:

Arguably all the above needs is some way to execute requests and abort the test.

And k6 is all about making requests, and there is an API to abort the test.

The only problem is that the test won't report that it failed due to a threshold, unless we make a metric, put a threshold on it, and then use that.

A way to disable metrics emission would also keep this from "polluting" the metrics. (Doing it on a per-scenario or per-VU basis might be easier, useful in other cases, and would make the API a bit easier to reason about.)
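A minimal sketch of that alternative, using the existing test-abort API from k6/execution; the status endpoint and its response shape are made up for the example:

import http from "k6/http";
import exec from "k6/execution";

export async function checkAndAbort() {
    // Hypothetical endpoint that answers "is the threshold crossed?".
    const res = await http.asyncRequest("GET", "http://localhost:8080/threshold-status");
    if (res.status === 200 && res.json("crossed") === true) {
        // This stops the whole test run, but the exit status won't say
        // "a threshold failed", which is exactly the problem noted above.
        exec.test.abort("external threshold crossed");
    }
}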

Example

The example below uses the docker-compose setup currently in the k6 repo, which uses Grafana + InfluxDB to store and visualize the metrics.

Running it with k6 run -o influxdb script.js will output the metrics to InfluxDB.

In this case, for the simplicity of the setup, I read the metrics that k6 outputs back from InfluxDB through the Grafana API. I could have queried InfluxDB directly, but I also wanted to show that you can go through Grafana, so that, for example, if you needed to query multiple backends, you could do it in one go.

This obviously skips authentication, as I didn't want to set that up, and has a lot of hardcoded values. Also, while I could have aborted the test directly, I decided to feed the value back into an internal threshold and use that to fail.

import http from "k6/http";
import { Trend } from "k6/metrics";

let t = new Trend("my_threshold_value");

export const options = {
    scenarios: {
        main: {
            executor: "constant-arrival-rate",
            preallocatedVUs: 100,
            rate: 50,
            exec: "default",
            duration: "10m",
        },
        threshold: {
            executor: "constant-arrival-rate",
            preallocatedVUs: 5,
            rate: 1,
            timeUnit: "5s",
            exec: "thresholdChecking",
            duration: "10m",
            // preferably with some way to disable metrics emission for this scenario
        },
    },
    thresholds: {
        my_threshold_value: ["p(95) < 250"],
    },
};

export default () => {
    http.get("https://test.k6.io");
};

export async function thresholdChecking() {
    const value = await getThresholdValue();
    if (value == undefined || Number.isNaN(value)) {
        return; // no data yet, just wait for it
    }
    // Here we "re-add" the value back into k6 as a custom metric. But if we
    // wanted, we could instead check here that, for example, the value over
    // the last 1 minute is above some limit.
    t.add(value);
}

export async function getThresholdValue() {
    // This query is domain specific.
    // If the user is using something different it will be in a different format.
    // But also the user already likely uses this system and has good understanding of the syntax as they already use it.
    const query =
        'SELECT max("value") FROM "http_req_duration" WHERE time >= now() - 1m and time <= now() GROUP BY time(1m)';
    // Some of the other arguments here are... a bit magical and likely should be configurable.
    // Grafana (and other systems that can proxy or store metrics directly) has REST APIs that can be used further:
    // https://grafana.com/docs/grafana/latest/developers/http_api/
    // Likely all of this should be abstracted in a library.
    const url = `http://localhost:3000/api/datasources/proxy/1/query?db=k6&q=${encodeURI(
        query,
    )}&epoch=ms`;

    const req = await http.asyncRequest("GET", url, null, {
        tags: { name: "thresholds query" },
    });

    if (req.status == 200) {
        const val = req.json("results.0.series.0.values.0.1");
        if (val != null) {
            console.log(req.json("results"));
            return parseFloat(val);
        }
    }
    return undefined;
}

Additional notes:

The above example would look way better with a more high-level API on top, something like the sketch below.

import thresholds from "somewhere";

thresholds.importAsScenarioFromGrafana(
    grafanaUrl,
    thresholds.influxdbQuery('SELECT max("value") FROM "http_req_duration" WHERE time >= now() - 1m and time <= now() GROUP BY time(1m)'),
    {
        datasource: { type: "influxdb", name: "some easier id to be queried" },
        name: "nameOfTheThreshold",
        trigger: (val) => val % 2 ? true : false,
    },
);

Also, any k6 API for working with thresholds would help.

na-- commented 1 year ago

External Thresholds

Agreed, something like this probably makes sense in the long term, unless you want to embed a fully-featured metrics DB like Prometheus inside of k6 :sweat_smile: Though, given the plethora of time series and metrics databases out there, maybe this can be handled as some sort of an xk6 extension? :thinking: Maybe even an optional feature that some of the already existing built-in outputs and output extensions can expose by implementing an optional interface? After all, k6 will be sending the test run metrics through these outputs to some remote DB, they can also be used to query that same DB back, right? They'd already be configured and authenticated, after all.

Though, even that might not be necessary in the short- and mid-term... :thinking: As you've shown with your example, you can already sort of do this with existing k6 functionality. So, with a few JS wrappers and helpers, as well as minor improvements to existing k6 features (e.g. custom exit codes for test.abort() in k6/execution or something like that), I expect that a big chunk of use cases that require complex thresholds can be satisfied with a pretty good UX...
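As a sketch of how small such a wrapper could be, assuming the hypothetical custom exit code existed (today test.abort(reason) exits with the generic "script aborted" code 108, while failed thresholds exit with 99):

import exec from "k6/execution";

// Hypothetical helper: abort the run but report it to CI as a threshold
// failure. The second argument does NOT exist today; test.abort() only
// takes an optional reason string.
export function abortAsThresholdFailure(reason) {
    exec.test.abort(`external threshold failed: ${reason}` /*, { exitCode: 99 } */);
}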

Another note that I'd bring up here is that thresholds and the end-of-test summary might need to be considered together. They are pretty closely tied together in the current k6 implementation, after all. And while that's not a good reason to continue doing so, it results in some pretty good UX and it might even be easier to consider them together when it comes to implementing this feature too.

Also, it'd be tricky to figure out how this would be handled in the cloud, which doesn't currently support external outputs. That probably needs to be supported before this feature.

Distributed Execution

In any case, I started replying here to explain why I chose to add support for metrics (both for the end-of-test summary and for thresholds) in my distributed execution PoC. It was simply because it makes for the best user experience to have the same consistent k6 behavior in both local and distributed tests, as much as that is practically possible. And, in this case, it was fairly easy to add support for metrics, so I did... :sweat_smile:

I will probably publish the full design doc on Monday, but over the last few days I worked hard on splitting https://github.com/grafana/k6/pull/2816 into even smaller self-sufficient and atomic commits. https://github.com/grafana/k6/pull/2816#issuecomment-1635466149 contains the details, but now the actual distributed execution (including setup() and teardown()) is a separate commit from adding support for metrics during it.

And the "Support thresholds and the end-of-test summary in distributed execution" commit (the actual link might be stale due to future rebases, look up the latest one) is relatively tiny, just 11 files changed, 609 insertions(+), 42 deletions(-). Actually, it's more like half of that, given that most of the commit is Go code generated from the protobuf definitions... Admittedly, it is also without tests and with a very poorly performing implementation (e.g. no HDR histograms), but it's still indicative that the actual complexity of supporting metrics in a distributed test run is not very high.

Moreover, the way it's implemented should allow any future improvements to the built-in k6 thresholds to seamlessly apply to distributed test runs too, without any extra complexity! :tada: :crossed_fingers: That's because we are literally reusing most of the same code that powers thresholds and the end-of-test summary for local single-instance k6 run tests.

And, as you say, it's unlikely that the current thresholds will disappear. They might even see improvements even if external thresholds are also adopted. So it makes sense to me to support them in both local and distributed test runs, given the low effort required.

soggycactus commented 8 months ago

+1 for this feature request - it would be great to have the ability to query a Prometheus formatted metrics endpoint for threshold evaluation

tsloughter commented 4 months ago

I was looking for this feature and got pointed here.

I thought I'd mention https://kube-burner.github.io/kube-burner/latest/observability/alerting/ as the sort of thing I was hoping to find in k6.