grafana / k6

A modern load testing tool, using Go and JavaScript - https://k6.io
GNU Affero General Public License v3.0

Emit an errors metric and have a default error rate script abort threshold #877

Open na-- opened 5 years ago

na-- commented 5 years ago

It would be very useful to resurrect the currently unused errors metric and also add a default threshold that would abort the script if the error rate exceeds a certain configurable value.

This issue is a summary of an idea that has been discussed in various other issues and PRs before:

na-- commented 5 years ago

All exceptions during an iteration will add to the errors metric, but we should probably offer users more flexibility by extending the fail() function to support custom error tags. Something like this: fail("error message", { tags: { action: "customer-add", cause: "cannot-login" } }).
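For illustration, a script using the proposed extended fail() signature could look something like this (the second argument is the proposed extension and doesn't exist yet, and the endpoint is just a placeholder):

import http from "k6/http";
import { fail } from "k6";

export default function () {
   // Placeholder endpoint, used only for illustration
   const res = http.post("https://test.example.com/login", { user: "customer" });
   if (res.status !== 200) {
      // Proposed extension: a second argument with custom metric tags;
      // today fail() only accepts the error message.
      fail("cannot log in", { tags: { action: "customer-add", cause: "cannot-login" } });
   }
}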

And ideally, we should improve the UX of fail() errors. Currently, running this script:

import { fail } from "k6";

export default function () {
   fail("test");
}

will produce an error like this:

ERRO[0000] GoError: test
    at native
    at script.js:4:18(4) 

whereas something like Fail "test" at script.js:4:18(4) or script error ... or user error would be much better - normal users don't need to know k6 is written in Go... If there's a use case, we might even add a way to specify extra data in fail() besides the metric tags and error message, though I'm not sure what a good use case for that would be.

lukecusolito commented 4 years ago

@na-- the referenced issue #735 was closed in favor of this one. Will this issue cover the following?

When a check fails or fail() is called, a failure exit code is returned. This would enable us to fail a build pipeline when the check criteria are not met, without setting a threshold.

na-- commented 4 years ago

@lukecusolito, sorry for the late response, I somehow missed your question until I just now looked at this issue for another reason :disappointed:

To answer your question, a single check or a fail() likely won't abort the whole script. Rather, if a certain percentage of iterations are aborted with an error (either fail() or a thrown JS error), k6 would abort the whole script.

As shown in the script example from https://github.com/loadimpact/k6/issues/735#issuecomment-412801060, you can sort of implement this even now with a few lines of code, using custom metrics and thresholds. With the same approach, you can also implement a script abort if even a single check or iteration fails, if that's what your use case calls for.
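For reference, a minimal sketch of that workaround, assuming a placeholder endpoint and an arbitrary 5% error budget, could look roughly like this:

import http from "k6/http";
import { check } from "k6";
import { Rate } from "k6/metrics";

// Custom metric tracking the share of failed iterations
const errorRate = new Rate("errors");

export const options = {
   thresholds: {
      // Abort the whole test once more than 5% of iterations have failed
      errors: [{ threshold: "rate<0.05", abortOnFail: true }],
   },
};

export default function () {
   const res = http.get("https://test.example.com/");
   const ok = check(res, { "status is 200": (r) => r.status === 200 });
   errorRate.add(!ok);
}

The same pattern with a "rate==0" threshold should abort the test on the very first failed check, which covers the single-failure case.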

This issue is focused on actually emitting these iteration errors as a metric, and having some sensible default threshold that automatically aborts the test if too many iterations result in an error. After all, if 50% of your iterations are interrupted by an error, your load test probably isn't very meaningful...

na-- commented 4 years ago

Thinking a bit more about this, and considering that #1007 added a dropped_iterations metric (https://github.com/loadimpact/k6/pull/1529), maybe this metric shouldn't be called errors, but something like interrupted_iterations instead? And have a cause tag? And be used for 4 things:

Another reason for that is that it seems like good UX to update the UI counter for interrupted iterations (i.e. the ZZ in the running (MMmSS.Ss), 0/XX VUs, YY complete and ZZ interrupted iterations text above the progress bars) so that it also includes iterations that were prematurely interrupted because of an exception or fail().

Finally, regarding thresholds... I'm not sure if we should add a default one, considering there might not currently be a way for users to overwrite or remove it. This needs to be checked, but regardless, if we even want the possibility of adding a sensible threshold based on interrupted_iterations{cause: error} (that's why I want to classify both exceptions and fail() calls as cause: error), we need to make the new metric a Rate instead of a Counter.

Unfortunately, that means k6 will emit more noise for each iteration (both iterations: 1 and interrupted_iterations: 0)... That's far from ideal, but we currently can't have a threshold referencing 2 metrics (i.e. something like interrupted_iterations / iterations < 0.05 is impossible :disappointed: ).

So, I'm not sure if we should have a Rate or a Counter. Rate will allow us better thresholds now (e.g. interrupted_iterations{cause:error}: ["rate < 0.05"]), and its downsides can be somewhat ameliorated by https://github.com/loadimpact/k6/issues/1321 when performance matters. Counter will be lighter for metrics processing, but won't allow us to set a sensible threshold until the thresholds are overhauled (https://github.com/loadimpact/k6/issues/1441, https://github.com/loadimpact/k6/issues/1443 and connected issues)... @sniku @MStoykov @imiric, what do you think?
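For concreteness, if the new metric ends up being a Rate, such a threshold could be expressed with the existing sub-metric syntax, roughly like this (purely hypothetical, since neither the interrupted_iterations metric nor the cause tag exists yet):

export const options = {
   thresholds: {
      // Proposed: abort the test if more than 5% of iterations end with an
      // exception or a fail() call (both would be tagged with cause:error)
      "interrupted_iterations{cause:error}": [{ threshold: "rate<0.05", abortOnFail: true }],
   },
};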

mstoykov commented 4 years ago

I like the idea of being able to differentiate between the different reasons k6 has interrupted an iteration.

I did want to see if we can just add it on top of the current Iterations metric ... but

  1. We don't emit it if the context is Done which is basically all the graceful* variants
  2. It will become harder to use

Which probably means that this is not a good idea :(

I also think that Rate is a better fit, at least given the current shortcomings of the threshold API...

na-- commented 4 years ago

We don't emit it if the context is Done which is basically all the graceful* variants

Even though it would be a minor breaking change, since we currently don't emit iterations: 1 for interrupted-by-executor iterations, there's something to be said for also not emitting it for interrupted-by-error iterations, especially if we have a new interrupted_iterations metric that will take them into account.

na-- commented 4 years ago

Somewhat connected issue that can probably be implemented at the same time as this one: https://github.com/loadimpact/k6/issues/1250

imiric commented 3 years ago

Hey, I started exploring this issue, but it's a bit difficult to gather precise requirements because of the number of related issues and discussions. So far I'm mostly working based on this comment.

One question so far: #1007 also introduced ExecutionState.interruptedIterationsCount:

https://github.com/loadimpact/k6/blob/420dd4161c280bdf0398af5e58f73c939057695a/lib/execution.go#L218-L223

This is not a metric, but an internal counter incremented by specific executors, and is shown in the test run UI (X complete and Y interrupted iterations).

At first glance it seems that the proposed interrupted_iterations metric should serve the same purpose, so a couple of questions about reconciling the two:

  1. Right now it's simple enough for fail() to emit an interrupted_iterations metric, but it seems impossible for the js/modules/k6 package to access the ExecutionState; we can only get the lib.State from the context. lib.State and lib.ExecutionState have some conceptual overlap (they both have lib.Options for example), so could we consider a merger of these into a single struct?

  2. If there's an interrupted_iterations metric, can we remove the internal interruptedIterationsCount and have executors emit this metric instead? That would also mean removing it from the UI, so I guess only showing X complete iterations? This should be simpler than a refactor to merge both states or expose ExecutionState elsewhere, but it won't be shown in real-time anymore and will only appear in the end-of-test summary.

na-- commented 3 years ago

At first glance it seems that the proposed interrupted_iterations metric should serve the same purpose

This is somewhat true, but the devil, as always, is in the details... The ExecutionState.interruptedIterationsCount atomic is only meant for the live progressbar UI, to give a better idea of what the test run is doing in real time. The proposed interrupted_iterations aims to expose that same information to the metrics subsystem, as discrete events every time an iteration is interrupted, so the data is available in the thresholds, end-of-test summary, external outputs, etc.

It'd be difficult to drop ExecutionState.interruptedIterationsCount even after the interrupted_iterations metric is implemented, since we'd have to add something that listens for interrupted_iterations data points in real time and displays them in the UI... It's certainly possible to do, but it would be much more inefficient than the current solution and a bit of a spaghettification of the code. It's similar to the progressbar counters for a shared-iterations executor and the iterations metric - we both emit an iterations data point and increment a counter, since those are two conceptually different systems.

And while currently throw Error() and fail() don't count towards interruptedIterationsCount, they should. And connected to that,

Right now it's simple enough for fail() to emit an interrupted_iterations metric, but it seems impossible for the js/modules/k6 package to access the ExecutionState; we can only get the lib.State from the context

The easiest way to detect failed iterations would be in getIterationRunner(). You should be able to "throw" a custom struct error type from fail(), in order to distinguish it from normal JS exceptions and to also attach extra metric tags in fail(). Then you should be able to type-assert on it in getIterationRunner().

lib.State and lib.ExecutionState have some conceptual overlap (they both have lib.Options for example), so could we consider a merger of these into a single struct?

They don't really overlap all that much. A better name for lib.State would probably have been lib.VUState, since each VU has its own instance of it, whereas there is only one lib.ExecutionState and it's used by the local execution scheduler. So I don't see a way we can merge them. lib.State is definitely overdue for a big refactoring to make it more usable by extensions and even k6 code modules, but that's another topic for the future...

2. If there's an interrupted_iterations metric, can we remove the internal interruptedIterationsCount and have executors emit this metric instead? That would also mean removing it from the UI, so I guess only showing X complete iterations? This should be simpler than a refactor to merge both states or expose ExecutionState elsewhere, but it won't be shown in real-time anymore and will only appear in the end-of-test summary.

As I mentioned above, unless I am very much mistaken, you should be able to update both the atomic and emit the metric from a single place across all executors, the getIterationRunner() helper. So, given how small the upkeep for having the extra real-time information is, I don't see a need to remove it...

na-- commented 2 years ago

For future reference, https://github.com/grafana/k6/pull/1769 was an old attempt to implement this that was blocked by https://github.com/grafana/k6/issues/1321 and became stale.