grafana / k6

A modern load testing tool, using Go and JavaScript - https://k6.io
GNU Affero General Public License v3.0
26.2k stars 1.27k forks source link

Remove the JS runtime from threshold calculations #1443

Closed sniku closed 2 years ago

sniku commented 4 years ago

This issue aims to bring 80% of the benefits of #1441 while doing only 20% of the work.

The ideal future state of thresholds is defined in #1441 . This issue addresses only the most important problem of thresholds - removing the js runtime. This issue doesn't introduce the new concepts from the original specification.

Motivation

This issue addresses the following issues (two first issues from the original spec).

  1. Thresholds in k6 are executed in JavaScript runtime while thresholds in k6 cloud are parsed and executed in Python. This doesn't work for non-trivial cases.
  2. Thresholds can't be validated.

Quick overview of the new thresholds

import { http_req_duration } from "k6/metrics/http";
import { Counter } from "k6/metrics";

let successfulLogins = new Counter("successful_logins");

export let options = {
  thresholds: {  // thresholds are still defined in an object {}
    'http_req_duration{staticAssets:true}': [
      {threshold: "mean < 500"}  // the threshold string is not JS. 
    ],
    'successful_logins': ["count > 1"]
  }
}

Yes, it's the old format, but the threshold expression is no longer executed in a JS runtime. Instead, it's parsed and executed in go.

Parsing

(copied from the original spec).

To provide consistency between local and cloud execution, threshold parsing should happen only in k6. When the test is executed in the cloud, k6 should parse and provide the agreed-upon format to the k6-cloud backend.

The proposed format is

let thresholds = [{
  metric_name: "http_req_duration|others",
  metric_type: "trend|gauge|counter|rate",
  metric_value_contains: "time|default",  // isTime=true
  metric_initial_value: 0, // 
  filters: [
    {"left-hand": "status", "operator": ">", "right-hand": "200", "right-hand-type": "int"}, // not supported by this proposal.
    {"left-hand": "staticAssets", "operator": "==", "right-hand": "yes", "right-hand-type": "numeric"},
  ],
  conditions: [
    {"left-hand": "mean", "operator": "<", "right-hand": "200", "right-hand-type": "string"},
  ],
  name: "http_req_duration(status>200, staticAssets=yes)", // auto-generate when not provided. 
}];

Validation

Thresholds should be validated and abort the execution in case of validation errors.

metric_name validation.

The metric name must be one of:

  1. Built-in k6 metric (http_req_duration, etc)
  2. user-defined metric (successful_logins, etc).

When k6 detects a threshold on a metric that's not defined, the test should be aborted. The init context must be executed to find user-defined metrics. Metric defined in branches are no longer supported

if (1==2){  // invalid
   let successfulLogins = new Counter("successful_logins");
}

filters validation

Example of filtered-threshold

export let options = {
  thresholds: {  
    'http_req_duration{staticAssets:true}': ["mean < 500"] 
  }
}

Should be parsed as:

  filters: [
    {
    "left-hand": "staticAssets",
    "operator": "==",  // only equality operator supported at the moment
    "right-hand": "true", 
    "right-hand-type": "string" // only string supported at the moment
    },
  ]

Spaces between left-hand, operator and right-hand are allowed but should be stripped away by the parser.

left-hand should follow the ECMAScript variable-name restrictions - https://mathiasbynens.be/notes/javascript-identifiers or more strict restrictions (left for the implementer to decide).

right-hand-type can only be a string at the moment.

conditions array validation

Each threshold can have multiple conditions. Conditions can't be combined. Each condition must be a separate string.

Invalid

export let options = {
  thresholds: {  
    'http_req_duration': ["p95 < 500 && p99 < 500"] 
  }
}

Valid

export let options = {
  thresholds: {  
    'http_req_duration': ["p95 < 500', 'p99 < 500"] 
  }
}

"conditions" array has a similar restriction as "filters" and should be parsed in a similar way.

 [{
    "left-hand": "mean", // `aggregation method` name - restricted per metric type
    "operator": "<",    // see operators below
    "right-hand": "200", 
    "right-hand-type": 
    "numeric"
    }]

aggregation method validation

Invalid aggregation method should raise an exception and abort the test execution.

The aggregation methods available for Metric types are specified below.

Counter

Gauge

Rate

Trend

Operators

na-- commented 4 years ago

Some duplicate/related issues that will have to be combed through and closed/unified with this one (or https://github.com/loadimpact/k6/issues/1441): https://github.com/loadimpact/k6/issues/961, https://github.com/loadimpact/k6/issues/1136, https://github.com/loadimpact/k6/issues/1312, https://github.com/loadimpact/k6/issues/1305, https://github.com/loadimpact/k6/issues/1313, https://github.com/loadimpact/k6/issues/1321, and https://github.com/loadimpact/k6/issues/680 (or https://github.com/loadimpact/k6/issues/870)

na-- commented 4 years ago

https://github.com/loadimpact/k6/issues/1526 demonstrates that we should probably emit a warning for thresholds that don't correspond to a system or custom metric

na-- commented 3 years ago

Not sure if this should be here or in https://github.com/loadimpact/k6/issues/1441, or even in https://github.com/loadimpact/k6/issues/1313, but in the next version of the threshold submetric definition syntax, we should likely ditch the colon for equality and use == instead. Besides us wanting other expressions (== makes more sense when we also want to have other comparisons like <, >, etc.), it will make filtering by group much better, see https://github.com/loadimpact/k6/issues/948#issuecomment-767410373

na-- commented 3 years ago

Another benefit for having the thresholds be defined as an array instead of a map, besides distinguishing between the old and new format for specifying them, is that when we have them ordered, we can more easily implement the custom exit code per threshold: https://github.com/loadimpact/k6/issues/680#issuecomment-767432081

oleiade commented 3 years ago

For the reference, and to support my own understanding, here's a little BNF definition of the proposed threshold expression format I tried to extract from the existing syntax, as well as the proposed one:

assertion           -> aggregation_method whitespace* operator whitespace* float
aggregation_method  -> trend | rate | gauge | counter
counter             -> "count" | "sum" | "rate"
gauge               -> "last" | "min" | "max" | "value"
rate                -> "rate"
trend               -> "min" | "mean" | "avg" | "max" | percentile
percentile          -> "p(" float ")"
operator            -> ">" | ">=" | "<=" | "<" | "==" | "===" | "!="
float               -> digit+ (. digit+)?
digit               -> "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
whitespace          -> space | tab
tab                 -> "\t"
space               -> " "

N.B I followed the BNF definition syntax presented in Crafting Interpreters, as I find it both more readable/understandable and easy to manipulate (considering we're not intending to produce an actual parser out of it).

oleiade commented 2 years ago

Here's a little update on the context and scope of this issue posted by @na-- in #2251 for future reference:

We don't know how the k6 thresholds of the future will look like... #1441 is one suggestion for one aspect of them, there are various improvement ideas in #1136 and some of its linked issues and in this comment (probably others as well), #1831 will almost certainly also affect how we do things...

So, we don't know how thresholds 2.0 will look like, but it's almost certainly going to be different than the current thresholds. Our current syntax is just too limited to support a lot of the things we want. Though, even after we have better thresholds and deprecate the current syntax, we probably won't ever stop supporting it... It's such a major feature that all old k6 scripts should continue to work. We'll just silently (or with a warning) support the old format, internally translating it to the new feature (since its capabilities will be a superset of the current ones), but we probably won't ever develop it or add new features to it.

Again, we don't know how parsing of the future thresholds will look like. We might go with our own custom parser, and then a fancy fully-featured one will be great. But we might decide to adopt PromQL or some other widely-used format for them and use a ready library for parsing it.

oleiade commented 2 years ago

We had initially planned for the refactor to ship with v0.36.0. However, we uncovered certain issues with it internally, which pushed us to momentarily revert the change and delay its merge into master. We now plan to ship this, with #2330 by v0.37.0.

oleiade commented 2 years ago

The reframed scope as discussed in the comments is implemented as of #2400. The initial scope is still something we'd like to achieve as a longer-term goal. Because the issue is already so old, and there have been numerous iterations on this, we prefer to close this issue in favor of a new one as soon as the initial scope gets prioritized again.