New Threshold specification

Motivation

There are several problems with the current Threshold implementation:

Thresholds in k6 are executed in JavaScript runtime while thresholds in k6 cloud are parsed and executed in Python. This doesn't work for non-trivial cases.
Thresholds can't be validated.
Threshold DSL format is limited in filtering. Example http_req_duration{status:200,status:201}
The UX of specifying non-trivial Thresholds is poor. Example group_duration{group:::publicCrocodiles}.

Quick overview of the new format

import { Threshold } from "k6";
import { http_req_duration } from "k6/metrics/http";
import { Counter } from "k6/metrics";

let successfulLogins = new Counter("successful_logins");

export let options = {
  thresholds: [
    new Threshold(http_req_duration, {
      filter: [
        'staticAssets == "true"',
        // "status >= 200", // future extension. Not supported by this proposal.
      ],
      conditions: ["mean < 500", "p(99) < 300"],
      abortOnFail: false,
      name: "My SLA for the CDN"
    }),
    new Threshold(successfulLogins, {
      conditions: ["count > 1"],
    }),

  ]
}

Improvements

The new format will bring the following benefits:

Consistency between local execution and cloud execution.
Ability to validate thresholds.
Developer-experience - the new format is easier to write and understand.
Ability to specify more than one threshold on the same metric (http_req_duration)
Ability to filter metric data in a more flexible way (create sub-metrics).
Performance. k6 won't need to calculate values for aggregation methods that are not used. (no need to calculate avg when threshold only uses min)

Drawbacks of the new format

Thresholds can no longer be a free-form javascript code.
More verbose format - it requires more code.

Implementation details

backwards compatibility

To provide backwards compatibility, the new k6 thresholds are defined in an array. The old thresholds are using an object {}.

Old:

export let options = {
  thresholds: {} // object
};

New

export let options = {
  thresholds: [] // array
};

The old format should continue to be supported, but it should raise a WARNING.

Parsing

To provide consistency between local and cloud execution, threshold parsing should happen only in k6. When the test is executed in the cloud, k6 should parse and provide the agreed-upon format to the k6-cloud backend.

The proposed format is

let thresholds = [{
  metric_name: "http_req_duration|others",
  metric_type: "trend|gauge|counter|rate",
  metric_value_contains: "time|default",  // isTime=true
  metric_initial_value: 0, // 
  filters: [
    {"left-hand": "status", "operator": ">", "right-hand": "200", "right-hand-type": "int"}, // not supported by this proposal.
    {"left-hand": "staticAssets", "operator": "==", "right-hand": "yes", "right-hand-type": "numeric"},
  ],
  conditions: [
    {"left-hand": "mean", "operator": "<", "right-hand": "200", "right-hand-type": "string"},
  ],
  name: "http_req_duration(status>200, staticAssets=yes)", // auto-generate when not provided. 
}];

Filtering

Metrics can be filtered by tags and attributes. In the previous definition of Thresholds, filtered metrics were called "sub-metrics". I suggest we start calling them "filtered metrics".

Filtering syntax is similar to the threshold syntax tagname operator value

Example:

new Threshold(http_req_duration, {
  filter: [
    "staticAssets === true",
    "status >= 200",
  ]});

Validation

Thresholds should be validated and abort the execution in case of validation errors.

`metric_name` validation

All default metrics should be importable from built-in modules. The proposed place is:

import { Rate, Counter, Trend, Gauge } from "k6/metrics"; // no change

import { http_req_duration,
  http_req_receiving,
  http_req_waiting,
  http_req_sending,
  http_req_tls_handshaking,
  http_req_connecting,
  http_req_blocked,
  http_reqs } from "k6/metrics/http";  // scoping in the "http" module. Future protocols will have their own modules.
import { group_duration } from "k6/metrics/group"; // this is a protocol-independent metric.

import { Threshold } from "k6"; // other suggestions are welcome

The reasoning behind using an import instead of a string is:

It's less magic. Using imports removes the "magic strings" DSL format, and therefore makes the definition closer to JavaScript.
It's harder to make spelling mistakes when using imports (especially if linting is working)
These metrics are real instances and should be inspectable. For example http_req_duration is in fact Trend("http_req_duration", true);

`metric_type` validation

The Metric object passed to Threshold, must be of type Rate, Trend, Gauge or Counter. Any other types should raise an appropriate exception and abort the test execution.

new Threshold(Metric, {}) // Metric must be Rate|Trend|Gauge|Counter.

`filters` validation

Example:

  filters: [
    {
    "left-hand": "staticAssets",   // follow 
    "operator": "==", // only `==` and `===` allowed. 
    "right-hand": "yes", 
    "right-hand-type": "string" // string only
    },
  ]

Spaces between left-hand, operator and right-hand are allowed but should be stripped away by the parser.

Filters should support only these operators:

==
=== // same as ==. It's supported to make it more js-like.

Specifying any other operator should raise an exception and abort the test. Note, other operators may be added in the future (see the tags section below).

left-hand should follow the ECMAScript variable-name restrictions - https://mathiasbynens.be/notes/javascript-identifiers or more strict restrictions (left for the implementer to decide).

right-hand-type can only be a string at the moment. Strings should be quoted by the user, but quotes should be stripped away by the parser.

`conditions` array validation

Each threshold can have multiple conditions. Conditions can't be combined. Each condition must be a separate string.

Invalid

new Threshold(http_req_duration, {
  conditions: ["p95 < 500 && p99 < 500"],
})

Valid

new Threshold(http_req_duration, {
  conditions: ["p95 < 500", "p99 < 500"],
})

"conditions" array has a similar restriction as "filters" and should be parsed in a similar way.

 [{
    "left-hand": "mean", // `aggregation method` name - restricted per metric type
    "operator": "<",    // see operators below
    "right-hand": "200", 
    "right-hand-type": 
    "numeric"
    }]

`aggregation method` validation

Invalid aggregation method should raise an exception and abort the test execution.

The aggregation methods available for Metric types are specified below.

Counter

count (number of times the .add("whatever") has been called )
sum (sum of numeric values passed to .add(5))
rate (count/s)

Gauge

last
min
max
value (undocumented alias for last)

Rate

rate

Trend

min
mean
avg
max
p(N)

Operators > >= < <= == === != !==

grafana / k6

[Archived] Thresholds overhaul - the ideal state of thresholds #1441

New Threshold specification

Motivation

Quick overview of the new format

Improvements

Drawbacks of the new format

Implementation details

backwards compatibility

Parsing

Filtering

Validation

`metric_name` validation

`metric_type` validation

`filters` validation

`conditions` array validation

`aggregation method` validation

tags

grafana / k6

[Archived] Thresholds overhaul - the ideal state of thresholds #1441

New Threshold specification

Motivation

Quick overview of the new format

Improvements

Drawbacks of the new format

Implementation details

backwards compatibility

Parsing

Filtering

Validation

metric_name validation

metric_type validation

filters validation

conditions array validation

aggregation method validation

tags

`metric_name` validation

`metric_type` validation

`filters` validation

`conditions` array validation

`aggregation method` validation