grafana / k6

A modern load testing tool, using Go and JavaScript - https://k6.io
GNU Affero General Public License v3.0
25.04k stars 1.24k forks source link

[Archived] Thresholds overhaul - the ideal state of thresholds #1441

Open sniku opened 4 years ago

sniku commented 4 years ago

New Threshold specification

Motivation

There are several problems with the current Threshold implementation:

  1. Thresholds in k6 are executed in JavaScript runtime while thresholds in k6 cloud are parsed and executed in Python. This doesn't work for non-trivial cases.
  2. Thresholds can't be validated.
  3. Threshold DSL format is limited in filtering. Example http_req_duration{status:200,status:201}
  4. The UX of specifying non-trivial Thresholds is poor. Example group_duration{group:::publicCrocodiles}.

Quick overview of the new format

import { Threshold } from "k6";
import { http_req_duration } from "k6/metrics/http";
import { Counter } from "k6/metrics";

let successfulLogins = new Counter("successful_logins");

export let options = {
  thresholds: [
    new Threshold(http_req_duration, {
      filter: [
        'staticAssets == "true"',
        // "status >= 200", // future extension. Not supported by this proposal.
      ],
      conditions: ["mean < 500", "p(99) < 300"],
      abortOnFail: false,
      name: "My SLA for the CDN"
    }),
    new Threshold(successfulLogins, {
      conditions: ["count > 1"],
    }),

  ]
}

Improvements

The new format will bring the following benefits:

  1. Consistency between local execution and cloud execution.
  2. Ability to validate thresholds.
  3. Developer-experience - the new format is easier to write and understand.
  4. Ability to specify more than one threshold on the same metric (http_req_duration)
  5. Ability to filter metric data in a more flexible way (create sub-metrics).
  6. Performance. k6 won't need to calculate values for aggregation methods that are not used. (no need to calculate avg when threshold only uses min)

Drawbacks of the new format

  1. Thresholds can no longer be a free-form javascript code.
  2. More verbose format - it requires more code.

Implementation details

backwards compatibility

To provide backwards compatibility, the new k6 thresholds are defined in an array. The old thresholds are using an object {}.

Old:

export let options = {
  thresholds: {} // object
};

New

export let options = {
  thresholds: [] // array
};

The old format should continue to be supported, but it should raise a WARNING.

Parsing

To provide consistency between local and cloud execution, threshold parsing should happen only in k6. When the test is executed in the cloud, k6 should parse and provide the agreed-upon format to the k6-cloud backend.

The proposed format is

let thresholds = [{
  metric_name: "http_req_duration|others",
  metric_type: "trend|gauge|counter|rate",
  metric_value_contains: "time|default",  // isTime=true
  metric_initial_value: 0, // 
  filters: [
    {"left-hand": "status", "operator": ">", "right-hand": "200", "right-hand-type": "int"}, // not supported by this proposal.
    {"left-hand": "staticAssets", "operator": "==", "right-hand": "yes", "right-hand-type": "numeric"},
  ],
  conditions: [
    {"left-hand": "mean", "operator": "<", "right-hand": "200", "right-hand-type": "string"},
  ],
  name: "http_req_duration(status>200, staticAssets=yes)", // auto-generate when not provided. 
}];

Filtering

Metrics can be filtered by tags and attributes. In the previous definition of Thresholds, filtered metrics were called "sub-metrics". I suggest we start calling them "filtered metrics".

Filtering syntax is similar to the threshold syntax tagname operator value

Example:

new Threshold(http_req_duration, {
  filter: [
    "staticAssets === true",
    "status >= 200",
  ]});

Validation

Thresholds should be validated and abort the execution in case of validation errors.

metric_name validation

All default metrics should be importable from built-in modules. The proposed place is:

import { Rate, Counter, Trend, Gauge } from "k6/metrics"; // no change

import { http_req_duration,
  http_req_receiving,
  http_req_waiting,
  http_req_sending,
  http_req_tls_handshaking,
  http_req_connecting,
  http_req_blocked,
  http_reqs } from "k6/metrics/http";  // scoping in the "http" module. Future protocols will have their own modules.
import { group_duration } from "k6/metrics/group"; // this is a protocol-independent metric.

import { Threshold } from "k6"; // other suggestions are welcome

The reasoning behind using an import instead of a string is:

  1. It's less magic. Using imports removes the "magic strings" DSL format, and therefore makes the definition closer to JavaScript.
  2. It's harder to make spelling mistakes when using imports (especially if linting is working)
  3. These metrics are real instances and should be inspectable. For example http_req_duration is in fact Trend("http_req_duration", true);

metric_type validation

The Metric object passed to Threshold, must be of type Rate, Trend, Gauge or Counter. Any other types should raise an appropriate exception and abort the test execution.

new Threshold(Metric, {}) // Metric must be Rate|Trend|Gauge|Counter.

filters validation

Example:

  filters: [
    {
    "left-hand": "staticAssets",   // follow 
    "operator": "==", // only `==` and `===` allowed. 
    "right-hand": "yes", 
    "right-hand-type": "string" // string only
    },
  ]

Spaces between left-hand, operator and right-hand are allowed but should be stripped away by the parser.

Filters should support only these operators:

Specifying any other operator should raise an exception and abort the test. Note, other operators may be added in the future (see the tags section below).

left-hand should follow the ECMAScript variable-name restrictions - https://mathiasbynens.be/notes/javascript-identifiers or more strict restrictions (left for the implementer to decide).

right-hand-type can only be a string at the moment. Strings should be quoted by the user, but quotes should be stripped away by the parser.

conditions array validation

Each threshold can have multiple conditions. Conditions can't be combined. Each condition must be a separate string.

Invalid

new Threshold(http_req_duration, {
  conditions: ["p95 < 500 && p99 < 500"],
})

Valid

new Threshold(http_req_duration, {
  conditions: ["p95 < 500", "p99 < 500"],
})

"conditions" array has a similar restriction as "filters" and should be parsed in a similar way.

 [{
    "left-hand": "mean", // `aggregation method` name - restricted per metric type
    "operator": "<",    // see operators below
    "right-hand": "200", 
    "right-hand-type": 
    "numeric"
    }]

aggregation method validation

Invalid aggregation method should raise an exception and abort the test execution.

The aggregation methods available for Metric types are specified below.

Counter

Gauge

Rate

Trend

Operators > >= < <= == === != !==

tags

Currently, tags support only "strings" as a value type. In the future, int and float may be allowed to facilitate more fine-grained filtering of Thresholds.

Example:

export let options = {
  thresholds: [
    new Threshold(http_req_duration, {
      filter: [
       "status >= 200", // future extension. Not supported by this proposal.
       "status < 400", // future extension. Not supported by this proposal.
      ],
      conditions: ["avg < 500"],
      name: "Response time of valid requests should be below .5s on average"
    }),
  ]
}

Note: This proposal does NOT cover this use case.

sniku commented 4 years ago

This is a good spec, but it's dangerously large.

I'll make a new issue with limited scope that addresses 80% of the problems with 20% of the code.