I think the metricEmissionHooks option is unnecessary:
import http from 'k6/http';

export let options = {
  metricEmissionHooks: {
    http: {
      successHook: 'myHttpSuccessHook',
    },
  },
};

export function myHttpSuccessHook(response) {
  return response.status >= 200 && response.status <= 204;
}

export default function () {
  http.get("https://httpbin.test.k6.io/status/200");
}
Instead, we can define a naming convention for the hook method, for example didSuccessRequest:
import http from 'k6/http';

export function didSuccessRequest(response) {
  return response.status >= 200 && response.status <= 204;
}

export default function () {
  http.get("https://httpbin.test.k6.io/status/200");
}
I'm not that fond of metricEmissionHooks either, but I'm equally skeptical of using naming conventions to define behavior.
I'd much rather see something along the lines of:
export let options = {
  hooks: {
    http: {
      after: 'myHttpSuccessHook', // "after" might be a bad name, but you get the gist.
    },
  },
};
(without having to quote the name, if that's even possible, but I have some vague memory of the interop layer requiring it to be a string?)
The issue with these "named exports" is that we really can't guard against errors in a good way. If the user were to write didSuccesRequest, it wouldn't work and the mistake would be quite hard to spot. If we put it in the options, at least we'd be able to set up types for it, gently pushing the user towards the right syntax. I've had issues myself more than once due to the current types being too permissive, not warning when you write threshold instead of thresholds, for instance.
I am definitely for it not being 1 more magically named one. I was against handleSummary just being magically named instead of configurable, too.
Also arguably it will be nice if you can configure different ones for different scenarios for example.
(without having to quote the name, if that's even possible, but I have some vague memory of the interop layer requiring it to be a string?)
The "options" ultimately need to end up as a JSON and be transported around as such (and to not be recalculated) which means that w/e you have in it needs to be able to get to JSON and back in a way that we can figure out what you meant. We can try to do "magical" stuff so when you wrie a name of a function we automatically figure it out(no idea how, and probably will be finicky), but then writing just the function definition there won't work, which is strange ... so, yeah - we need it to be a string.
Also arguably it will be nice if you can configure different ones for different scenarios for example
Do we really want to provide this?
If you allow the user to set a callback, they will likely have enough flexibility to change the logic based on the currently executing scenario. In that case, serializing a string name isn't compulsory at all.
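For instance (a sketch combining the hypothetical JS-callback form of setResponseCallback discussed later in this thread with the k6/execution API, which exposes the running scenario's name):

import http from 'k6/http';
import exec from 'k6/execution';

// One callback, different logic per scenario - no per-scenario serialization needed.
http.setResponseCallback(function (response) {
  if (exec.scenario.name === 'checkout') {
    return response.status >= 200 && response.status < 300;
  }
  return response.status < 400;
});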
1 more magically named one
What you call a "magically named", is just a documented API for me.
@simskij provided another alternative: something like http.default.setSuccessHook(callback). IMO, the dynamic execution of string-named functions looks unintuitive and error-prone. cc @legander
I'm quite fond of the tagging approach since it avoids adding additional metrics for each request, which are costly to process as it is. After https://github.com/loadimpact/k6/issues/1321 is implemented this might not be as important, but we should still try to minimize adding new metrics if we can avoid it.
Are we sure that statsd still doesn't support tags? It seems tag support was added last year in https://github.com/statsd/statsd/pull/697 and released in v0.9.0. If this was the only output that didn't support them, and if we're moving towards supporting OpenMetrics as a generic output (which does support tags, from what I've seen), then it might be worth it to consider the tag approach after all.
The performance penalty of executing the JS hook on each request is certainly a concern. Are we sure we need these checks to be dynamic? If status codes are the only metric for determining this, maybe we can avoid the hooks and allow configuring them with something like:
export let options = {
  http: {
    success: {
      status: ['>=200', '<400'],
    },
  },
};
Though I'm not sure if this would perform much better.
OTOH if we need the flexibility for custom failure states determined by headers or something else in the response, then I guess we need the hook based approach. I'm ambivalent about the syntax, and we can certainly refine it later, though I'd also like to avoid using naming conventions for it.
A few short comments to start:
- Rate will be strictly superior to Counter once we have https://github.com/loadimpact/k6/issues/1312, so my vote is for that.
- http_req_failures seems better than http_reqs_failure.
- The hook (e.g. http.default.setSuccessHook(callback)) seems to be the best proposal to me, though we need to evaluate if we can do it on a per-VU level, so we can have different hooks for different VUs/scenarios.

So, to step back from the details and speak in more general terms... This is a big feature that, even if we agree on everything (and I have plenty of objections :sweat_smile: ), is unlikely to land in k6 v0.31.0. But given that this feature completely depends on the hook/callback (https://github.com/loadimpact/k6/issues/1716), and that the hook is actually the more complicated part, I think we should focus solely on that as the MVP for v0.31.0. I am not sure we'll manage to do even that for v0.31.0, but if we reach a consensus, I think for sure it's the place we should focus our energies first.

And I also mean the "viable" part of "MVP" :sweat_smile: If the hook is a bit more generic, everything else here might be completely unnecessary... If we have a generic post-request hook, it can be used for the following three things:
- to not emit any metrics for some requests (i.e. to not pollute http_req_duration with failures)
- to add a failure: true|false tag to the metrics (one of the suggested approaches above)
- to emit a new http_req_failures metric for failed requests (the other suggested approach above)

So, it gives us the most flexibility. And for k6 v0.32.0, when we have had a bit of time to carefully consider things, we can simply set the default such callback in Go to something that we think is reasonable. For example, "do not emit http_req_* metrics for status >= 400 and emit a http_req_failures Rate instead". Or anything else. And then encode that in the cloud, if we want to have smarter results.
Example code:
import http from 'k6/http';
import { Rate } from 'k6/metrics';

let requestFailures = new Rate('http_req_failures');

export let options = {
  thresholds: {
    http_req_failures: ['rate < 0.1'],
  },
};

http.setResponseCallback(function (response, metricsObject) {
  let requestFailed = response.status >= 399 || response.status === 0;
  requestFailures.add(requestFailed);
  if (requestFailed) {
    metricsObject.doNotEmitMetrics(); // not a final API, just illustrative :D
  }
});

export default function () {
  http.get("https://httpbin.test.k6.io/status/200");
  http.get("https://httpbin.test.k6.io/status/503");
  http.get("https://httpbin.test.k6.io/status/503");
  http.get("https://httpbin.test.k6.io/status/403");
}
@na-- I share your views! The only thing I'd like to expand on is that I think it would make sense to be able to define both default and per-call callbacks. As in:
import http, { skipCallback } from 'k6/http';
import { Rate } from 'k6/metrics';

let requestFailures = new Rate('http_req_failures');

http.setDefaultResponseCallback(function (response, metricsObject) {
  let requestFailed = response.status >= 399 || response.status === 0;
  requestFailures.add(requestFailed);
  if (requestFailed) {
    metricsObject.doNotEmitMetrics(); // not a final API, just illustrative :D
  }
});

export default function () {
  http.get("https://httpbin.test.k6.io/status/200"); // Uses the one defined above
  http.get("https://httpbin.test.k6.io/status/503"); // Uses the one defined above
  http.get("https://httpbin.test.k6.io/status/503", { responseCallback: skipCallback }); // Overrides the one above with a noop
  http.get("https://httpbin.test.k6.io/status/403", { responseCallback: (res) => { /* ... */ } }); // Overrides the one above with something else
}
While switch cases and ifs in the default callback sure are an option here, I'd argue that being able to do this per invocation makes sense, given that the URLs and other characteristics of the requests might be dynamic, while the invocations themselves will likely follow a pattern you can reason about.
I like the solution @na-- posted; it seems to provide the most bang for the buck (although it initially requires some setup on the user's end), and having a nice default (which was also brought up) would be nice. Perhaps the naming should indicate more clearly that the responseCallback is processed before metrics are emitted? Like preResponseCallback or responseInterceptor, I don't know. Could also be solved by docs, I guess 🤷‍♂️
:+1: for per-request callbacks. Not sure if having a special skipCallback thing in the http module is better than just null or function() {}, but those are implementation details we can discuss in https://github.com/loadimpact/k6/issues/1716
:+1: for a better name as well. My opinion here is :+1: for preResponseCallback, :-1: for responseInterceptor :smile: Another possible suggestion: postRequestCallback or postRequestHook, with a slight preference for the hook variant.
There are many good things with a generic callback that can manipulate all metrics, but there are bad things here as well:
- the implementation of a generic hook that can do anything makes it easy for the user to shoot themselves in the foot;
- the cloud can't rely on a well-known http_req_failures metric, because the user can define whatever name they want, and the cloud won't know the meaning of it.

I think we should have a generic hook as well, but I don't think it's required for this feature. We can have many hooks for many different purposes.
@sniku Isn't that specifically what is being mentioned here, or did I misinterpret it?

And for k6 v0.32.0, when we have had a bit of time to carefully consider things, we can simply set the default such callback in Go to something that we think is reasonable. For example, "do not emit http_req_* metrics for status >= 400 and emit a http_req_failures Rate instead". Or anything else. And then encode that in the cloud, if we want to have smarter results.
Agree with @sniku. IMO, one of the main advantages of this feature is being able to easily define a threshold on the error rate.
import http from 'k6/http';

export let options = {
  thresholds: {
    http_req_failures: ['rate < 0.1'],
    // or
    'http_reqs{failed:true}': ['rate < 0.1'],
  },
};

export default function () {
  http.get("https://httpbin.test.k6.io/status/200");
  http.get("https://httpbin.test.k6.io/status/503");
  http.get("https://httpbin.test.k6.io/status/503");
  http.get("https://httpbin.test.k6.io/status/403");
}
Yes, what @legander said :smile: And, I think we've discussed this before, but as an API design principle, if you can choose between:
1. a configurable mechanism with a sensible default (e.g. "by default, emit http_req_failures and do not emit request duration for failed requests") that is good enough for the majority of people, but can be changed;
2. a hardcoded behavior (e.g. "we always emit http_req_failures, but we'll allow users to tweak which requests are considered failures").

In my mind, the first is always better, since it delivers the same benefits but covers corner cases and alternative use cases much, much better. A huge percentage of the open k6 issues and existing technical debt exists because we've chosen the second option far too often. The whole k6/http module is one huge example of it, in fact...

For example, the majority of people probably want http_req_failures, but some might prefer tagged metrics with failed: true when they are emitting them to an external output that is not our cloud. With my suggestion, both use cases would be covered, whereas with what you propose, only a single one would be, and we'd have a bunch of open issues we'll never solve and a bunch of PRs we'll never merge :wink:
@ppcano, please explain how my suggestion precludes easily defining a threshold on the error rate?
import http from 'k6/http';
import { Rate } from 'k6/metrics';

let requestFailures = new Rate('http_req_failures');

export let options = {
  thresholds: {
    http_req_failures: ['rate < 0.1'],
  },
};

http.setResponseCallback(function (response, metricsObject) {
  let requestFailed = response.status >= 399 || response.status === 0;
  requestFailures.add(requestFailed);
  if (requestFailed) {
    metricsObject.doNotEmitMetrics(); // not a final API, just illustrative :D
  }
});

export default function () {
  http.get("https://httpbin.test.k6.io/status/200");
  http.get("https://httpbin.test.k6.io/status/503");
  http.get("https://httpbin.test.k6.io/status/503");
  http.get("https://httpbin.test.k6.io/status/403");
}
@na-- I might have misunderstood your example, but it looks like the user needs to create a Rate metric, implement the hook, and track the values themselves.
It is a big difference in comparison with only:
export let options = {
  thresholds: {
    http_req_failures: ['rate < 0.1'],
  },
};
Also, it would be more difficult to set a threshold on the error rate with the Test Builder.
@ppcano, you didn't misunderstand my example, you probably just didn't read the text right above it... nor when @legander copy-pasted it:

And for k6 v0.32.0, when we have had a bit of time to carefully consider things, we can simply set the default such callback in Go to something that we think is reasonable. For example, "do not emit http_req_* metrics for status >= 400 and emit a http_req_failures Rate instead". Or anything else. And then encode that in the cloud, if we want to have smarter results.
So, when we set such a default (be it in k6 v0.32.0, or v0.31.0 if we manage it), unless the user has some specific requirements that would force them to define a custom post-request hook, setting a threshold on http_req_failures will look like this with my suggestion:
import http from 'k6/http';

export let options = {
  thresholds: {
    http_req_failures: ['rate < 0.1'],
  },
};

export default function () {
  http.get("https://httpbin.test.k6.io/status/200");
  http.get("https://httpbin.test.k6.io/status/503");
  http.get("https://httpbin.test.k6.io/status/503");
  http.get("https://httpbin.test.k6.io/status/403");
}
This is the whole script - the default post-request hook takes care of updating http_req_failures...
Btw, FWIW, having a default post-request hook also makes the breaking change much smaller. We all agree that a breaking change is necessary - the vast majority of people probably wouldn't want to count failed requests in http_req_duration and would benefit from http_req_failures. But for anyone who doesn't, it would be easy to opt out and keep the current k6 behavior by just doing something like this:
import http from 'k6/http';
http.setDefaultPostRequestHook(null);
And I think another good API design principle is that, when you're making a breaking change, you should strive to at least provide an easy solution for people who don't like it and want to go back to how things were... In this specific case, I imagine some people who have adopted their own solution for tracking HTTP errors might make use of it while they transition to our new hook-based approach.
Btw, I'm starting to dislike postRequestHook as the name - it might be confused with a hook for POST requests... :sweat_smile: So, changing my vote, I am now in favor of preResponseCallback or preResponseHook as the name.
I was re-reading the issue and I see I didn't address two parts of @sniku's argument:

The implementation of a generic hook that can do anything makes it easy for the user to shoot themselves in the foot.
I don't accept the premise here; this can literally be an argument against almost any change - especially everything k6 does above and beyond tools like wrk or ab... :stuck_out_tongue_winking_eye: We can discuss whether a pre-response hook is something very unusual (I don't think it is), or whether it violates POLA, but as long as the API and defaults are reasonably good, I can't see how giving our users more power and flexibility is bad.
I think we should have a generic hook as well, but I don't think it's required for this feature. We can have many hooks for many different purposes.
I strongly disagree on both counts. Besides thinking that it's a better solution for the specific problem we are discussing here, as I've expressed above, generic callbacks would solve plenty of other issues real k6 users have had. For example, the forum post linked in https://github.com/loadimpact/k6/issues/1716. Or https://github.com/loadimpact/k6/issues/927 / https://github.com/loadimpact/k6/pull/1171, https://github.com/loadimpact/k6/issues/800 (custom callback on the first HTTP digest request that has an expected 400 response), https://github.com/loadimpact/k6/issues/884 (if we have http.setDefaultPostRequestHook(), we can easily also have http.getCurrentDefaultPreResponseHook() and a way to temporarily wrap the hook in another hook and then restore it), and https://github.com/loadimpact/k6/issues/1298; this and this issue from the forum can also be solved by a per-request callback. And I'm probably missing some...
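The wrap-and-restore idea could look something like this (a sketch built entirely on the hypothetical setDefaultPostRequestHook/getCurrentDefaultPreResponseHook names from the previous paragraph):

import http from 'k6/http';

// Temporarily wrap the current default hook, then restore it later.
const previousHook = http.getCurrentDefaultPreResponseHook();
http.setDefaultPostRequestHook(function (response, metricsObject) {
  if (response.status === 401) {
    return; // e.g. tolerate the expected 401 of a digest/NTLM handshake
  }
  previousHook(response, metricsObject); // fall through to the old behavior
});

// ... make the special requests here ...

http.setDefaultPostRequestHook(previousHook); // restore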
As to the "many hooks for many different purposes" - this will be quite difficult to do technically (please read through some of the objections in https://github.com/loadimpact/k6/issues/1716, specifically how it's already tricky when you have http.batch()
in a single-threaded JS runtime). But I also think it's bad UX...
When writing this issue I explored the generic hook idea as well, but I couldn't come up with an API that was intuitive and didn't require the user to define/re-define the built-in http_req_failures metric.
I would really like to see a concrete proposal for the generic hook to solve this specific problem before making up my mind.
Here are some comments on the proposal from @na--:
let requestFailures = new Rate('http_req_failures'); // redefinition of the built-in metric should not be allowed like this (we have an issue for this)

http.setResponseCallback(function (response, metricsObject) {
  let requestFailed = response.status >= 399 || response.status === 0;
  requestFailures.add(requestFailed); // user responsible for manipulating a built-in metric feels wrong, but acceptable. Tags for this metric are also missing (status, url, etc.)
  if (requestFailed) {
    metricsObject.doNotEmitMetrics(); // not convinced this is a good API :-) We do want to emit _some_ metrics.
  }
  // tagging of http_reqs is missing. I could not come up with a good suggestion for this.
});
Even if we do come up with a nice implementation for the default hook, the user will have to copy-paste our entire default implementation just to change the status ranges (let's say 403 is acceptable to the user). It's not possible to extend the default implementation; it has to be fully replaced by the user.
I like the http.setResponseCallback() API. That makes sense to me.

At the moment, I still think that a boolean httpResponseSuccess() hook is easier for the user to understand and change. I think having more hooks is better UX than having one hook to rule them all, but I'm willing to be convinced by a nice API proposal.
I couldn't come up with an API that was intuitive and didn't require the user to define/re-define the built-in http_req_failures metric.
I'll create an issue for a metric refactoring proposal, hopefully later today, for a prerequisite we need to implement https://github.com/loadimpact/k6/issues/1435, https://github.com/loadimpact/k6/issues/572, and https://github.com/loadimpact/k6/issues/1443 / https://github.com/loadimpact/k6/issues/961#issuecomment-557482834. In essence, it will make it so that it doesn't matter whether the user has let requestFailures = new Rate('http_req_failures') in their script, or whether we've defined it ourselves in the default preResponseCallback written in pure Go.
I would really like to see a concrete proposal for the generic hook to solve this specific problem before making up my mind.
Fair enough. I guess this is the next step for this issue, or maybe we should put this on hold and "go back" to https://github.com/loadimpact/k6/issues/1716 to discuss it.
Even if we do come up with a nice implementation for the default hook, the user will have to copy-paste our entire default implementation just to change the status ranges (let's say 403 is acceptable by the user). It's not possible to extend the default implementation, it needs to be fully replaced by the user.
I am not sure this is the case. It depends on how we design the API, of course, but I think it might be possible to make it reasonably composable, we'll see. And even if it's not possible, the default implementation I suggested above would be something like these whopping... 6 lines of code :smile:
let requestFailures = new Rate('http_req_failures');

http.setResponseCallback(function (response, metricsObject) {
  let requestFailed = response.status !== 403 && (response.status >= 399 || response.status === 0);
  requestFailures.add(requestFailed);
  if (requestFailed) {
    metricsObject.doNotEmitMetrics(); // not a final API, just illustrative :D
  }
});
Having to copy-paste them is not perfect, but it's reasonably compact and understandable and, most importantly, it lacks any magic at a distance... And we can easily write a jslib wrapper that just accepts the requestFailed condition as a lambda and deals with everything else, for people who want one-liners...
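Such a wrapper could look something like this (a sketch; setFailureCondition is a made-up helper name, and doNotEmitMetrics() is the same illustrative, non-final API as above):

import http from 'k6/http';
import { Rate } from 'k6/metrics';

const requestFailures = new Rate('http_req_failures');

// Hypothetical jslib helper: hides the boilerplate, the user only supplies a predicate.
function setFailureCondition(isFailed) {
  http.setResponseCallback(function (response, metricsObject) {
    const requestFailed = isFailed(response);
    requestFailures.add(requestFailed);
    if (requestFailed) {
      metricsObject.doNotEmitMetrics();
    }
  });
}

// The one-liner for users who consider 403 acceptable:
setFailureCondition((res) => res.status !== 403 && (res.status >= 399 || res.status === 0));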
Sorry, @sniku, I completely missed the code comments you added. To respond:
redefinition of the built-in-metric should not be allowed like this (we have an issue for this)
I already discussed some of the current problems with metrics in that internal discussion we have, but I will elaborate more in the new k6 issue I plan to write today or on Monday about the metric refactoring that's a prerequisite for https://github.com/loadimpact/k6/issues/1435, https://github.com/loadimpact/k6/issues/572, and https://github.com/loadimpact/k6/issues/1443 / https://github.com/loadimpact/k6/issues/961#issuecomment-557482834. In summary, "creating" a new metric with the same name as a built-in system metric (e.g. new Rate('http_req_failures')) will just return the same metric (i.e., internally, a Go pointer to it) if the type matches, and throw an exception if it doesn't. https://github.com/loadimpact/k6/issues/1435 is the issue you probably meant.
user responsible for manipulating a built-in metric feels wrong, but acceptable. Tags to this metric are also missing (status, url, etc)
Also related to the above: the distinction between built-in and custom metrics is quite artificial. And the API can be like this, if Rate.add() accepts tags:
requestFailures.add(requestFailed, metricsObject.getTags());
not convinced this is a good API :-) We do want to emit some metrics. Tagging of http_reqs is missing. I could not come up with a good suggestion for this.
I said it was illustrative, and I agree, it's not a good API. But what metrics exactly do you think we should emit? http_reqs?
In summary, "creating" a new metric with the same name as a built-in system metric (e.g. new Rate('http_req_failures')) will just return the same metric (i.e. internally, a Go pointer to it) if the type matches, and throw an exception if it doesn't. #1435 is the issue you probably meant.
If we must allow users to manipulate built-in metrics, wouldn't it be better to let them import the metrics rather than redefine them?
import { http_req_duration } from 'k6/metrics/built-in';

http_req_duration.add(1);
Also related to above, but the distinction between built-in and custom metric is quite artificial.
I have a different opinion. Built-in metrics have some guarantees:
- their values are always added with the required tags (status, url, etc.);
- their type, and therefore their contains parameter, is fixed.

k6 cloud depends on the first guarantee to work correctly.
I said it was illustrative, and I agree, it's not a good API. But what metrics exactly do you think we should emit? http_reqs?
I understand that it was a quick/initial mockup, but I really think we should get a reasonably final API proposal before deciding how to go forward. I sincerely doubt that the final implementation will be a "whopping... 6 lines of code". For example, it's not clear how metricsObject is structured:
1. Does it hold all built-in metrics for the request (http_reqs, http_req_duration, http_req_blocked, ...)?
2. Does k6 cloud get http_req_li_all inside of metricsObject, while k6 run gets unaggregated metrics?
3. Why can't http_req_failures live inside of metricsObject, but must be outside?
4. Can the user tag the http_reqs metric with a standard failed tag?
5. How does the user prevent http_req_duration and friends from being emitted, but allow http_reqs to be emitted?

The fact that we put the burden of all this on the user doesn't feel like good UX to me, but again, I would like to see a good API proposal before I agree/disagree :-)
I like the import of metrics, though that's not going to solve the issue of add()-ing metrics without all of the required tags. In that respect, import { http_req_duration } from 'k6/metrics/built-in' is exactly the same as new Trend('http_req_duration') returning the built-in metric, and the contains parameter won't be over-written in either case. To solve this issue, we probably need to expand https://github.com/loadimpact/k6/issues/1435 and validate that the required tags for the metrics are present.
I understand that it was a quick/initial mockup, but I really think we should get a reasonably final API proposal before deciding how to go forward.
Me too :smile: That was the whole reason for me pushing to define this task before we prioritize it... I guess now it's my responsibility to come up with this API, given that I dislike the original proposal and have suggested an alternative. I'll try to think about it some more and propose something over the next week, but here are some very preliminary answers to the questions you posed:
1. We likely won't have metricsObject as a second parameter; instead, a single parameter that's an object containing the response (which has the actual timings), something that controls what metrics will be emitted, as well as the system tags for the metrics.
2. For k6 cloud and k6 run, the callback will get called with exactly the same data; however, because of our hack with http_req_li_all for k6 run -o cloud, we have to figure out how to accommodate httpext.Trail and aggregation in a non-breaking way (:heart: the tech debt :smile:).
3. How could the http_req_failures value already exist when we're in the very callback that determines it? :smile:

Anyway, I'll let this percolate somewhat and hopefully get back to you sometime next week.
So I'm just thinking a bit about how to maintain a JavaScript feel to it, and my suggestion might be way off. But here goes.

As for the interface for manipulating the response event (or whatever you'd call it), I would go for something like the following:
const redirect = new Counter("redirect");

http.on("response", (ev) => {
  console.log("default metric: ", ev.metric.name); // whatever k6 thinks it should be

  if (ev.request.url === "https://test.k6.io") {
    // You go ahead and handle this for me!
    return;
  }

  const status = ev.response.status;
  if (status >= 400) {
    // Change ev.metric to http_reqs_failure
    ev.fail();
  } else if (status >= 300) {
    redirect.add(1);
    ev.preventDefault(); // tell k6 to ignore this ev
  } else {
    // Change ev.metric to http_reqs_success
    ev.succeed();
  }
});
The above just makes sense to me as a JavaScript developer. It matches how Node.js and browsers handle these kinds of events. For instance, in the browser, anchor.addEventListener("click", ev => ev.preventDefault()) tells the browser to ignore handling the click (the on/off naming is Node-like and much nicer than addEventListener).
We could also have specific events for success/failure, in case you know that you only want to modify failure into success or vice versa. Could improve performance a bit, by not being a catch-all.
// An evil API that has reversed error with success status
http.on("failure", (ev) => {
  ev.succeed();
});

http.on("success", (ev) => {
  ev.fail();
});
The list of what would be possible goes on:
http.on("request", ev => {
// I'm lazy and don't want to add this everywhere
ev.request.headers.add("X-Request-Id", uuid())
// Add tags for every request
ev.addTags({
})
})
I could go on and on. 😬
look at that, @allansson coming into the thread killing it with good suggestions! ;)
I agree with my proposal :). Thanks @na-- for bringing it up instead of me having to do it.
As a compromise, so that we don't have to have any JS callbacks for the first version (which likely will still not be in v0.31.0), I propose we have an additional built-in callback that is an extremely "special" object which can't be used for anything else (at least for now). But you can register it as the "callback" of http.setResponseCallback (or whatever we end up calling it), as well as per request. This callback will still be usable even after we start supporting JS ones, and will be more performant. I actually meant to propose we have some built-in ones like this so that we don't impact performance as much for common cases, but now it seems like it will be good for other things as well ;).
My proposal for the one that should be included is:

http.setResponseCallback(http.expectedStatuses([200, 299], [301, 302], 308, 418));
The arguments are either an array of 2 ints or a single int. If an int, it's just a status code we expect; if they are an array, they are the inclusive start and end of a range of codes we expect. This one callback is easy to implement, is likely going to cover 95% of all use cases ever (famous last words), does not stop us from having JS callbacks later, and means the performance for this simple case will likely be quite close to what it would be if we just hardcoded it.

I would say that this way we at least won't need to figure out right now what the arguments to a JS callback need to be, while still not leaving us with more technical debt - except having to support the built-in one, which I would've argued for in the first place anyway.
When we have JS callbacks, the code will just check whether it gets a built-in callback or not. Also, possibly the built-in ones will be callable with the same arguments as the JS ones and do what they usually do, so they could be used inside a JS callback - not really certain that will be all that useful, as, at least for performance, calling JS -> Go -> JS is probably going to be more work than the ifs that would be saved :shrug:.
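Combining the global registration with the per-request override mentioned above (a sketch of the proposal as discussed so far; the per-request responseCallback parameter is borrowed from @simskij's earlier example):

import http from 'k6/http';

// Global built-in callback: 2xx and the usual redirect codes are expected.
http.setResponseCallback(http.expectedStatuses([200, 299], [301, 302]));

export default function () {
  // Per-request override: this endpoint legitimately returns 418.
  http.get("https://httpbin.test.k6.io/status/418", {
    responseCallback: http.expectedStatuses(418),
  });
}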
:+1: for the different API, @allansson - we actually discussed internally that this will likely be useful before doing requests as well, especially in redirect chains. So it's likely that we will either need multiple callback registers or an 'on'-style one.
I am somewhat of the opinion that it will be better to do this with a "failed" tag than with a new metric plus omitting the failed requests' metrics, as with the latter you won't know the performance of your failing requests (or you won't be able to mark them as failed). I think it is fairly important to know that your failed requests also don't take forever; when figuring out why they are failing, it's likely helpful to know that they fail in 10 seconds instead of 1, or that the TLS handshake took 20 seconds.
I would also again like to bring everybody's attention to the fact that k6 supports not only HTTP, so when we start moving this to other protocols, and add new protocols, adding additional metrics or having HTTP-specific names is probably not going to be... great.
To actually address @sniku's concerns with this:

some outputs don't ingest tags. It would not be possible to use this functionality with statsd; some "special case" handling would be required for displaying.

This is also true for having more than one URL being hit by the script, as the URL is a tag again. So I am of the opinion that if tags aren't supported by an output, that output should probably not be recommended/supported.
http_reqs would be backwards incompatible unless we combine rate and counter into a new metric type.
http_reqs is still a Counter; it just now has a "failed" tag, and we have to have a way to print the number/percentage of failed/not-failed requests in the terminal, but that doesn't make it a Rate. k6 will just need to construct a Rate-like metric out of it when displaying.
if they are an array they are the inclusive start and end of a range of codes we expect.
I'm strongly against this. Conditional logic in argument usage is... weird and reaaaally error-prone. Especially if you don't label it. If you really want to be able to mix entries with min-max style ranges, I'd definitely vote for having a separate object you could pass for that, with min and max as properties, like:
http.setResponseCallback(
  http.expectedStatuses(
    { min: 200, max: 299 },
    308,
    /* ... */
  ));
If it takes an array as well, that array should be considered a list of single status codes.
Other than that, I think having an optimized, predefined status comparator makes perfect sense.
I like almost all the ideas @MStoykov brought up here :+1: :+1: :+1:
:heart: for being the voice of reason and cutting down the scope while not compromising on future extendability.
I'm also for the slight API modification proposed by @simskij
http.setResponseCallback(http.expectedStatuses([{ min: 200, max: 299 }, { min: 301, max: 302 }, 308, 418]));
It makes it less error-prone.
As far as I can tell, this proposal solves several of the raised issues.
I also very much like the callback API proposed by @allansson :heart: but I think that proposal should be copy-pasted into https://github.com/loadimpact/k6/issues/1716 rather than be a part of this issue.
@allansson, you've given a lot of good suggestions, though they are things we probably won't be able to incorporate until we've started working on the new version of the k6 HTTP API, whenever that is. And we'd probably need event loops for most of them (https://github.com/loadimpact/k6/issues/882)... :disappointed: And we should consider the performance impact of multiple different callback types, especially if they stack, since that might significantly affect something like http.batch(). In any case, it's a bit too early for such a discussion.
Also a heavy :+1: from me for @MStoykov's suggestion with @simskij's modifications :tada: So, it seems like we have a consensus on this part :tada: :confetti_ball: Now only this bit remains:
I am somewhat of the opinion that it will be better to do this with a "failed" tag than with a new metric plus omitting the failed requests' metrics, as with the latter you won't know the performance of your failing requests (or you won't be able to mark them as failed). I think it is fairly important to know that your failed requests also don't take forever; when figuring out why they are failing, it's likely helpful to know that they fail in 10 seconds instead of 1, or that the TLS handshake took 20 seconds.
@MStoykov, I partially agree with you here. 60s timeouts and 10ms 503 errors would both be considered failed, but they are completely different beasts. Hiding both behind a http_req_failures Rate metric would be terrible - actually worse than the current situation.
That said, if we use a failed tag on the normal HTTP metrics, we won't be able to specify thresholds like "failed requests should be less than 1%". And unless we implement https://github.com/loadimpact/k6/issues/1321 first (or define bogus thresholds to expose the sub-metrics, like these workarounds: https://github.com/loadimpact/k6-docs/issues/205), we won't be able to see this breakdown in the end-of-test summary... So, even if we have both failed:true/false as a metric tag on the old http_req_* metrics AND a new http_req_failures Rate metric, we'd still have a shitty and insufficient UX... :disappointed:
An internal discussion later, the consensus is that there will be both a tag and a metric:
- the tag will be passed rather than failed, since http_req_duration{failed:false} is a double negative when all you want is the duration of successful requests;
- there will also be a separate metric for the failure rate.

In the (far) future, when k6 has support for disabling metrics and for thresholds supporting something like http_reqs{passed:true}.count/http_reqs.count>0.99, we will probably disable http_req_failed in order to reduce the noisiness of the metrics k6 emits, while still having a way to set a threshold on the rate of failing requests.
As per a discussion with @sniku, we agreed that the default output will be changed to look like this:
data_received..........................: 68 MB 6.4 MB/s
data_sent..............................: 173 kB 16 kB/s
http_req_blocked{passed:true}..........: avg=19.64ms min=281ns med=1.19µs max=274.96ms p(90)=2.17µs p(95)=267.61ms
http_req_connecting{passed:true}.......: avg=4.46ms min=0s med=0s max=76.85ms p(90)=0s p(95)=43.37ms
http_req_duration{passed:true}.........: avg=416.69ms min=77.59ms med=357.13ms max=890.05ms p(90)=706.43ms p(95)=753.02ms
http_req_receiving{passed:true}........: avg=232.02ms min=8.41ms med=172.71ms max=693.12ms p(90)=546.14ms p(95)=571.89ms
http_req_sending{passed:true}..........: avg=98.18µs min=21.65µs med=78.07µs max=1.68ms p(90)=124.79µs p(95)=143.14µs
http_req_tls_handshaking{passed:true}..: avg=10.57ms min=0s med=0s max=162.9ms p(90)=0s p(95)=134.05ms
http_req_waiting{passed:true}..........: avg=184.56ms min=43.51ms med=170.59ms max=446.44ms p(90)=262.49ms p(95)=289.31ms
http_reqs..............................: 275 26.133352/s
{ passed:true }.......................: 175 16.133352/s
{ passed:false }......................: 100 10.000000/s
iteration_duration.....................: avg=2.27s min=1.54s med=2.31s max=2.94s p(90)=2.57s p(95)=2.68s
iterations.............................: 83 7.887521/s
vus....................................: 20 min=20 max=20
vus_max................................: 20 min=20 max=20
which is basically the expanded version of "Tag http_req_duration with failed:true|false tag and display filtered values" from the original comment.
:-1:, or rather, :-1: for k6 v0.31.0, :+1: for first implementing https://github.com/loadimpact/k6/issues/1832 and https://github.com/loadimpact/k6/issues/1321, on which this change depends, and showing a version of that as the default summary in future k6 versions (e.g. v0.32.0).
I think that:

I agree that this is a better experience. I am not certain that we shouldn't also have an easy way to show the time for the failing requests as well... but arguably this is just one more summaryHandle helper in jslib.
If we agree on 3. (and because summaryHandle can't tell that it needs http_req_blocked{passed:true}, for example), then the summary definitely needs a way to get all the http_req_*{passed:true} sub-metrics, and:
for _, name := range []string{
	"http_reqs{passed:true}",
	"http_req_duration{passed:true}",
	"http_req_blocked{passed:true}",
	"http_req_connecting{passed:true}",
	"http_req_tls_handshaking{passed:true}",
	"http_req_sending{passed:true}",
	"http_req_waiting{passed:true}",
	"http_req_receiving{passed:true}",
} {
	if _, ok := e.thresholds[name]; ok {
		continue
	}
	parent, sm := stats.NewSubmetric(name)
	e.submetrics[parent] = append(e.submetrics[parent], sm)
}
after https://github.com/loadimpact/k6/blob/11811171a3c416daf17f7f0ad182ca30aa9990bb/core/engine.go#L100-L108 seems to do the trick, and arguably it will be something similar with #1321.

Arguably it doesn't look that terrible, but I would argue it makes #1319 more relevant ;), so I am not for just adding the above code and leaving it at that.

To me, the biggest problem (if we agree on 3.) seems to be #1806, which either needs to be done now, or we will need to redo these changes when we get to it - although arguably the changes will not be all that complicated, so :man_shrugging:
Let's backtrack a little bit. I haven't looked at the code in https://github.com/loadimpact/k6/pull/1856 yet, but it seems like it fulfills everything we all previously agreed on (https://github.com/loadimpact/k6/issues/1828#issuecomment-772581756), right? If so, given that there is a very easy workaround to surface specific sub-metrics in the end-of-test summary (https://github.com/k6io/docs/issues/205), I think the PR covers way more than 80% of the goals for this issue for less than 20% of the effort - the main remaining gap being showing the http_req_duration{passed:true} values in the summary by default.

Again, this seems plenty good enough for such a contentious set of features, for now, i.e. for k6 v0.31.0... I am for continuing with things in this priority order: https://github.com/loadimpact/k6/issues/1832, followed by https://github.com/loadimpact/k6/issues/1806, followed by https://github.com/loadimpact/k6/issues/1321, followed by changing the summary. And this changing of the summary should include showing http_req_duration{passed:true}, but also hiding most of the other http_req_* metrics, except a new composite one that's the sum of http_req_connecting + http_req_tls_handshaking, as we've previously discussed.
If we decide that we're not going to release any of these things in k6 v0.31.0, but we start working on them now in preparation for k6 v0.32.0, I think there's plenty of time to get them done by the end of April...
I am mostly against hardcoding a bunch of cruft in the Engine just so we can get the remaining 5% of this feature that users can currently get with a small workaround (and which we plan to fix as part of https://github.com/loadimpact/k6/issues/1321). And even if I am outvoted on that, I am definitely and very firmly against adding a {passed:true} submetric by default for every http_req_* metric that exists... I don't see a point in showing submetrics for metrics we plan to soon hide by default... So, anything beyond the single http_req_duration{passed:true} would take a lot of convincing for someone to "sell" it to me.
Remember, besides the needless clutter in the end-of-test summary, until we have https://github.com/loadimpact/k6/issues/763, every Trend sub-metric costs us quite a lot of memory and CPU for long-running or metric-heavy tests. We are currently storing in memory all of the values of every Trend metric for the whole duration of the test, and sorting them every few seconds, just to be able to calculate the percentiles for the summary and threshold calculations... So, I am very firmly against basically doubling this built-in memory leak and CPU burden...
As @na-- says, the #1856 PR goes a long way towards fulfilling this feature request. Still, even if #1856 is merged, the second basic requirement of this feature is not fulfilled.
- show response time for successful requests. (without adding boilerplate code)
Not getting http_req_duration for successful requests without adding boilerplate code is difficult to accept. I understand that it's possible to define a dummy threshold to get the duration in the output, but that's not in line with providing a good user experience by default.
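For reference, the dummy-threshold workaround being referenced looks roughly like this (a sketch; the always-true max>=0 condition is arbitrary and exists only to surface the sub-metric):

export let options = {
  thresholds: {
    // Always passes - defined only so the sub-metric shows up in the summary.
    'http_req_duration{passed:true}': ['max>=0'],
  },
};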
Again, this seems plenty good enough for such a contentious set of features, for now, i.e. for k6 v0.31.0... I am for continuing with things in this priority order: #1832, followed by #1806, followed by #1321, followed by changing the summary.
If I'm reading this sentence correctly, you are proposing to skip showing http_req_duration for successful requests until all 3 of these issues are implemented?
So, I am very firmly against basically doubling this built-in memory leak and CPU burden...
I think we all firmly agree on this. We don't want to double CPU and memory burden.
While I'm personally not against the "clutter" in the output, I see your point. I agree that adding {passed:true} to all the minor http_req_* metrics is unnecessary, especially if it considerably increases the CPU/memory consumption.

I'm suggesting we display tagged values only for http_req_duration and possibly http_reqs. The latter is less important to me.
The proposed output would look similar to this:
data_received.............: 68 MB 6.4 MB/s
data_sent.................: 173 kB 16 kB/s
http_req_blocked..........: avg=19.64ms min=281ns med=1.19µs max=274.96ms p(90)=2.17µs p(95)=267.61ms
http_req_connecting.......: avg=4.46ms min=0s med=0s max=76.85ms p(90)=0s p(95)=43.37ms
http_req_duration.........: avg=416.69ms min=77.59ms med=357.13ms max=890.05ms p(90)=706.43ms p(95)=753.02ms
{passed:true}...........: avg=216.69ms min=77.59ms med=357.13ms max=890.05ms p(90)=706.43ms p(95)=753.02ms
http_req_receiving........: avg=232.02ms min=8.41ms med=172.71ms max=693.12ms p(90)=546.14ms p(95)=571.89ms
http_req_sending..........: avg=98.18µs min=21.65µs med=78.07µs max=1.68ms p(90)=124.79µs p(95)=143.14µs
http_req_tls_handshaking..: avg=10.57ms min=0s med=0s max=162.9ms p(90)=0s p(95)=134.05ms
http_req_waiting..........: avg=184.56ms min=43.51ms med=170.59ms max=446.44ms p(90)=262.49ms p(95)=289.31ms
http_reqs.................: 275 26.133352/s
{ passed:true }..........: 175 16.133352/s
{ passed:false }.........: 100 10.000000/s
http_reqs_failed..........: 33.33% ✓ 1 ✗ 2
iteration_duration........: avg=2.27s min=1.54s med=2.31s max=2.94s p(90)=2.57s p(95)=2.68s
iterations................: 83 7.887521/s
vus.......................: 20 min=20 max=20
vus_max...................: 20 min=20 max=20
If I'm reading this sentence correctly, you are proposing to skip showing http_req_duration for successful requests until all 3 of these issues are implemented?
Yes, that was my proposal. And I think it's realistic to target k6 v0.32.0 for all of that, if we put these 4 issues on the priority track and start working on them now.
The rationale is that if people have a threshold on http_req_duration{passed:true}, which is a good idea in general, they will see its stats in the summary in k6 v0.31.0. And if they don't have such a threshold, they can easily add a dummy one and see it, until we add it by default. Another reason for waiting until we have implemented the metric whitelisting is that, if we don't, users would see (and k6 would calculate) the stats for both http_req_duration and http_req_duration{passed:true}, and the first one is a bit useless.
All of that said, I am not very opposed to a temporary hack to show http_req_duration{passed:true}, if we don't touch any of the other metrics...
I'm suggesting we display tagged values only for http_req_duration and possibly http_reqs. The latter is less important to me.

Why http_reqs, though? You literally have that information right below in your example, via http_reqs_failed:
http_reqs_failed..........: 33.33% ✓ 1 ✗ 2
I think the end state of this effort, after #1832, #1806, #1321, and the summary changes have been implemented, should look like this:
data_received..........................: 68 MB 6.4 MB/s
data_sent..............................: 173 kB 16 kB/s
http_req_conn_duration{passed:true}....: avg=4.46ms min=0s med=0s max=76.85ms p(90)=0s p(95)=43.37ms count=900
http_req_conn_duration{passed:false}...: avg=9.12s min=0s med=0s max=60.00s p(90)=50s p(95)=55.55s count=100
http_req_duration{passed:true}.........: avg=416.69ms min=77.59ms med=357.13ms max=890.05ms p(90)=706.43ms p(95)=753.02ms count=900
http_reqs_failed.......................: 10.00% ✓ 900 ✗ 100
iteration_duration.....................: avg=2.27s min=1.54s med=2.31s max=2.94s p(90)=2.57s p(95)=2.68s count=2000
Where http_req_conn_duration is a new composite metric: the sum of http_req_connecting and http_req_tls_handshaking (and probably also DNS lookup, since it should be relatively easy to add that back in now). Also notice the count column, now shown by default for Trend metrics.
We had a meeting, and the consensus is that we'll go with a small hack for showing http_req_duration{passed:true} by default in k6 v0.31.0, but not mess with the other metrics.

We are still evaluating the actual names for the passed:true/false metric tag and the http_req_failed metric, so if anyone has better alternative suggestions, feel free to share.
Also, we should remember that HTTP digest and NTLM requests expect 401 responses, so those shouldn't be treated as failures.
Also, we should remember that HTTP digest and NTLM requests expect 401 responses, so those shouldn't be treated as failures.

Do you mean that the first 401 should always be treated as a success for digest/NTLM requests, effectively bypassing the callback?
I haven't looked at the PR code yet, so I'm not sure what you mean by "effectively bypassing the callback", but yes, the first HTTP request in a digest/NTLM auth scenario should expect a 401 response and not treat that as a failure.
This is out of scope for this feature request, but we will likely need to add a grpc_req_duration{ passed: true } sub-metric and a grpc_req_failure metric to be consistent. I don't think we should risk delaying this feature by extending the scope, but it's worth reflecting on how the failure metric will be implemented for other protocols.
Or we could stop adding extra metrics for information we already have encoded as tags of existing metrics, and fix the fundamental issues that prevent us from using them as intended? :roll_eyes: Adding an extra tag to metrics is mostly fine (whether that's passed or expected, and whether we can do it as easily with gRPC streaming, remains to be seen), but adding new metrics for every possible protocol to paper over some of the deficiencies of the current thresholds is unsustainable... So yes, I foresee us having another of these "fun" discussions in the future... :sob:
Some implementation notes and changes from https://github.com/loadimpact/k6/issues/1828#issuecomment-772581756:
- passed got renamed to expected_response
- http_req_duration{ expected_response:true } is the sub-metric shown by default
- http.setResponseCallback can take null, which stops the emitting of the metric or the tag, effectively returning to the old behavior pre v0.31.0
- digest requests always expect to get a 401 as their first response, before they authenticate
- ntlm requests are okay with the first request they make (if it redirects) being a 401, on top of what their responseCallback expects

Shouldn't we close this issue now? 90% of the original issue is implemented, and the rest is hidden behind so much noise that a fresh issue would be a better place for it...
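Putting those notes together, a script against the shipped API would look something like this (the particular expectedStatuses arguments are just an example):

import http from 'k6/http';

// 2xx/3xx plus 418 are expected; everything else is tagged
// expected_response:false and counted in the http_req_failed rate.
http.setResponseCallback(http.expectedStatuses({ min: 200, max: 399 }, 418));

export let options = {
  thresholds: {
    http_req_failed: ['rate<0.1'],
    'http_req_duration{expected_response:true}': ['p(95)<500'],
  },
};

export default function () {
  http.get("https://httpbin.test.k6.io/status/200");
  http.get("https://httpbin.test.k6.io/status/503");
}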
The NTLM handshake actually consists of two HTTP 401s before an HTTP 200 is received (see https://nav7neeet.medium.com/maintaining-session-in-ntlm-authentication-3e8c604d87e9).
I agree with @na-- and will take the liberty of closing this issue. If anyone wants to continue any part of the conversation, please create a new issue that is (preferably) narrowly focused on that specific part.
(Also locking it to prevent additional input from getting lost in the void and to encourage the opening of new issues.)
I'm opening a new issue instead of building on top of https://github.com/loadimpact/k6/issues/1311 to start the discussion with a clean slate. I believe this proposal addresses most, if not all, issues brought up in the original proposal.

Before discussing the implementation details and effort required to build this feature, let's discuss why we want this feature at all.

Basic requirements

This basic script must:
- make it possible to easily set a threshold on the error rate;
- show the response time for successful requests (without adding boilerplate code).
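A minimal sketch of what such a basic script could look like, assuming the http_reqs_failure metric proposed below exists:

import http from 'k6/http';

export let options = {
  thresholds: {
    // threshold on the rate of failed requests
    http_reqs_failure: ['rate<0.1'],
  },
};

export default function () {
  http.get("https://httpbin.test.k6.io/status/200");
  http.get("https://httpbin.test.k6.io/status/503");
}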
Discussion about the metric type for http_reqs_failure

There are two possible metric types for the new http_reqs_failure: Rate or Counter. Both types have their advantages, and it's not entirely clear which one is better for this use case.

Advantages of Rate:
- easy thresholds on the failure rate, e.g. http_reqs_failure: rate<0.1

Advantages of Counter:
- consistency with the existing http_reqs metric.

The end-of-test summary for this test should look similar to this:

Output when using Counter metric

Output when using Rate metric
Neither Rate nor Counter covers all possible use cases. I think Rate is preferable over Counter.

If we added count to the Rate metric, the output could possibly look similar to this:

Note, I'm not really advocating for this, just pointing out that neither Rate nor Counter covers all use cases.

Why do we have failures and successes as separate metrics?

The obvious critique of the above suggestion is to say that http_reqs_success is unnecessary because it's the opposite of http_reqs_failure. This is true, but some outputs don't allow defining logic, and therefore it's not possible to show http_reqs_success unless k6 itself produces it. Once the metric filtering feature is developed, I would suggest we exclude http_reqs_success by default.

http_req_duration and other http_req_* metrics

The core requirement of this feature is to be able to see http_req_duration for successful requests only. There are two possibilities here:
1. Don't emit http_req_duration (and the other http Trend metrics) for failures.
2. Tag http_req_duration with a failed:true|false tag and display filtered values.

Let's discuss both approaches.

Don't emit http Trend metrics for failed requests

In this approach, http_req_duration and the other http metrics won't include failed requests towards the metric's internal state. Users who want to track error timings can define custom metrics like this:

Tag http_req_duration with failed:true|false tag and display filtered values

With this approach, we would emit http_req_duration and friends as we used to, but we would tag values with failed:true|false. The default end-of-test summary would display only successful requests, like this:

The most problematic issue with this approach is that some outputs don't ingest tags and won't be able to display http_req_duration for successful requests only.

Examples of the http_req_duration and http_reqs_failure metrics k6 should produce with this approach:

Cloud support

There are additional considerations for k6 cloud support.

Performance insights

The "performance insights" feature and web app currently assume that successful requests have status 200-399. Both approaches listed above solve these problems, although in different ways.

In approach 1, we would only get timings for successful requests, and therefore we won't show timings for failed requests. We will still get tagged http_reqs_failure metrics and therefore will be able to show errors without timings. We would probably redesign this UI to separate failures from successes in a better way.

In approach 2, we would get a new standard tag called failed on all http_req_* metrics, including http_req_li_all. Timings would still be shown for errors (although probably not useful), but the background of the row would be determined by the failed tag.

(alternative) Why don't we skip the new metrics and purely rely on the failed tag?

It's possible to extend the existing http_reqs counter metric by tagging requests with failed and changing the metric type to Rate. If that's done, the following script would be possible:

Possible end-of-test summary:

Possible problems with this approach are:

I'm (currently) against this alternative.

Defining failure

To determine if a request has failed or succeeded, a JavaScript hook function is invoked after the request, but before the metrics emission. This proposal builds on https://github.com/loadimpact/k6/issues/1716.

per-request handling

Sometimes users need to handle special cases.

Alternative 1 - handle inside the hook

Alternative 2 - override the hook per request

What about the redirect chains?

This is up for discussion. Should the hook fire on every request in the chain, or only on the last one?

Concerns

- http_req_li_all and other http_req_* metrics.