Add explicit tracking and ignoring of metrics and sub-metrics

na-- commented 4 years ago

k6 measures a lot of things, and it can and can generate quite a lot of data points for a variety of different metrics. Not everything we measure is important to everyone, and we want to measure even more things - https://github.com/loadimpact/k6/issues/877, https://github.com/loadimpact/k6/issues/880, https://github.com/loadimpact/k6/issues/888, https://github.com/loadimpact/k6/issues/921, https://github.com/loadimpact/k6/issues/1311, and all of the non-HTTP protocols we want to add...

Unfortunately, currently there's no way for a user to tell k6 that they're not interested in some of the things it measures (https://github.com/loadimpact/k6/issues/570). That's a problem, because measuring and reporting useless things adds a significant overhead both to k6 (https://github.com/loadimpact/k6/issues/764), and to any custom outputs that users pipe the metrics data to (https://github.com/loadimpact/k6/issues/1060).

On top of that, some of the current default metrics just add clutter and confusion to the end-of-test summary report (https://github.com/loadimpact/k6/issues/1319), without a lot of benefit to most users. All of the http_req_* "minor" metrics probably shouldn't be enabled by default, instead we should probably have only http_req_duration, which is sending + waiting + receiving. Maybe we should also expose the composite metric for the connection establishment, but I doubt most people would care about http_req_tls_handshaking individually, most of the time. group_duration is also somewhat useless when you have a variety of different groups and don't use an external metrics output... Unfortunately, we currently can't remove any of these metrics by default, since we don't have a mechanism for users to enable them if they want/need them...

There's also no way besides thresholds for users to force k6 to calculate and display information about sub-metrics, i.e. specific subsets of the data points of the main metrics. This point was recently raised by a user in the community forum, and unfortunately, thresholds is the only way to tell k6 to keep track of sub-metrics right now :disappointed:

So, I think I've listed all of the connected issues I can remember... My proposed solution, as suggested by the issue title, is to allow users to whitelist metrics and sub-metrics. k6 should have a default whitelist, but we should allow users to set their own. Here is a suggestion for the simplest possible approach we can offer, though I think this might not be the best solution:

export let options = {
    // ...
    metrics: [
        "iterations",
        "iteration_duration",
        "http_req_duration",
        "http_req_duration{status:200}",
        "http_req_duration{staticAsset:yes}",
        "data_sent",
        "data_received",
    ],
};

This would work... somewhat..., but it has some big drawbacks:

If users just want to add a single extra metric to be displayed in addition to the ones we show by default, they'd have to list all of them here + their wanted one. This is as far from good UX as you can get...
Worse, if we ever add new metrics in newer k6 version, say http_req_errors (https://github.com/loadimpact/k6/issues/1311), most users that had already configured metrics themselves won't be aware of them, since they would be excluded, even if we show them by default in new k6 versions - that would be overwritten...
They also have to add all of their custom metrics, which can get tedious. Worse, if they don't specify this metrics option, our default metrics handling somehow has to make an exception for them and track and show the custom metrics' values anyway... After all, if a user went to the trouble of measuring something manually with a custom metrics, they probably care about it and want to see it in the results...

That said, I'm not firmly against still have such an option - it allows users explicitly and exactly to specify the metrics they want. But, if we do have it, I'd like us to heavily discourage its usage, and offer a better alternative for 99% of use cases.

I'd argue that the vast majority of users and use cases would be satisfied if we just had a simple way for adding metrics to the list of default metrics. So, I think we should add a simple new trackMetrics() function (I'd appreciate if someone can think of a better name..) to the k6/metrics package, which can be used like this:

import { trackMetric } from "k6/metrics";

trackMetrics("http_req_connecting", "http_req_tls_handshaking");
trackMetrics("http_req_duration{status:200}", "http_req_duration{staticAsset:yes}", /* ... */ );

Some details:

This won't overwrite the default metrics that we'd ship with k6, but will non-destructively add to them
With the help of the spread operator or apply(), you can easily pass an array as well.
For completeness' sake, we should probably also add an untrackMetric(name) function, just so that users aren't forced to use the metrics option directly.
Each custom metric constructor should, by default, implicitly call trackMetric(). This way, all custom metrics would be measured and shown in the test results. Not sure if we need to add a parameter to these constructors to disable this behavior, or if users can just call untrackMetric().
Each threshold should also implicitly call trackMetric() for the metric or sub-metric it was based on.
Outputs should probably also be able to specify certain metrics that they need, which would also implicitly be added to the default list if that output is enabled. For example, the group_duration metric doesn't make a lot of sense if you just run k6 locally, but it's very useful for the cloud output. That said, this addition feels optional, not a requirement, so I think it should be overrideable by untrackMetric("group_duration").

Implementation-wise, all of these function calls would probably matter only in the goja runtime that's executing the init context to get the exported options, but I'm not sure this is something worth optimizing.

Alternatively, we may be able to go off the trodden path and side-step the current k6 configuration entirely. I'm not sure if this would make the configuration mess (https://github.com/loadimpact/k6/issues/883) worse, or if it won't affect it at all, but I'm hopeful for the second... In any case, we can probably mostly handle the metrics whitelist as a localized per-VU thing that doesn't touch the global configuration. So, no global metrics key in the exported options, just some local per-VU logic that decides which metrics it's going to emit.

The biggest benefit of the alternative approach I can think of is that we'd get a ton of flexibility. Different VUs will be able to emit different metrics. Even better, a VU could temporarily stop emitting certain metrics, and later resume that emission, which has been requested before: https://github.com/loadimpact/k6/issues/1298, https://github.com/loadimpact/k6/issues/884#issuecomment-516899010 , https://github.com/loadimpact/k6/issues/891#issuecomment-453919783. The biggest issues with this approach I can think of are:

How would sub-metrics tracking and un-tracking work? I think we'd need significant refactoring of the way we track sub-metrics with this approach... Though that might not be a bad thing (https://github.com/loadimpact/k6/issues/572)
We would have no way of knowing which metrics are tracked globally, which might be somewhat problematic in the cloud execution. I don't think this is a huge issue, but I might be wrong. And there might also be a hybrid approach, where we have a global metrics option, but VUs can control metric emission locally. :man_shrugging:

Open questions:

Which approach is better? I think the second non-global approach at least deserves an investigation. Or is a hybrid approach also possible?
What hidden implementation challenges does this involve? I don't want to mess up the k6 config any more than it already is (https://github.com/loadimpact/k6/issues/883)...
Is this the best name or format for the option?
How we should handle the partial aggregation of HTTP metrics we send to the cloud, if we don't measure some of them by default anymore.
Should we invest in developing a more powerful rule definition for sub-metrics? And if so, when? I explained the problems and limitations of the current approach in https://github.com/loadimpact/k6/issues/1313, and despite what I mentioned in that issue, improving these rules can be done separately from improvements of the threshold parsing (https://github.com/loadimpact/k6/issues/961) - they're not actually connected. That said, I think these improvements should probably be done in a separate PR in the future, but we should be conscious that the current way of defining sub-metrics has to be improved and plan accordingly...

markjmeier commented 4 years ago

I think the trackMetric with an untrackMetric functionality makes the most sense to me. To me, it should mean:

untrackMetric() = Throw away, don't save the metric. We need to be careful with submetrics though. The total value should be a part of the main metric. e.g. if blocked has a value and we untrack it, the value of blocked is still a part of the total response time and should be included in that value. You just lose visibility to the granular value.
trackMetric() = I specifically care about this metric and want it highlighted at the end of test summary and/or in the LI cloud(analysis tab). I'm tracking it, therefore it's more important to me and should be highlighted (custom metrics are currently added to the small charts in analysis tab automatically).

In short, use untrack to remove defaults from view, and track to add more things needing to be tracked.

Reading the original question on the community board, would a trend custom metric work here? That ~emits at the end of the test~ shows in terminal as part of the end of test summary.

na-- commented 4 years ago

I think the trackMetric with an untrackMetric functionality makes the most sense to me. To me, it should mean:

untrackMetric() = Throw away, don't save the metric. We need to be careful with submetrics though. The total value should be a part of the main metric. e.g. if blocked has a value and we untrack it, the value of blocked is still a part of the total response time and should be included in that value. You just lose visibility to the granular value.

Yeah, sub-metrics would require some more thought... For example, if you untrack the "parent" metric, but explicitly track a few submetrics, should that work? My instinctive answer is "no", but I can see pros and cons for both approaches, so we need to consider that a bit more carefully.

And maybe we should have separate functions that deal with submetrics? This would also make implementing the rule matching enhancements from https://github.com/loadimpact/k6/issues/1313 easier... Or, better yet, expose a proper object with methods to query, track and untrack things?

trackMetric() = I specifically care about this metric and want it highlighted at the end of test summary and/or in the LI cloud(analysis tab). I'm tracking it, therefore it's more important to me and should be highlighted (custom metrics are currently added to the small charts in analysis tab automatically).

This makes sense, though it would probably greatly complicate the implementation, both in k6 and in the cloud. We'd need some mechanism to communicate these changes to the cloud, some sort of a meta-metric perhaps? And if we implement the metric tracking/untracking as a local per-VU thing, so that you can untrack some metrics midway through the VU execution, it would be difficult to square that with what we display in the test run view...

Reading the original question on the community board, would a trend custom metric work here? That emits at the end of the test. Maybe I'm missing something.

You're right, this is a valid workaround, I posted it in the forum. Not sure what you mean by "That emits at the end of the test." though :confused:

mstoykov commented 4 years ago

I don't think the "flexible" approach is actually what we want(or at least I see no real benefits to it):

What is the point of you being able to say:
```
res = http.get("something");
If (rest.status == 404) {
trackMetric('http_req_duration{"url": "'+res.url+'"}');
}
```
I mean I guess it is powerful but do you think that will work - it won't (or it is going to miss data) and if you can't do that having it as an option localizes it one place instead of giving it the ability to be hard to find in big code bases.
In the above example ... if I did have an error and the metric I defined with incorrect syntax (http_req_duration{"url": "something}).. what should k6 do - throw a warning or stop the execution? 2.1. I guess if you define a (sub-)metric that seems to be with correct syntax, but never gets any data you will want some warning? This really doesn't change with the "flexible" approach but is still an interesting question. We definitely can check in both cases that a given metric is defined and abort the execution as we clearly won't be able to provide any data about the metric the user wanted.
Are we going to provide the information that metric "something" was actually requested for adding in the final output to any "output" ? I mean the cloud one will probably be interested that someone was interested in the http_req_duration{"tag": "statisAsset"}. And other outputs ... might be interested as well.

IMO this being another option(s) have a lot more benefits:

you can configure it in the config file and reuse it for all your projects
will be a lot more easy to find and won't add a bunch of new code and the problem that even if you can only use trackMetric in the initContext
```
import { trackMetric } from "k6/metrics";
```

// metric not defined up to here trackMetric("custom_metric{tag:something}");

Should `trackMetric` error out when it's called and no such metric is defined? Or will we allow the metric to be defined after we have said `trackMetric`, in which case we won't be able to tell the user (easily) where `trackMetric` was called with unregistered metric.

All of those are no problem in a single file 50-60 lines codebases, but we do have a lot of people with arguably whole frameworks around k6 in which this will add ... IMO unneeded complexity.
3. I propose we have 3 new options keys, instead 1 new option key and 2 functions:
```json
{
  "tracked_metrics": ["metric1","metric2"]
}

mean track only metrics: "metric1", "metric2" and nothing else with a good default value:

{
  "additionally_track_metrics": ["metric3","metric4"]
}

which means to the metrics that already are tracked ... track those as well. This will return an error if combined with the above "tracked_metrics",
and finally

{
  "do_not_track_metrics": ["metric5"]
}

can be combined with additionally_track_metrics (but not if they share the same key), but not tracked_metrics. This way most people will only use additionall_track_metrics and do_not_track_metrics. All of those will be in the same place (we might even add metric option key on top of those ...? Obviously all the names I proposed are up for discussion ... preferably with something less verbose :D

Implementation algorithm:

We parse the options check for syntax errors. 1.1 If errors report them and stop the execution
If no syntax errors: check that all the metrics(and sub metrics) have their base metric defined 2.1. if not we can print a very nice message telling which ones are not valid and what are the currently defined metrics with a small explanation about sub-metrics and possibly a link to the documentation.
Profit !!!

I agree that this adds complexity to the configuration .. but not really any more then it already is there and I think this complexity is a lot better than adding 1 key and then 2 functions that we need to take care of on top of. This also can be easily expected in the cloud output at least as all the options are sent either way.

I also think that this should be considered with the future metric aggregation in mind.

Also, I would like to point out that the cloud output might need 1 or more of the metrics either way in order to function properly and I am against the user being able to break the cloud output when they use it and as such we probably will need a mechanism to tell them that they can't disable "http_req_duration" for example. Also, this probably should prevent people from disabling metrics that are used in thresholds?

imiric commented 4 years ago

I agree with the drawbacks of proposal #1, those seem like deal breakers.

The second functional approach is interesting, but it will only complicate scripts and our implementation if we decide to support untrackMetric(), and it won't give the flexibility of changing default tracked metrics (though I suppose those could be untracked if needed).

AFAICS, this is a global runtime setting, and as such it makes more sense to me for it to be configured via a CLI flag / env var. For this to work simply, we probably need to structure the metric names better and for the config option to support wildcards or regex.

Consider:

k6 run --metrics 'iter*,http_req/*,data*' ...

I think this makes intuitive sense as to what is being tracked. Note that we could structure child metrics with / to make this point clearer.

I don't think adding the functionality of untracking metrics mid-test would be worth the added complexity.

na-- commented 4 years ago

@mstoykov,

I don't think the "flexible" approach is actually what we want(or at least I see no real benefits to it):

What is the point of you being able to say:
res = http.get("something");
If (rest.status == 404) {
  trackMetric('http_req_duration{"url": "'+res.url+'"}');
}
I mean I guess it is powerful but do you think that will work - it won't (or it is going to miss data) and if you can't do that having it as an option localizes it one place instead of giving it the ability to be hard to find in big code bases.

Hmm seems like I didn't explain my "localized tracking and untracking" idea properly... The above example doesn't make sense, because http.get() would have already sent the metrics it generated. Something like this is what I envision if we have localized tracking:

untrackMetric("http_req_duration");

// A region of code that makes a bunch of requests you don't want
// to track. There are many reasons users may want that - some
// setup() code that's not important, or requests that they know
// are slow and don't want to pollute the rest of the metrics, or
// expected errors, etc.
http.get("https://myapi.com/some-super-slow-call-I-don't-care-about")
// ...

trackMetric("http_req_duration");

In the above example ... if I did have an error and the metric I defined with incorrect syntax (http_req_duration{"url": "something}).. what should k6 do - throw a warning or stop the execution?

As I mentioned above, we probably should have separate methods for submetrics, so we don't rely quite so much on arcane syntax. In any case, I think we should just do the normal thing in such a situation and throw an exception :wink:.

2.1. I guess if you define a (sub-)metric that seems to be with correct syntax, but never gets any data you will want some warning? This really doesn't change with the "flexible" approach but is still an interesting question. We definitely can check in both cases that a given metric is defined and abort the execution as we clearly won't be able to provide any data about the metric the user wanted.

I'm not sure aborting the execution makes sense - after all, a metric can receive data points even at the very end of the execution, in the teardown() :wink:.

And while in a lot of cases it would be a good UX to emit a warning if a metric was (explicitly or implicitly) tracked, but didn't get any data, there are cases where it doesn't make sense... For example, say that you have an error Counter, but it only receives values when there's an error. We obviously shouldn't show an error/warning if the error counter didn't receive any metrics... :smile:

So, I think the reasonable approach here is that, if the user implicitly or explicitly tracked a metric, we should show it in the end-of-test summary, even if it's to show 0. Pending any explicit template manipulations, of course (https://github.com/loadimpact/k6/issues/1319).

Are we going to provide the information that metric "something" was actually requested for adding in the final output to any "output" ? I mean the cloud one will probably be interested that someone was interested in the http_req_duration{"tag": "statisAsset"}. And other outputs ... might be interested as well.

I think yes... eventually. It would be better UX, but it seems like we can implement it as a separate PR, after we implement the rest. That's what I meant by "This makes sense, though it would probably greatly complicate the implementation, both in k6 and in the cloud." above...

IMO this being another option(s) have a lot more benefits:

you can configure it in the config file and reuse it for all your projects

If you mean "configure an array and reuse that", this has all of the drawbacks I mentioned above under "but it has some big drawbacks"... Even if we use a proper option to configure this, I'd still like to heavily discourage this approach.

And if you mean trackMetrics()/untrackMetrics(), then users can reuse their configuration,regardless of which aproach we choose - a proper global option or a local per-VU thing. Here's how it can work in both approaches:

import { trackMetrics } from "k6/metrics";
import { myImportantMetricsArray } from "./some-common-file.js"

trackMetrics(...myImportantMetricsArray);

The only difference is that with the local per-VU approach, you have the added flexibilty.

will be a lot more easy to find and won't add a bunch of new code and the problem that even if you can only use trackMetric in the initContext

I don't understand what you mean here, sorry.

import { trackMetric } from "k6/metrics";

// metric not defined up to here
trackMetric("custom_metric{tag:something}");
Should trackMetric error out when it's called and no such metric is defined? Or will we allow the metric to be defined after we have said trackMetric, in which case we won't be able to tell the user (easily) where trackMetric was called with unregistered metric.

trackMetrics is in a sense "registering" this metric... The way metrics work in k6, calling a custom metric constructor at the moment doesn't do or register anything globally... So, I think this should change, and trackMetrics (and custom metrics constructors that would implicitly call it) would actually create a record somewhere. As I said, I think a nice balance should be that, if you explicitly or implicitly track a metric, it will be displayed in the end-of-test summary (and in the cloud, eventually).

All of those are no problem in a single file 50-60 lines codebases, but we do have a lot of people with arguably whole frameworks around k6 in which this will add ... IMO unneeded complexity.

I propose we have 3 new options keys, instead 1 new option key and 2 functions:
{
  "tracked_metrics": ["metric1","metric2"]
}
mean track only metrics: "metric1", "metric2" and nothing else with a good default value:
{
  "additionally_track_metrics": ["metric3","metric4"]
}
which means to the metrics that already are tracked ... track those as well. This will return an error if combined with the above "tracked_metrics", and finally
{
  "do_not_track_metrics": ["metric5"]
}
can be combined with additionally_track_metrics (but not if they share the same key), but not tracked_metrics. This way most people will only use additionall_track_metrics and do_not_track_metrics. All of those will be in the same place (we might even add metric option key on top of those ...? Obviously all the names I proposed are up for discussion ... preferably with something less verbose :D

Implementation algorithm:
1. We parse the `options` check for syntax errors.
   1.1 If errors report them and stop the execution

2. If no syntax errors: check that all the metrics(and sub metrics) have their base metric defined
   2.1. if not we can print a very nice message telling which ones are not valid and what are the currently defined metrics with a small explanation about sub-metrics and possibly a link to the documentation.

3. Profit !!!
I agree that this adds complexity to the configuration .. but not really any more then it already is there and I think this complexity is a lot better than adding 1 key and then 2 functions that we need to take care of on top of. This also can be easily expected in the cloud output at least as all the options are sent either way.

I very much dislike this approach... :sweat_smile: To me it seems to be more complicated for users, more complicated in terms of https://github.com/loadimpact/k6/issues/883, and less flexible than either of the approaches that used functions. And I don't exaggerate the user-facing complexity, it literally took me 3 readings to understand what you're proposing, and I'm still probably misunderstanding something. For example:

This will return an error if combined with the above "tracked_metrics"

Wat, why?

I also think that this should be considered with the future metric aggregation in mind.

Agree, though I don't currently see a lot of overlap between these things. That is, I don't see how any metric whitelist/blacklist option, or even its lack, would affect the aggregation. The most relevant thing I can think of is the aforementioned meta-metric that could be used to notify outputs of the existence of metrics/sub-metrics. This probably could be reused for aggregation "announcements"/options...

Also, I would like to point out that the cloud output might need 1 or more of the metrics either way in order to function properly and I am against the user being able to break the cloud output when they use it and as such we probably will need a mechanism to tell them that they can't disable "http_req_duration" for example.

Please refer to the "Outputs should probably also be able to specify certain metrics that they need" part from my initial proposal..

Also, this probably should prevent people from disabling metrics that are used in thresholds?

Ditto, "Each threshold should also implicitly call trackMetric()..." in the original post/

@imiric,

The second functional approach is interesting, but it will only complicate scripts and our implementation if we decide to support untrackMetric(), and it won't give the flexibility of changing default tracked metrics (though I suppose those could be untracked if needed).

Whether we're using a global option or a local per-VU metric tracking, the complexity, performance, and logic of the code wouldn't be substantially changed. Only, in one case you'd execute that logic somewhat centrally, and in the other case you'd do it independently in each VU. Thinking about it some more, the local option might even be a bit more performant...

Also, we're not doing something wildly dissimilar even now. Refer to the stats.PushIfNotDone(ctx, state.Samples, ...) calls sprinkled in the k6 code. The ctx and state argument in these invocations are VU-level variables...

AFAICS, this is a global runtime setting, and as such it makes more sense to me for it to be configured via a CLI flag / env var.

I'm not against having a global CLI flag / env var., but this has the same drawbacks of my first global metrics option proposal.

For this to work simply, we probably need to structure the metric names better and for the config option to support wildcards or regex.

Consider:
k6 run --metrics 'iter*,http_req/*,data*' ...
I think this makes intuitive sense as to what is being tracked. Note that we could structure child metrics with / to make this point clearer.

I'd have very much liked if we could do something like this... :disappointed: Unfortunately, because metrics in k6 scripts can originate organically in the middle of the script, we can't know all of their names in advance. "Knowing the metric names in advance" is half of the idea for having an explicit whitelist...

So, since we can't know all possible metric names before the script starts, this approach would be very costly in terms of CPU usage, because we'd have to save all of these wildcard patterns without resolving them. Then, whenever a metric is emitted, we'd have to validate its name against them. Even if we used tries, it seems to me like this would be too CPU heavy when you have to do it potentially a several million times per test run :disappointed: The alternative of having a simple whitelist/blacklist would just be a fast lookup in a small map.

I don't think adding the functionality of untracking metrics mid-test would be worth the added complexity.

I'm more and more of the opposite opinion... :sweat_smile: The thing is that the more I consider the alternatives, the more appealing the local per-VU metric tracking and untracking becomes... Or, at the most, the hybrid approach.

My reasons are twofold:

Implementing it is not going to be complicated. My only concern is how we would deal with sub-metrics, and most of that is not the actual implementation, but what would be the most consistent and logical interface we can present to users...
We would eventually want to add functionality to suppress metric emission anyway... There are way too many valid use cases for something like it. So, the question is, should we add some other, completely different, mechanism to implement something like that (https://github.com/loadimpact/k6/issues/884#issuecomment-516899010 for example, or even @sniku's ideas in https://github.com/loadimpact/k6/issues/1311), or should we have a single consistent, flexible and logical interface for dealing with all "metrics emission" use cases? To me, the latter is the obviously preferable choice, both in terms of implementation complexity for us (way less corner cases), and in terms of UX (users only need to understand one relatively simple thing).

gygitlab commented 4 years ago

Hey. I was the one who asked the question over on Community.

Thanks for raising this and driving it forward. To explain a little more where this is coming from the problem we were trying to solve was to succinctly show more detail about different endpoints being called in the same tests so we can see which ones are slower, etc... The tag + threshold functionality does this nicely for us without having to add in a lot of custom metrics for each endpoint (and then multiple thereof for each data point we wanted to track).

All of the httpreq* "minor" metrics probably shouldn't be enabled by default, instead we should probably have only http_req_duration, which is sending + waiting + receiving

Also agree with this except that we're also using http_req_waiting to monitor time to first byte so it would be good to have this kept or a TTFB metric added instead.

Many thanks again for driving this. Really happy with k6 and excited to see where it goes next.

na-- commented 4 years ago

Thanks for chiming in!

To explain a little more where this is coming from the problem we were trying to solve was to succinctly show more detail about different endpoints being called in the same tests so we can see which ones are slower, etc... The tag + threshold functionality does this nicely for us without having to add in a lot of custom metrics for each endpoint (and then multiple thereof for each data point we wanted to track).

Do you think the approach of enumerating these endpoints and calling something like trackMetric("http_req_ttfb{<url-or-some-endpoint-tag>:<endpoint-url-or-id>}"); for each endpoint would work well for you? Or do you have any better suggestions?

All of the httpreq* "minor" metrics probably shouldn't be enabled by default, instead we should probably have only http_req_duration, which is sending + waiting + receiving

Also agree with this except that we're also using http_req_waiting to monitor time to first byte so it would be good to have this kept or a TTFB metric added instead.

TTFB definitely seems like a more useful metric than most of the ones we show right now... We currently measure and display the fundamental parts of the HTTP request making process, connecting, tls_handshaking, sending, waiting and receiving, in that order. We also show blocked, which is somewhat separate, is often misinterpreted, and isn't very useful right now... though it might become more so in the future (https://github.com/loadimpact/k6/issues/1169).

The only composite metric we currently show is http_req_duration, which is sending + waiting + receiving. This is useful precisely because it ignores all connection establishment times, making it very comparable across all requests in a load test. If you haven't disabled the default k6 connection reuse, the initial requests would have some non-zero connecting and tls_handshaking times, while the subsequent requests would have these metrics be 0. So, if you compare the total times of these requests, you'd be comparing apples to oranges, and that's one of the reasons http_req_duration exists - you could compare its values across all requests in your load test.

That said, it's obvious that network delays shouldn't be ignored. So I'd be perfectly fine with adding a new composite metric we show by default called http_req_ttfb. It should probably be connecting + tls_handshaking + sending + waiting, since I think that's the most common TTFB interpretation, right? This would still be very useful, just with the understanding that connection reuse would likely heavily affect it.

But I think we, by default, should display the "minor" constituent metrics only if users explicitly request it, or have thresholds based on them. I think that, in most use cases, you'd only want to look at these if you're debugging an issue? And it'd be much easier to initially spot an issue with the composite metrics, since they can surface them much more easily and visibly, I think... And if http_req_waiting is precisely what you want and need anyway, having a single trackMetric("http_req_waiting") in the script hopefully isn't a huge burden :sweat_smile:

na-- commented 4 years ago

We can very easily add such a helper function as a simple and fool-proof way to track/untrack metrics per-VU. This way there wouldn't be any issues with surviving exceptions and code bugs, would be to have a helper API like this:

withUntrackedMetrics(["http_req_duration{status:403}", "http_req_ttfb"], function() {
   // some user code that doesn't want to measure the metrics above, 
   // but may throw an exception without messing the state
})

Basically, this is like group(), but for tracking/untracking metrics. I'd argue that if group isn't complicated, this isn't complicated either :wink:

gygitlab commented 4 years ago

Do you think the approach of enumerating these endpoints and calling something like trackMetric("http_req_ttfb{:}"); for each endpoint would work well for you? Or do you have any better suggestions?

Hmm that seems fine to me at first glance yeah.

But I think we, by default, should display the "minor" constituent metrics only if users explicitly request it, or have thresholds based on them. I think that, in most use cases, you'd only want to look at these if you're debugging an issue? And it'd be much easier to initially spot an issue with the composite metrics, since they can surface them much more easily and visibly, I think... And if http_req_waiting is precisely what you want and need anyway, having a single trackMetric("http_req_waiting") in the script hopefully isn't a huge burden 😅

Yeah I can agree on this. What I meaning though was you remove the minor metrics and leave http_req_duration but also bring in a new TTFB metric to go alongside it as I'd consider those the be the real metrics most folks will want to follow.

TamiTakamiya commented 4 years ago

I am also very interested in this feature primarily because the disk usage of our InfluxDB increases > 1 GB/hour when our test is running. Unless we pay extra attention, the disk usage could reach to the limit (we set it to 100 GB) any time, like at midnight or on weekends. I hope this feature will be implemented in the near future. Thanks.

While waiting, I am thinking of making a workaround for the InfluxDB disk usage issue, i.e. just suppress writing to InfluxDB for some metrics specified through an environment variable. When I could build such a workaround, I will push the code to my forked k6 repository.

TamiTakamiya commented 4 years ago

I could build a workaround described above: https://github.com/TamiTakamiya/k6/commit/566b0e666d604fdc253f96f9756f0cf6245d1ce4

This simply suppress saving metrics specified in the environment variable K6_INFLUXDB_METRICS_SUPPRESSED into InfluxDB. As metrics are still being collected internally and appear in the summary report, it won't be a long-term solution, but at least it could resolve our immediate issue.

na-- commented 4 years ago

I described a very partial workaround here, for anyone that doesn't want their setup() and teardown() metrics to pollute the end-of-test summary. Implementing this issue, or something like it, is still vitally important, so we fix root cause of the problem, but the per-scenario tags in k6 v0.27.0 offer some wiggle room in the meantime...

na-- commented 3 years ago

Another use case for giving users the ability to say "keep track of all submetrics based on some tags": https://community.k6.io/t/how-to-get-failure-statistics/1247

If users could specify that k6 should "keep track of all sets of http_reqs sub-metrics based on the error_code (or status + error) tags", then they'd be able to generate such an HTML report now that we have https://github.com/loadimpact/k6/pull/1768

na-- commented 3 years ago

Whatever API we settle on here, we should make sure that the way users specify metrics is future-proof in terms of the improvements we want to make in https://github.com/loadimpact/k6/issues/1443 / https://github.com/loadimpact/k6/issues/1441 / https://github.com/loadimpact/k6/issues/1313. And since we're likely to do this issue first, we should probably make sure that the code that parses the metrics and submetrics would be reusable for whenever we do the other issues.

More concretely, http_req_duration{staticAsset:yes} might be fine for now, but we should keep in mind that we'll have to be able to accept something like this eventually:

{
  "name": "http_req_duration",
  "filters": [
    {
    "left-hand": "staticAssets",
    "operator": "==", 
    "right-hand": "true", 
    "right-hand-type": "string"
    },
  ]
}

na-- commented 3 years ago

https://github.com/loadimpact/k6/issues/1832 (or something like it) is a prerequisite for the current issue.

emil45 commented 3 years ago

Hey guys,

Came here after this POST Would it be reasonable to ask if the "exclude" feature will be out there soon? (or later 😕 )

na-- commented 3 years ago

@emil45, this is high on our priority list. Unfortunately, there are several things that are even higher on the list, as well as some prerequisites we have to implement first (e.g. https://github.com/loadimpact/k6/issues/1832), so it's unlikely to land in the next k6 version (v0.32.0, slated to be released May 3rd). There are good chances it will land in the one after it though (v0.33.0, scheduled for the end of June), so I've changed the milestone to that.

mstoykov commented 3 years ago

@emil45 as I mentioned in https://github.com/loadimpact/k6/issues/1730#issuecomment-796869067 I've managed to use telegraf + namepass fields to drop some metrics as a workaround. (I expect that predominantly we want to exclude metric in order to safe storage in an external metric storage)

Depending on whether you prefer aggregation using the statsd input and the influxdv2 output seemed to work the best for me and it has percentile support which is also very useful IMO. Here is a gist with a sample telegraf config and a grafana dashboard + how it looks in a comment.

ALOKSINGHDAGAR commented 3 years ago

Hi,

I ran into a similar problem where a half an hour test ended up generating ~40 GB of result file, just checking if a solution is in place to filter out the unwanted metrics in result file ?

imiric commented 3 years ago

@ALOKSINGHDAGAR If your result file is in JSON, you can use jq to filter out unwanted metrics. See the examples in the documentation.

Unfortunately I don't think there's a way to filter them in real-time, which is why this issue is needed, but after the test you can use jq.

ALOKSINGHDAGAR commented 3 years ago

@imiric thanks, its a great tool, but if something can be rolled out to improve its reporting that would be great.

mstoykov commented 2 years ago

Another workaround is to filter the logs using a pipe and grep(or jq for that matter) as in:

mkfifo pipe 
cat < pipe | grep 'http_req_duration' >new.csv & // change filtering to what you want, you can also compress through gzip here as well
k6 run -o csv=pipe script.js
rm pipe // some cleanup

ppcano commented 2 years ago

I came to Exclude http requests made in the setup and teardown functions, so I wanted to know how some requests that interact with a third-party service could be ignored.

Given the k6 built-in metrics, the selected HTTP request/s should not affect the general data_* and http_req* metrics.

I read a few proposals. Subscribing to the issue to be informed about the final API proposal.

BjornPaulsson commented 1 year ago

Subscribed to the issue. We would need a solution for this as soon as possible where I work. Is there any time plan for this?

na-- commented 1 year ago

No firm plans yet, sorry :disappointed: Though the metric refactoring we did in the recently released k6 v0.41.0 should hopefully make implementing this feature significantly easier :crossed_fingers:

I think the biggest blocker right now is the fact that we don't have a good proposal for how the JS API should look like :thinking:

sebastianpujina commented 1 year ago

Subscribed to the issue. Any plans for this? 🤞

codebien commented 1 year ago

Hi @sebastianpujina, this feature is one of our high priorities in our backlog for the Metrics area. In the next cycle, we could consider starting to work on a JavaScript API for it.

Remember to upvote the issue using the thumbs-up reaction so it will be balanced appropriately during the evaluation phase.

Currently, there isn't any update to the @na--'s https://github.com/grafana/k6/issues/1321#issuecomment-1301958587.

grafana / k6

Add explicit tracking and ignoring of metrics and sub-metrics #1321