netdata / netdata

Architected for speed. Automated for easy. Monitoring and troubleshooting, transformed!
https://www.netdata.cloud
GNU General Public License v3.0

[Feat]: "Trimmed Mean" aggregation function #13426

Closed andrewm4894 closed 2 years ago

andrewm4894 commented 2 years ago

Problem

A "trimmed mean" could be a useful aggregation function in netdata in addition to a regular mean.

Description

Importance

nice to have

Value proposition

  1. more flexible aggregation options

Proposed implementation

Same way mean is implemented, just implement some obvious ones like

andrewm4894 commented 2 years ago

In addition to median, it could also be useful to add p5, p10, p90 and p95 percentiles as aggregation functions.

ktsaou commented 2 years ago

Using the same semantics as the countif function, we could provide 2 aggregation methods:

  1. trimmed-mean supporting a parameter to define the percentage, by default 5%.
  2. percentile supporting a parameter to define the percentage, by default 95%.

ktsaou commented 2 years ago

I also think that when these aggregation methods are requested, we should always use tier0, when available.

ktsaou commented 2 years ago

@andrewm4894 according to these definitions, the truncated mean blindly excludes the bottom and top X points from the sorted series.

But there is another, probably more useful approach: calculate the min and the max of the series, find delta = X% of (max - min), and then average all the values that are between min + delta and max - delta. This way we don't exclude a specific number of points, but anything that is not between X% above the min and X% below the max.

Using the above, we can also have a trimmed-mean and a trimmed-median.

The same goes for the percentile. We can either exclude points or values.

What do you think is the right thing? Exclude specific number of points or values?
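
To make the two options concrete, here is a minimal Python sketch of both trimming strategies as described above. The function names and the sample series are made up for illustration, and this is not Netdata's implementation; it only shows how "exclude points" and "exclude values" can give different results on the same data.

```python
from statistics import mean


def trimmed_mean_by_points(values, pct):
    """Drop the lowest and highest pct% of the sorted points, then average."""
    ordered = sorted(values)
    if not ordered:
        return None
    k = int(len(ordered) * pct / 100)  # number of points dropped at each end
    kept = ordered[k:len(ordered) - k]
    return mean(kept) if kept else None


def trimmed_mean_by_values(values, pct):
    """Average only the values inside [min + delta, max - delta],
    where delta = pct% of (max - min)."""
    if not values:
        return None
    lo, hi = min(values), max(values)
    delta = (hi - lo) * pct / 100
    kept = [v for v in values if lo + delta <= v <= hi - delta]
    return mean(kept) if kept else None


# Hypothetical sample: one low point (5) sits close to the minimum.
series = [0, 5, 50, 52, 54, 56, 58, 60, 62, 100]
print(trimmed_mean_by_points(series, 10))  # drops 0 and 100 only -> 49.625
print(trimmed_mean_by_values(series, 10))  # drops 0, 5 and 100   -> 56
```

With 10% trimming, the point-based version removes exactly one point from each end, so the 5 still pulls the mean down. The value-based version removes everything below 10 and above 90, however many points that is, and it can return nothing at all when a single extreme outlier stretches the range.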

andrewm4894 commented 2 years ago

Yep - I think a more flexible approach based on the countif idea could be useful. We had thought that something like that could also be useful for different ways to aggregate and summarize the anomaly rate (eg number of times it crosses over some threshold, or counting "runs" of consecutive anomaly bits - all gets a bit convoluted though so we shelved it).

I think percentiles maybe would be the most useful and user friendly thing for now. So in the list below i could have

image

Maybe that gets a bit busy and we need some other UI/UX way, but I think as a start it could be fine.

We would just need to make sure to add tooltips to the aggregation methods for anything more custom.

What do you think is the right thing? Exclude specific number of points or values?

I think it would be to exclude values, as I think that's how people would more naturally think about it and expect it to behave.

Interquartile range could also be another useful one (and maybe easily built off p75 - p25): https://en.wikipedia.org/wiki/Interquartile_range. What's the range of the middle 50%? Perhaps this can also be generalized to any ntile ranges if we wanted - eg "p25-p75 range", "p10-p90 range" etc.
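
As a rough illustration of how such ranges fall out of a percentile function, here is a small Python sketch. The helper names are hypothetical, and the linear-interpolation percentile used here is only one of several common definitions; it is an assumption, not necessarily what Netdata computes.

```python
def percentile(values, p):
    """Percentile with linear interpolation between the closest ranks."""
    ordered = sorted(values)
    if not ordered:
        return None
    rank = (len(ordered) - 1) * p / 100  # fractional index into the sorted data
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    frac = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


def percentile_range(values, low_p=25, high_p=75):
    """Width of the band between two percentiles (IQR by default)."""
    return percentile(values, high_p) - percentile(values, low_p)


values = list(range(1, 10))              # 1..9, made-up data
print(percentile_range(values))          # p25-p75 range (IQR) -> 4.0
print(percentile_range(values, 10, 90))  # p10-p90 range
```

Any "pX-pY range" is then just a choice of the two arguments, so a generalized ntile range needs no extra machinery beyond the percentile itself.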

andrewm4894 commented 2 years ago

Also, I think if this can all be done in a way that makes these aggregations available to the health engine, that could also be very useful.

e.g. set an alarm based on the p90 value of a metric etc.

ktsaou commented 2 years ago

@andrewm4894 I implemented trimmed-median, trimmed-mean and percentile.

For the trimmed versions, I have provided aliases for quickly setting 1, 2, 3, 5, 10, 15, 20 and 25%. For the percentile, the aliases are 25, 50, 75, 80, 90, 95, 97, 98, 99.

In all versions any percentage may be specified with the group_options query parameter. All aliased versions can be used in health alerts (currently health alerts do not provide any means for setting query_options - once we do this, any percentage may also be specified in health alerts too).
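
As a sketch of how a client might request these, the Python helper below builds a query URL. The `/api/v1/data` path and the `chart`, `points` and `group` parameters are taken from the example URL elsewhere in this thread, and `group_options` is the parameter named above. The helper name, the `localhost:19999` host, the `system.cpu` chart, and the exact value format accepted by `group_options` are assumptions made for illustration.

```python
from urllib.parse import urlencode


def data_url(chart, group, group_options=None, points=1,
             host="http://localhost:19999"):  # assumed host, adjust as needed
    """Build a data query URL for one chart and one aggregation method."""
    params = {"chart": chart, "points": points, "group": group}
    if group_options is not None:
        # Assumption: the percentage is passed as a plain number.
        params["group_options"] = group_options
    return f"{host}/api/v1/data?{urlencode(params)}"


# Aliased version: the percentage is part of the method name.
print(data_url("system.cpu", "percentile90"))

# Generic version: an arbitrary percentage supplied via group_options.
print(data_url("system.cpu", "percentile", group_options=97.5))
```

The aliased form needs no extra parameter, which fits the note above that only the aliased versions are usable in health alerts for now.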

andrewm4894 commented 1 year ago

@hugovalente-pm i think we should explore what would need to be done to make these available in NC

e.g: https://london.my-netdata.io/api/v1/data?chart=httpcheck_Bangalore_Demo_Site.response_time&points=1&group=percentile90

eg making some of them available in here:

image

novykh commented 1 year ago

It should be only frontend work.

hugovalente-pm commented 1 year ago

I'll create a ticket for this. How would you guys @andrewm4894 @novykh see this being made available? We currently just have the dropdown list with options; we don't have any case where parameters can be specified. Would you see something like this as a start?

maybe leveraging what we have on the filter by labels with collapsible sections

image

andrewm4894 commented 1 year ago

@hugovalente-pm makes sense - or we could just add a subset for now assuming that dropdown is scrollable.

Start with just:

Trimmed Mean
Trimmed Median
Percentile 99
Percentile 95
Percentile 75
Percentile 90
Percentile 25
Percentile 10
Percentile 5
Percentile 1

Unsure how ugly or not that might look (?)

novykh commented 1 year ago

We can just add another input if any of

Trimmed Mean
Trimmed Median
Percentile

is selected.

So the filter will look like: "each as <select>99th</select> <select>Percentile</select> every 2 seconds" (make it a sentence I mean)

ps. sub-dropdowns are a bad experience 😄

hugovalente-pm commented 1 year ago

that looks better! will raise a ticket and if it is not a big effort task we could prioritize it