Closed andrewm4894 closed 2 years ago
In addition to median it could also be useful to add p5,p10,p90,p95 percentiles as aggregation functions too.
Using the same semantics as the countif function, we could provide 2 aggregation methods:
- trimmed-mean, supporting a parameter to define the percentage, by default 5%.
- percentile, supporting a parameter to define the percentage, by default 95%.

I also think that when these aggregation methods are requested, we should always use tier0, when available.
@andrewm4894 according to these definitions, the truncated mean excludes blindly the bottom and top X points from the sorted series.
But there is another, probably more useful approach: calculate the min and the max of the series, find delta = X% of (max - min), and then average all the values that are between min + delta and max - delta. This way we don't exclude a specific number of points, but anything that is not between X% above the min and X% below the max.
Using the above, we can also have a trimmed-mean and a trimmed-median.
The same goes for the percentile. We can either exclude points or values.
What do you think is the right thing? Exclude specific number of points or values?
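To make the two options concrete, here is a minimal Python sketch of both trimming strategies. The function names and the sample series are made up for illustration, and this is not how Netdata implements it, just the arithmetic described above.

```python
from statistics import mean

def trim_by_points(values, pct):
    """Drop the lowest and highest pct% of the points (by count) of the sorted series."""
    ordered = sorted(values)
    k = int(len(ordered) * pct / 100)  # number of points to drop at each end
    return ordered[k:len(ordered) - k] if k else ordered

def trim_by_values(values, pct):
    """Keep only the values between min + delta and max - delta,
    where delta = pct% of (max - min)."""
    lo, hi = min(values), max(values)
    delta = (hi - lo) * pct / 100
    return [v for v in values if lo + delta <= v <= hi - delta]

# Hypothetical series with one low outlier and two high outliers.
series = [0, 20, 21, 22, 23, 24, 25, 26, 99, 100]

by_points = trim_by_points(series, 10)  # drops exactly one point per side, 99 survives
by_values = trim_by_values(series, 10)  # drops everything outside [10, 90]

print(mean(by_points), mean(by_values))
```

With this series, trimming by points keeps the 99 (only one point per side is removed), while trimming by values removes both high outliers. Note that trimming by values can leave an empty set when all points sit near the extremes, so a real implementation would need to handle that case.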
Yep - I think a more flexible approach based on the countif idea could be useful. We had thought that something like that could also be useful for different ways to aggregate and summarize the anomaly rate (e.g. the number of times it crosses over some threshold, or counting "runs" of consecutive anomaly bits - it all gets a bit convoluted though, so we shelved it).
I think percentiles would maybe be the most useful and user-friendly thing for now. So in the list below we could have
Maybe that gets a bit busy and we need some other UI/UX way, but I think as a start it could be fine.
We would just need to make sure and add tooltips to the aggregation methods for anything more custom.
> What do you think is the right thing? Exclude specific number of points or values?

I think it would be to exclude values, as I think that's how people would more naturally think about it and expect it to behave.
Interquartile range could also be another useful one (and maybe easy to build off p75-p25): https://en.wikipedia.org/wiki/Interquartile_range. That is, what's the range of the middle 50%? Perhaps this can also be generalized to any ntile ranges if we wanted - e.g. "p25-p75 range", "p10-p90 range" etc.
Also, I think if this can all be done in a way that makes these aggregations available to the health engine, that could also be very useful,
e.g. set an alarm based on the p90 value of a metric etc.
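As a rough illustration of the "ntile range" idea, here is a small Python sketch. The helper names are hypothetical, and the percentile uses linear interpolation between the closest ranks, which is only one of several common definitions and an assumption here.

```python
def percentile(values, p):
    """p-th percentile (0..100) using linear interpolation between closest ranks."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of an empty series")
    rank = (len(ordered) - 1) * p / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)

def percentile_range(values, low=25, high=75):
    """Width of the band between two percentiles; the defaults give the interquartile range."""
    return percentile(values, high) - percentile(values, low)

samples = list(range(1, 10))             # 1..9, made-up data
iqr = percentile_range(samples)          # p75 - p25
wide = percentile_range(samples, 10, 90) # p90 - p10

# An alarm-style check on p90; the threshold is purely illustrative.
p90_too_high = percentile(samples, 90) > 8
```

The same `percentile` helper covers both the range aggregations and a p90-based alarm condition, which is why building the range on top of it looks cheap.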
@andrewm4894 I implemented trimmed-median, trimmed-mean and percentile.
For the trimmed versions, I have provided aliases for quickly setting 1, 2, 3, 5, 10, 15, 20 and 25%. For the percentile, the aliases are 25, 50, 75, 80, 90, 95, 97, 98, 99.
In all versions, any percentage may be specified with the group_options query parameter.
All aliased versions can be used in health alerts (currently health alerts do not provide any means for setting query_options - once we do this, any percentage may also be specified in health alerts too).
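To show how aliases and an explicit percentage could fit together, here is a hedged Python sketch of a resolver. The alias spelling (method name followed by the number, e.g. `percentile95`), the `resolve` function and the 5% default for trimmed-median are assumptions for illustration only; the alias percentages and the other defaults are the ones listed in this thread.

```python
import re

# Percentages taken from the comment above.
TRIMMED_ALIASES = (1, 2, 3, 5, 10, 15, 20, 25)
PERCENTILE_ALIASES = (25, 50, 75, 80, 90, 95, 97, 98, 99)
# trimmed-mean 5% and percentile 95% come from the proposal;
# the trimmed-median default is an assumption.
DEFAULTS = {"trimmed-mean": 5, "trimmed-median": 5, "percentile": 95}

def resolve(method, group_options=None):
    """Return (base_method, percentage) for a method name and optional group_options."""
    match = re.fullmatch(r"(trimmed-mean|trimmed-median|percentile)(\d+)?", method)
    if match is None:
        raise ValueError(f"unknown aggregation method: {method}")
    base, suffix = match.group(1), match.group(2)
    if suffix is not None:
        # Aliased form: only the predefined percentages are accepted.
        pct = int(suffix)
        allowed = PERCENTILE_ALIASES if base == "percentile" else TRIMMED_ALIASES
        if pct not in allowed:
            raise ValueError(f"no alias for {base} at {pct}%")
        return base, float(pct)
    if group_options is not None:
        # Non-aliased form: any percentage may be given explicitly.
        return base, float(group_options)
    return base, float(DEFAULTS[base])

print(resolve("percentile95"))         # alias
print(resolve("trimmed-mean", "7.5"))  # explicit percentage
print(resolve("percentile"))           # default
```

Under this scheme, health alerts would be limited to the aliased names until they can pass an options value, which matches the limitation described above.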
@hugovalente-pm i think we should explore what would need to be done to make these available in NC
eg making some of them available in here:
It should be only frontend work.
I'll create a ticket for this. How would you guys @andrewm4894 @novykh see this being made available? We currently just have the dropdown list with options; we don't have a way to specify parameters. Would you see something like this as a start?
maybe leveraging what we have on the filter by labels with collapsible sections
@hugovalente-pm makes sense - or we could just add a subset for now assuming that dropdown is scrollable.
Start with just:
- Trimmed Mean
- Trimmed Median
- Percentile 99
- Percentile 95
- Percentile 75
- Percentile 90
- Percentile 25
- Percentile 10
- Percentile 5
- Percentile 1
Unsure how ugly or not that might look (?)
We can just add another input if any of Trimmed Mean, Trimmed Median or Percentile is selected.
So the filter will look like: "each as <select>99th</select> <select>Percentile</select> every 2 seconds" (make it a sentence I mean)
ps. sub-dropdowns are a bad experience 😄
that looks better! will raise a ticket and if it is not a big effort task we could prioritize it
Problem
A "trimmed mean" could be a useful aggregation function in netdata in addition to a regular mean.
Description
Importance
nice to have
Value proposition
Proposed implementation
Same way mean is implemented, just implement some obvious ones like