influxdata / influxdb

Scalable datastore for metrics, events, and real-time analytics
https://influxdata.com
Apache License 2.0

Add a moving_max() function #20253

Status: Open. highfestiva opened this issue 3 years ago.

highfestiva commented 3 years ago

Proposal:

Implement a moving maximum function, e.g. moving_max(), which holds each peak value steady across a window and is therefore useful for highlighting spikes.

Current behavior:

Without this feature, three options are currently available:

  1. moving_average() (or kaufmans_adaptive_moving_average()), which flatten out your spikes, essentially hiding them.
  2. Display without filtering. Visually representing spikes with a lot of jitter is suboptimal (for more reasons than I care to explain here).
  3. Increase the time period in the GUI. This throws away information and makes the graph less appealing.

None of the above are satisfactory.
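To illustrate why the first option hides spikes, here is a small plain-Python sketch (not InfluxDB code) that applies a 3-point moving average and a 3-point moving maximum to a hypothetical series of response times in milliseconds. The helper names `moving_average` and `moving_max` and the data are illustrative assumptions; both helpers only emit values for full windows.

```python
def moving_average(values, n):
    """Mean of each full window of n consecutive values."""
    return [sum(values[i - n + 1 : i + 1]) / n for i in range(n - 1, len(values))]


def moving_max(values, n):
    """Maximum of each full window of n consecutive values (naive, O(len * n))."""
    return [max(values[i - n + 1 : i + 1]) for i in range(n - 1, len(values))]


# Hypothetical response times (ms): mostly cheap calls with one costly spike.
latencies = [10, 12, 11, 200, 10, 11, 12]

print(moving_average(latencies, 3))
print(moving_max(latencies, 3))
```

With this data the averaged series peaks at about 74 ms, roughly a third of the real 200 ms spike, while the moving maximum reports 200 ms for all three windows that contain it.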

Desired behavior:

SELECT moving_max(..., 10) FROM ... would return, for each data point, the maximum of that point and the previous 9 data points. It would be calculated like moving_average(), except that it takes the maximum instead of the mean over the moving window.
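A minimal sketch of these semantics in Python, assuming a count-based window as in the SQL example. `moving_max` here is a hypothetical helper, not an InfluxDB function. It uses a monotonic deque, a standard sliding-window-maximum technique: each index is pushed and popped at most once, so the cost is O(len(values)) regardless of window size, rather than O(len(values) * n) for recomputing max() over every window. Positions before the first full window are returned as None, mirroring the NaN that pandas produces there.

```python
from collections import deque
from typing import List, Optional, Sequence


def moving_max(values: Sequence[float], n: int) -> List[Optional[float]]:
    """Return, for each position i, the max of values[i-n+1 .. i].

    Positions before the first full window get None (hypothetical convention;
    pandas' rolling(n).max() yields NaN there instead).
    """
    if n < 1:
        raise ValueError("window size n must be >= 1")
    window = deque()  # indices whose values are strictly decreasing front to back
    result: List[Optional[float]] = []
    for i, v in enumerate(values):
        # Drop indices of smaller-or-equal values: they can never be a window max again.
        while window and values[window[-1]] <= v:
            window.pop()
        window.append(i)
        # Drop the front index once it has slid out of the current window.
        if window[0] <= i - n:
            window.popleft()
        # The front of the deque always holds the index of the current window's max.
        result.append(values[window[0]] if i >= n - 1 else None)
    return result
```

For example, `moving_max([1, 2, 3, 1, 1, 2, 4, 1, 0, 1], 3)` returns `[None, None, 3, 3, 3, 2, 4, 4, 4, 1]`, which matches the pandas output in this issue apart from None versus NaN and int versus float.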

Pseudo code:

This is what you'd write in Python/pandas for a moving window size of 3, followed by the resulting values:

import pandas as pd
pd.Series([1, 2, 3, 1, 1, 2, 4, 1, 0, 1]).rolling(3).max().tolist()
[nan, nan, 3.0, 3.0, 3.0, 2.0, 4.0, 4.0, 4.0, 1.0]

Use case:

For example, many people would use this to display maximum response times of HTTP and database calls, which frequently vary depending on input parameters. A frequently used function will normally have many cheap calls and a few costly ones. A moving maximum would make the costly calls stand out while more or less filtering out the cheap ones, depending on the moving window size.

BastiJoe commented 1 year ago

I like this.