Open YANG-DB opened 1 month ago
Hi @YANG-DB , could you provide any background about distinguishing between trendline
and stats
? Based on the examples above, I don't get the key of difference between them.
Hi 'stats' command only supports standard average without the sliding average window, we could add that support to 'stats' command but I think that separating the sliding window average from standard stats will simply the actual usage. In addition such unique commands for trends exist in other prominent pipeline languages
My concern is there could be more window function related requests in PPL. I prefer to add a new command to support all of them instead of introducing specific smb
and wma
functions. My thoughts is adding a fundamental syntax for common propose which similar to streamstats
.
The example "compute a 5-point simple moving average of the 'cpu_usage' field and store it in a new field called 'smooth_cpu'." could be written to
| streamstats window=5 current=false global=false avg(cpu_usage) as smooth_cpu
or named trendstats
Hello, I've started the trendline PPL implementation. I've ported the lexer and parser changes over from the Spark PR.
Implementation-wise I think this should be very similar to how average is implemented, except the AggregationState should factor in the window specification. Does this seem like the right way to go?
One difference between this and average is that this is a root level command rather than a sub-command of stats
Also, noticed that the syntax suggested above differs from the SPL definition of trendline: https://docs.splunk.com/Documentation/SplunkCloud/latest/SearchReference/Trendline
Notably, in SPL, the period argument is sort of embedded into moving average type instead of being within the parentheses of the moving average type
@jduo I would like to rethink about this
streamstats
streamstats is another function we can implement in a more generic manner - but still I see value to the trendline
implementation in its own - @penghuo @ykmr1224 @dai-chen what do you think ?
Also, noticed that the syntax suggested above differs from the SPL definition of trendline: https://docs.splunk.com/Documentation/SplunkCloud/latest/SearchReference/Trendline
Notably, in SPL, the period argument is sort of embedded into moving average type instead of being within the parentheses of the moving average type
@jduo thanks
I'd like to reduce the complexity for the first iteration - it does resemble the stats
command - maybe we can think of extending stats
instead ?
let me know what you think and if you have any specific suggestion here ?
Also, noticed that the syntax suggested above differs from the SPL definition of trendline: https://docs.splunk.com/Documentation/SplunkCloud/latest/SearchReference/Trendline Notably, in SPL, the period argument is sort of embedded into moving average type instead of being within the parentheses of the moving average type
@jduo thanks I'd like to reduce the complexity for the first iteration - it does resemble the
stats
command - maybe we can think of extendingstats
instead ? let me know what you think and if you have any specific suggestion here ?
@YANG-DB I made some good progress on the window-function like version of trendline yesterday before I saw this comment. I have parsing and logical planning working right. I'm going to continue down this path today to see if I can get execution working. I can put up a PR if you'd like to see or wait until execution works.
@jduo thanks for the update Please continue and create the PR in draft mode
Should the alias part of the trendline command be mandatory? As a user I would expect its optional and to use the original field name if omitted. In SPL it is optional.
Is your feature request related to a problem?
Adding a new PPL
trendline
command to support computing a moving averages of fields.We would like to support two flavours of moving average:
**SMA : Simple moving average**
SMA(t) = (1/n) * Σ(f[i]), where i = t-n+1 to t
**WMA : Weighted moving average**
WMA(t) = Σ(w[i] * f[i]) / Σ(w[i]), where i = t-n+1 to t
Where w[i] is the weight for the i-th data-point.In a typical WMA, the weights are linearly decreasing from the most recent to the oldest data-point:
w[i] = n - (t - i), where i = t-n+1 to t
The complete forumlation would be:
WMA(t) = Σ((n - (t - i)) * f[i]) / Σ(n - (t - i)), where i = t-n+1 to t
Example
The next command shows a trendline over a 5 month period events by month
The next command would compute a 5-point simple moving average of the 'cpu_usage' field and store it in a new field called 'smooth_cpu'.
Multiple trendlines could be calculated in a single command, such as
Support for PPL
trendline
functionality is required for both:- OpenSearch based PPL engine
https://github.com/opensearch-project/sql/issues/3013
- Spark based PPL engine
Do you have any additional context? Add any other context or screenshots about the feature request here.