opensearch-project / opensearch-spark

Spark Accelerator framework; it enables secondary indices to remote data stores.
Apache License 2.0

[FEATURE]PPL new trendline command (SMA) #655

Closed YANG-DB closed 2 weeks ago

YANG-DB commented 2 months ago

Is your feature request related to a problem?

Add a new PPL trendline command that computes moving averages of fields.

We would like to support two flavours of moving average:

**SMA : Simple moving average**

SMA(t) = (1/n) * Σ(f[i]), where i = t-n+1 to t
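As a rough illustration of the SMA formula above, here is a minimal Python sketch. The function name `sma` and the choice to return `None` where fewer than n points are available are assumptions for illustration; the issue does not specify how the leading rows should be handled.

```python
from typing import List, Optional


def sma(values: List[float], n: int) -> List[Optional[float]]:
    """Trailing simple moving average: SMA(t) = (1/n) * sum(f[t-n+1 .. t]).

    Assumption: positions with fewer than n points so far yield None.
    """
    if n <= 0:
        raise ValueError("window size n must be positive")
    result: List[Optional[float]] = []
    for t in range(len(values)):
        if t + 1 < n:
            # Not enough history yet to fill a window of n points.
            result.append(None)
        else:
            window = values[t - n + 1 : t + 1]
            result.append(sum(window) / n)
    return result


print(sma([1, 2, 3, 4, 5], 3))
```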


**WMA : Weighted moving average**

WMA(t) = Σ(w[i] * f[i]) / Σ(w[i]), where i = t-n+1 to t and w[i] is the weight for the i-th data point.

In a typical WMA, the weights decrease linearly from the most recent to the oldest data point: w[i] = n - (t - i), where i = t-n+1 to t, so the most recent point has weight n and the oldest has weight 1.

The complete formulation would be: WMA(t) = Σ((n - (t - i)) * f[i]) / Σ(n - (t - i)), where i = t-n+1 to t
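The linearly weighted formulation can be sketched in Python as follows. This is only an illustration of the formula: the function name `wma` and the `None` result for positions with fewer than n points are assumptions, not behaviour defined in this issue. With weights 1 through n, the denominator Σ(n - (t - i)) is n(n+1)/2.

```python
from typing import List, Optional


def wma(values: List[float], n: int) -> List[Optional[float]]:
    """Trailing weighted moving average with weights w[i] = n - (t - i).

    Assumption: positions with fewer than n points so far yield None.
    """
    if n <= 0:
        raise ValueError("window size n must be positive")
    # Weights run from 1 (oldest) to n (most recent), so they sum to n(n+1)/2.
    denominator = n * (n + 1) // 2
    result: List[Optional[float]] = []
    for t in range(len(values)):
        if t + 1 < n:
            result.append(None)
        else:
            numerator = sum(
                (n - (t - i)) * values[i] for i in range(t - n + 1, t + 1)
            )
            result.append(numerator / denominator)
    return result


print(wma([1, 2, 3], 3))
```

For the input `[1, 2, 3]` with n = 3, the last value is (1*1 + 2*2 + 3*3) / 6 = 14/6.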


Example

The next command shows a 5-month trendline over the number of events per month:

source=t | stats count(date_month) | trendline sma(5, count) AS trend | fields trend

The next command would compute a 5-point simple moving average of the 'cpu_usage' field and store it in a new field called 'smooth_cpu'.

source=t | trendline sma(5, cpu_usage) as smooth_cpu
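To make the intended row-level effect concrete, the following Python sketch mimics what such a command might do to a sequence of records: each row keeps its fields and gains a new `smooth_cpu` field. The helper `add_sma_field` is hypothetical, and emitting `None` for the first n-1 rows is an assumption; the issue does not state how rows with insufficient history should be treated.

```python
from collections import deque
from typing import Any, Deque, Dict, Iterable, List


def add_sma_field(
    rows: Iterable[Dict[str, Any]], field: str, n: int, alias: str
) -> List[Dict[str, Any]]:
    """Return copies of rows with an extra `alias` field holding the n-point SMA of `field`.

    Assumption: rows with fewer than n preceding points get None.
    """
    if n <= 0:
        raise ValueError("window size n must be positive")
    window: Deque[float] = deque()
    running_sum = 0.0
    out: List[Dict[str, Any]] = []
    for row in rows:
        window.append(row[field])
        running_sum += row[field]
        if len(window) > n:
            # Drop the oldest point so the window holds at most n values.
            running_sum -= window.popleft()
        new_row = dict(row)  # leave the input row untouched
        new_row[alias] = running_sum / n if len(window) == n else None
        out.append(new_row)
    return out


rows = [{"cpu_usage": v} for v in [10, 20, 30, 40, 50, 60]]
for r in add_sma_field(rows, "cpu_usage", 5, "smooth_cpu"):
    print(r)
```

The sketch keeps a running sum instead of re-summing each window, which is one possible design choice and says nothing about how the Spark implementation computes the average.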

Multiple trendlines could be calculated in a single command, for example:

| trendline sma(10, memory) as mem_trend wma(5, network_traffic) as net_trend

salyh commented 2 weeks ago

This issue and the associated PRs cover the simple moving average (SMA) only. For the weighted moving average (WMA), see issue https://github.com/opensearch-project/opensearch-spark/issues/831