Closed: wz2b closed this issue 1 year ago.
Today a related topic came up: someone wanted to compute a confidence interval, and the ways to do this in Flux are a little complicated. This reinforces the idea that a `statistics(col)` function returning all the basic stats I outlined would be useful. You could follow it with a `map()` that turns those stats into a confidence interval.
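Until something like that exists, a rough approximation can be written as a user-defined function. The sketch below is hypothetical: the name `statistics` and its output columns are not part of Flux. It uses `reduce()` to collect the count, sum and sum of squares of `_value` in one pass, and a `map()` then derives the mean, the sample standard deviation and ±1.96·s bounds. It assumes `_value` is a float column with at least two points per table. The bucket, measurement and tag values are reused from the query further down.

```flux
import "math"

// Hypothetical user-defined function (not a Flux built-in): summarizes the
// _value column of each table in a single reduce() pass.
statistics = (tables=<-) =>
    tables
        |> reduce(
            fn: (r, accumulator) => ({
                count: accumulator.count + 1.0,
                sum: accumulator.sum + r._value,
                sumSq: accumulator.sumSq + r._value * r._value
            }),
            identity: {count: 0.0, sum: 0.0, sumSq: 0.0}
        )
        // Mean and sample standard deviation (n - 1 in the denominator)
        |> map(fn: (r) => ({r with
            mean: r.sum / r.count,
            stddev: math.sqrt(x: (r.sumSq - r.sum * r.sum / r.count) / (r.count - 1.0))
        }))

// Follow the summary with a map() that turns it into interval bounds
from(bucket: "telegraf/autogen")
    |> range(start: -1h)
    |> filter(fn: (r) => r._measurement == "SmartSensor_ABS_TR_Kaufman_axialAxisRmsVibration")
    |> filter(fn: (r) => r.name == "Indexer-202")
    |> statistics()
    |> map(fn: (r) => ({r with
        lower: r.mean - 1.96 * r.stddev,
        upper: r.mean + 1.96 * r.stddev
    }))
```

The sum-of-squares shortcut can lose precision when values are large and the variance is small. The built-in `stddev()` is the safer choice when only that one number is needed.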
I am trying to compute a confidence interval for the mean (and maybe for a ratio later). A simple formula is commonly used for statistical confidence intervals: $\bar{x} - 1.96\,s < x < \bar{x} + 1.96\,s$, where $\bar{x}$ is the mean and $s$ is the standard deviation.
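Here $\bar{x}$ and $s$ are the sample mean and sample standard deviation of $n$ points. If the data are roughly normal, the bound above is the range expected to contain about 95% of individual observations. A confidence interval for the mean itself is narrower, because it uses the standard error $s/\sqrt{n}$:

$$
\bar{x} - 1.96\,s < x < \bar{x} + 1.96\,s
\qquad\text{vs.}\qquad
\bar{x} - 1.96\,\frac{s}{\sqrt{n}} < \mu < \bar{x} + 1.96\,\frac{s}{\sqrt{n}}
$$

The factor 1.96 is the 97.5% quantile of the standard normal distribution. For small $n$, a Student-t quantile is the usual substitute.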
With a little help from the #flux community (thanks to Anais), we now have a function:
```flux
import "math"

// Sensor series plus a Holt-Winters fit stored in the "pred" column
data = from(bucket: "telegraf/autogen")
    |> range(start: v.timeRangeStart)
    |> filter(fn: (r) => r._measurement == "SmartSensor_ABS_TR_Kaufman_axialAxisRmsVibration")
    |> filter(fn: (r) => r.name == "Indexer-202")
    |> duplicate(column: "_value", as: "pred")
    |> holtWinters(n: 60, interval: 10s, column: "pred", withFit: true, seasonality: 8)

// Sample mean as a scalar
mean_val = (data
    |> mean(column: "_value")
    |> findRecord(fn: (key) => true, idx: 0))._value

// Calculate standard deviation from sample
stddev_val_initial = (data
    |> stddev()
    |> findRecord(fn: (key) => true, idx: 0))._value

// Half-width of the interval: 1.96 standard deviations (math.abs takes x:)
stddev_val = math.abs(x: stddev_val_initial) * 1.96

// Attach the bounds to every row, then window the result
data
    |> map(fn: (r) => ({r with
        upper: mean_val + stddev_val,
        lower: mean_val - stddev_val}))
    |> window(
        every: 30s,
        period: 1m,
        offset: -5m,
        timeColumn: "_time",
        startColumn: "_start",
        stopColumn: "_stop",
        createEmpty: false)
```
The problem is that I use InfluxDB OSS 1.8.9, and my Flux version is, I think, 0.65. `findRecord()` is not supported in Flux 0.65.
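On Flux 0.65, the same scalars can probably be extracted with `tableFind()` and `getRecord()`, which predate `findRecord()`. The sketch below assumes `data` is the stream defined in the query above. Only the two scalar definitions change.

```flux
// Replacements for the two findRecord() calls on older Flux versions.
// tableFind() returns the first table whose group key matches the predicate;
// getRecord() returns the row at the given index as a record.
mean_val = (data
    |> mean(column: "_value")
    |> tableFind(fn: (key) => true)
    |> getRecord(idx: 0))._value

stddev_val_initial = (data
    |> stddev()
    |> tableFind(fn: (key) => true)
    |> getRecord(idx: 0))._value
```

If `data` contains several series, only the first table is used. This matches the behavior of `findRecord(fn: (key) => true, idx: 0)`.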
Mainly, I have a hard time with the Flux documentation because there is no simple way to assign basic statistics to variables and combine them in the same expression.
Some of the features I need in #flux are the following:
The Hotelling T² metric is especially crucial for getting valuable insights about clusters. With it, #flux would be able to cluster data and compare the clusters. The wonderful thing is that I could do this in a streaming way in Chronograf, without turning to batch Python Airflow tasks.
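For reference, one common form used to monitor multivariate streams is the per-observation T² statistic. Here $\mathbf{x}_i$ is the vector of $p$ measured fields at one timestamp, $\bar{\mathbf{x}}$ is the vector of their means, and $S$ is their $p \times p$ sample covariance matrix:

$$
T_i^2 = (\mathbf{x}_i - \bar{\mathbf{x}})^{\top}\, S^{-1}\, (\mathbf{x}_i - \bar{\mathbf{x}})
$$

It needs a full covariance matrix and its inverse. That is likely why it is hard to express with scalar aggregates such as `mean()` and `stddev()` alone.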
And finally, linear regression is the workhorse of statistics. A regression function in Flux would bring valuable insights into the data, and as a data engineer/statistician I could build great products with it alone.
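A simple least-squares line can already be approximated with `reduce()`. This is a hypothetical sketch, not a Flux built-in. It assumes `_value` is a float and that `int(v: r._time)` converts a timestamp to nanoseconds since the epoch. The regressor is the number of seconds since the start of the queried range, which avoids squaring huge nanosecond timestamps.

```flux
// Hypothetical sketch: ordinary least-squares fit of _value against
// seconds elapsed since the start of the queried range.
from(bucket: "telegraf/autogen")
    |> range(start: -1h)
    |> filter(fn: (r) => r._measurement == "SmartSensor_ABS_TR_Kaufman_axialAxisRmsVibration")
    |> filter(fn: (r) => r.name == "Indexer-202")
    // x = elapsed seconds since _start
    |> map(fn: (r) => ({r with
        x: float(v: int(v: r._time) - int(v: r._start)) / 1000000000.0
    }))
    // One pass collecting n, sum(x), sum(y), sum(x*x), sum(x*y)
    |> reduce(
        fn: (r, accumulator) => ({
            n: accumulator.n + 1.0,
            sx: accumulator.sx + r.x,
            sy: accumulator.sy + r._value,
            sxx: accumulator.sxx + r.x * r.x,
            sxy: accumulator.sxy + r.x * r._value
        }),
        identity: {n: 0.0, sx: 0.0, sy: 0.0, sxx: 0.0, sxy: 0.0}
    )
    // Closed-form OLS slope (units of _value per second), then the intercept
    |> map(fn: (r) => ({r with
        slope: (r.n * r.sxy - r.sx * r.sy) / (r.n * r.sxx - r.sx * r.sx)
    }))
    |> map(fn: (r) => ({r with intercept: (r.sy - r.slope * r.sx) / r.n}))
```

A table with a single point, or with all points at the same timestamp, makes the slope denominator zero. Such tables should be filtered out first.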
I build great jobs with the TICK stack, and I believe that if statistics functions were available in Flux, they would change the way data science is done with it.
This issue has had no recent activity and will be closed soon.
I am trying to compute `stddev(col) / mean(col)` as an aggregation function using `aggregateWindow()`. I'm finding it's not that easy to do: I basically have to fetch the data, window and aggregate over it twice, then join the data back together. It occurred to me that it might be a good idea to have some kind of `statistics(col)` function. It would return not just a single column but several columns with the basic stats one might want (mean, max, min, stddev, mode, sum, and count), all at once. These could then be useful when you need to combine stats later, including situations where you need a piecewise or rolling standard deviation. So I thought I would throw this out there as a feature request and see what people think.
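As a possible workaround, `aggregateWindow()` accepts a user-defined `fn`, so the ratio can be computed in a single pass per window without a join. The function name `cv` and the measurement name `my_measurement` are hypothetical placeholders. The sketch assumes `_value` is a float with at least two points per window.

```flux
import "math"

// Hypothetical aggregate for aggregateWindow(): coefficient of variation
// (sample stddev / mean) of _value in each window, from one reduce() pass.
// aggregateWindow() passes `column`; this sketch ignores it and reads _value.
cv = (column="_value", tables=<-) =>
    tables
        |> reduce(
            fn: (r, accumulator) => ({
                n: accumulator.n + 1.0,
                sum: accumulator.sum + r._value,
                sumSq: accumulator.sumSq + r._value * r._value
            }),
            identity: {n: 0.0, sum: 0.0, sumSq: 0.0}
        )
        |> map(fn: (r) => ({r with
            _value: math.sqrt(x: (r.sumSq - r.sum * r.sum / r.n) / (r.n - 1.0)) / (r.sum / r.n)
        }))
        // Keep only the ratio in the output
        |> drop(columns: ["n", "sum", "sumSq"])

from(bucket: "telegraf/autogen")
    |> range(start: -1h)
    |> filter(fn: (r) => r._measurement == "my_measurement")
    |> aggregateWindow(every: 5m, fn: cv, createEmpty: false)
```

Without the `drop()` step, each window keeps its partial sums n, Σx and Σx². These can be added field by field across windows and the standard deviation recomputed from the totals. That is one way to approach the piecewise or rolling case.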