influxdata / influxdb

Scalable datastore for metrics, events, and real-time analytics
https://influxdata.com
Apache License 2.0

Mean of circular quantities #18388

Open wfjm opened 4 years ago

wfjm commented 4 years ago

Proposal: Add a function which calculates the proper mean of circular quantities.

Current behavior: InfluxDB provides a mean() function, which is fine for linear quantities but not appropriate for circular quantities such as angles. The mean of 350 degrees and 20 degrees should be 5 degrees, not the 185 degrees that mean() returns.

Desired behavior: The proper way of calculating the mean of circular quantities is well known (see https://en.wikipedia.org/wiki/Mean_of_circular_quantities) and is, in a nutshell:

  atan2( mean(sin(x)), mean(cos(x)) )

A circular_mean() function would most likely need additional arguments.
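As a worked check of the formula above, here is the 350 degree / 20 degree case written out. The symbols (theta_i for the individual angles, n for their count, and the bar for the mean) are notation chosen for this sketch, not taken from the original post; the sine term comes first in atan2, as in the formula above.

```latex
\[
\bar{\theta} = \operatorname{atan2}\!\left(\frac{1}{n}\sum_{i=1}^{n}\sin\theta_i,\ \frac{1}{n}\sum_{i=1}^{n}\cos\theta_i\right)
\]
For $\theta_1 = 350^\circ$ and $\theta_2 = 20^\circ$:
\[
\tfrac{1}{2}\,(\sin 350^\circ + \sin 20^\circ) \approx \tfrac{1}{2}\,(-0.1736 + 0.3420) \approx 0.0842,
\qquad
\tfrac{1}{2}\,(\cos 350^\circ + \cos 20^\circ) \approx \tfrac{1}{2}\,(0.9848 + 0.9397) \approx 0.9623
\]
\[
\bar{\theta} \approx \operatorname{atan2}(0.0842,\ 0.9623) \approx 5^\circ
\]
```

The arithmetic mean of the same two values is (350 + 20) / 2 = 185 degrees, which points in almost the opposite direction.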

Use case: Currently it's cumbersome to treat angles in InfluxDB. Angles are quite common in technical measurements (e.g. orientation of a device). Even telegraf generates them out-of-the-box (the wind direction returned by the openweathermap input plugin).

russorat commented 4 years ago

@wfjm thanks for opening this. sounds like this would make a great Flux library.

aclerc commented 2 years ago

I work in wind energy and we're using influxdb. This is a key issue!

godric commented 2 years ago

So I think I figured out how to implement this as a custom aggregation function:

import "math"

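// Aggregates each input table to its circular mean.
// Assumes r._value holds an angle in radians; the column parameter is only
// used to name the output column.
circularMean = (tables=<-, column) =>
    tables
        |> reduce(
            identity: { count: 0.0, sumX: 0.0, sumY: 0.0, avg: 0.0 },
            fn: (r, accumulator) => {
                // Unit-vector components of the current angle.
                x = math.cos(x: r._value)
                y = math.sin(x: r._value)
                return {
                    count: accumulator.count + 1.0,
                    sumX: accumulator.sumX + x,
                    sumY: accumulator.sumY + y,
                    // Angle of the mean vector over all rows seen so far.
                    avg: math.atan2(
                        x: (accumulator.sumX + x) / (accumulator.count + 1.0),
                        y: (accumulator.sumY + y) / (accumulator.count + 1.0)
                    )
                }
            }
        )
        // Keep only the result and store it under the requested column name.
        |> drop(columns: ["sumX", "sumY", "count"])
        |> rename(columns: {avg: column})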

It can then be used as:

mydata
  |> aggregateWindow(every: v.windowPeriod, fn: circularMean, createEmpty: false)

I'm not 100% sure about this implementation.

In any case I hope it can be of use to others, at least until a better implementation is added to the built-ins.
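For readers who want to check the logic outside Flux, the accumulation done in the reduce call above can be mirrored in a few lines of Python. This is only a sketch: the function name circular_mean and the period parameter are hypothetical (period illustrates the kind of extra argument the proposal mentions, so that values in degrees can be handled), and the result is wrapped into the range from 0 up to period, which the Flux version does not do.

```python
import math


def circular_mean(angles, period=2 * math.pi):
    """Circular mean of `angles`, wrapped into [0, period) up to float rounding.

    `period` is a hypothetical parameter: 2*pi for radians, 360.0 for degrees.
    """
    scale = 2 * math.pi / period  # converts input units to radians
    sum_x = 0.0
    sum_y = 0.0
    count = 0
    for angle in angles:
        rad = angle * scale
        # Accumulate unit-vector components, like sumX / sumY in the reduce.
        sum_x += math.cos(rad)
        sum_y += math.sin(rad)
        count += 1
    if count == 0:
        raise ValueError("circular mean of an empty sequence is undefined")
    # Angle of the mean vector; atan2 takes the sine (y) component first.
    mean_rad = math.atan2(sum_y / count, sum_x / count)
    # Convert back to input units and wrap negative results into [0, period).
    return (mean_rad / scale) % period


if __name__ == "__main__":
    print(circular_mean([350.0, 20.0], period=360.0))
```

With the two values from the original proposal this gives approximately 5 degrees. Dividing both sums by count does not change the angle returned by atan2, so the division is kept only to match the formula as stated.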

nathanielc commented 2 years ago

@godric That implementation looks good to me.

> I couldn't figure how to use the column parameter for the input instead of hardcoding _value (which should work in most cases?)

This is currently not possible in Flux, so it makes sense that you couldn't figure it out. However, we are working on an update to Flux's type system and syntax that would make this possible in the future.