timescale / timescaledb

An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
https://www.timescale.com/
Other
17.63k stars 884 forks source link

Time-weighted averages, any time interval, computed on the fly #2536

Closed inselbuch closed 1 year ago

inselbuch commented 3 years ago

It would be valuable for TimescaleDB to generate time-weighted averages for any specified interval on the fly. A time-weighted average differs from an arithmetic average:

1) the value preceding the time interval in question must be included in the computation 2) any value in the interval is weighted by the amount of time it was true (i.e., until the next value arrived)

For example:

select time, twa(value) from myData where time > startTime and time < endTime and period = '+0:15:00'

with a range of 5 hours would generate 4 values per hour.

time-weighted average.xlsx

Kevin-Mc-Callister commented 3 years ago

+1

svenklemm commented 3 years ago

Not sure i understand your example correctly but you can achieve a time-weighted average with time_bucket_gapfill:

CREATE TABLE data AS SELECT * FROM (VALUES ('2000-01-01 09:59'::timestamp,15),('2000-01-01 10:01',14),('2000-01-01 10:04',15),('2000-01-01 10:08',10),('2000-01-01 10:00',15),('2000-01-01 10:59',15)) v(time,value);

SELECT
  time_bucket('15m', time),
  avg(value)
FROM (
  SELECT
    time_bucket_gapfill('1m', time, '2000-01-01 10:00:00', '2000-01-01 11:00:00') AS time,
    locf(avg(value)) AS value
  FROM
    data
  GROUP BY
    1) sub
WHERE
  time BETWEEN '2000-01-01 10:00:00'
  AND '2000-01-01 11:00:00'
GROUP BY
  1;

In the inner query you generate a data point for every minute carrying forward the last seen value, in the outer query you calculate 15 minute averages.

inselbuch commented 3 years ago

That is an arithmetic average of generated one-minute values. And it will give the correct result as long as the values are all aligned on one-minute boundaries. But what happens if the values have resolutions below one minute?

Also, that is quite cumbersome, is it not?

Sent from my iPad

On Oct 13, 2020, at 4:35 PM, Sven Klemm notifications@github.com wrote:

 Not sure i understand your example correctly but you can achieve a time-weighted average with time_bucket_gapfill:

CREATE TABLE data AS SELECT * FROM (VALUES ('2000-01-01 09:59'::timestamp,15),('2000-01-01 10:01',14),('2000-01-01 10:04',15),('2000-01-01 10:08',10),('2000-01-01 10:00',15),('2000-01-01 10:59',15)) v(time,value);

SELECT time_bucket('15m', time), avg(value) FROM ( SELECT time_bucket_gapfill('1m', time, '2000-01-01 10:00:00', '2000-01-01 11:00:00') AS time, locf(avg(value)) AS value FROM data GROUP BY 1) sub WHERE time BETWEEN '2000-01-01 10:00:00' AND '2000-01-01 11:00:00' GROUP BY 1; In the inner query you generate a data point for every minute carrying forward the last seen value, in the outer query you calculate 15 minute averages.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.

svenklemm commented 3 years ago

There are different solutions possible e.g using window functions but unfortunately they dont get less cumbersome when accounting for corner cases

inselbuch commented 3 years ago

Consider the scenario where you have around a dozen values per day with timestamps significant to the second. You are looking to generate a daily time-weighted average for the past six months.

The algorithm for time-weighted averages would work with the 2,160 values.

To use the bucket-fill method, Timescale needs to produce over a fifteen million values (15,552,000).

And if we want to be accurate to the tenth of a second? Arithmetic average of equally spaced values demands generation of over ten times that number of values (150 million) while still only actually needing 2,160 values.

If you are working with synchrophasors that are recorded 60 times per second, close to a billion values 933,120,000.

Here is a link to Python source for the time-weighted average:

https://github.com/inselbuch/twa/blob/master/twa.py

vicentini commented 3 years ago

@inselbuch Have you been able to solve this problem somehow with continuous aggregates? I'm currently in the same situation, having to calculate a time-weighted average.

ateuber commented 3 years ago

Such a functionality in timescaledb would be very helpful, as in just writing twa() instead of avg(). Before using an inner query as a workaround, it is easier to code it in Python.

akuzm commented 2 years ago

Looks like it's implemented as time_weight function in the timescaledb_toolkit extension:

https://www.timescale.com/blog/what-time-weighted-averages-are-and-why-you-should-care/

mfreed commented 1 year ago

Should we close out this issue given time_weight in the toolkit @davidkohn88 ?

https://docs.timescale.com/timescaledb/latest/how-to-guides/hyperfunctions/time-weighted-averages/time-weighted-average/

davidkohn88 commented 1 year ago

Sorry yes!