Summaries at chunks (optimization suggestion)

Hello highly-appreciated TimescaleDB devs,

I was wondering whether you have ever considered storing summaries of the values contained in each chunk. For example: min, max, count, sum, and avg of the value columns per logical time series (identified by some time series key, e.g. weather_station in the example below).

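This could immensely speed up aggregation queries like SELECT MIN(temperature) FROM weather_readings WHERE weather_station = 'MyLocalWeatherstation' AND time >= ... AND time < ... over large time ranges, because every chunk that lies entirely within the queried range could be answered from its summary instead of from its rows.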

I reckon these statistics could be calculated with minimal overhead while a chunk is being compressed, since the values have to be read at that point anyway.
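To illustrate the idea, here is a minimal sketch in Python. It is not how TimescaleDB works internally; the names Summary, Chunk, compress and query_min are hypothetical and chosen only for this example, and the row layout (time, series key, value) is an assumption. It shows the summaries being built in a single pass at "compression" time, and a MIN query that uses them for fully covered chunks and scans rows only for partially covered or uncompressed chunks.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Assumed row layout for this sketch: (time, series key, value)
Row = Tuple[int, str, float]


@dataclass
class Summary:
    """Aggregates of one time series within one chunk (hypothetical structure)."""
    minimum: float
    maximum: float
    count: int
    total: float

    @property
    def avg(self) -> float:
        return self.total / self.count


@dataclass
class Chunk:
    start: int  # inclusive lower time bound
    end: int    # exclusive upper time bound
    rows: List[Row]
    summaries: Optional[Dict[str, Summary]] = None  # None until "compressed"


def compress(chunk: Chunk) -> None:
    """Stand-in for chunk compression: build all summaries in one pass over the rows."""
    summaries: Dict[str, Summary] = {}
    for _, key, value in chunk.rows:
        s = summaries.get(key)
        if s is None:
            summaries[key] = Summary(value, value, 1, value)
        else:
            s.minimum = min(s.minimum, value)
            s.maximum = max(s.maximum, value)
            s.count += 1
            s.total += value
    chunk.summaries = summaries


def query_min(chunks: List[Chunk], key: str, t_from: int, t_to: int) -> Optional[float]:
    """MIN(value) of one series for t_from <= time < t_to."""
    candidates: List[float] = []
    for c in chunks:
        if c.end <= t_from or c.start >= t_to:
            continue  # chunk lies outside the queried range
        fully_covered = t_from <= c.start and c.end <= t_to
        if fully_covered and c.summaries is not None:
            s = c.summaries.get(key)
            if s is not None:
                candidates.append(s.minimum)  # answered without touching rows
            continue
        # Partially covered or uncompressed chunk: fall back to scanning rows.
        candidates.extend(v for t, k, v in c.rows if k == key and t_from <= t < t_to)
    return min(candidates) if candidates else None


chunks = [
    Chunk(0, 10, [(1, "A", 5.0), (4, "B", -3.0), (7, "A", 2.0)]),
    Chunk(10, 20, [(12, "A", 1.5), (15, "A", 9.0)]),
    Chunk(20, 30, [(21, "A", -1.0), (28, "A", 4.0)]),
]
for c in chunks[:2]:
    compress(c)  # the newest chunk stays uncompressed

print(query_min(chunks, "A", 0, 25))  # -1.0
```

In this sketch the first two chunks are answered from their summaries alone, and only the last chunk, which the range covers partially, is scanned. sum and count combine across chunks in the same way, and avg follows from them; the edge cases (updates to compressed chunks, additional filter predicates) are exactly the part I cannot judge.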

However, I cannot estimate how complicated it would be to make the query planner aware of them and handle all the edge cases. So in the end, it's just an idea.

For reference, IoTDB does something similar (they call it "Summary Info" [1]). EDIT: For further reference, BTrDB [2] and Timon [3] store (BTrDB) or reference (Timon) materialized aggregates directly in their tree indexes.

Best regards, Max

[1] https://www.vldb.org/pvldb/vol13/p2901-wang.pdf
[2] https://www.usenix.org/system/files/conference/fast16/fast16-papers-andersen.pdf
[3] https://dl.acm.org/doi/pdf/10.1145/3318464.3386136

timescale / timescaledb

Summaries at chunks (optimization suggestion) #3292