timescale / timescaledb

An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
https://www.timescale.com/
Other
17.63k stars 883 forks source link

Summaries at chunks (optimization suggestion) #3292

Open maxhertrampf opened 3 years ago

maxhertrampf commented 3 years ago

Hello highly-appreciated TimescaleDB devs,

I was wondering if you ever thought about storing summaries about the containing values at chunk-level. For example, min, max, count, sum, and avg for the value columns per logical time series (identified by some time series key, e.g. weather_station in the example below).

This would immensely speed up aggregation queries like SELECT MIN(temperature) FROM weather_readings WHERE weather_station = 'MyLocalWeatherstation' AND time >= ... AND time < ... over large ranges of time.

I would reckon that these statistics can be calculated with minimal overhead when compressing chunks.

However, I cannot estimate how complicated it would be to make the query planner aware of them and handle all the edge cases. So in the end, it's just an idea.

For reference, IoTDB is doing something similar (they are calling it "Summary Info" [1]). EDIT: For further reference: BTrDB [2] and Timon [3] directly store (BTrDB) or reference (Timon) materialized aggregations at their tree indexes.

Best regards, Max

[1] https://www.vldb.org/pvldb/vol13/p2901-wang.pdf [2] https://www.usenix.org/system/files/conference/fast16/fast16-papers-andersen.pdf [3] https://dl.acm.org/doi/pdf/10.1145/3318464.3386136

mkindahl commented 3 years ago

Hello @maxhertrampf !

Thank you for an excellent feature request. As you noted, similar approach has been used in other databases where you can store partial aggregates in the index and/or other storage associated with the table.