Open maxhertrampf opened 3 years ago
Hello @maxhertrampf !
Thank you for an excellent feature request. As you noted, similar approach has been used in other databases where you can store partial aggregates in the index and/or other storage associated with the table.
Hello highly-appreciated TimescaleDB devs,
I was wondering if you ever thought about storing summaries about the containing values at chunk-level. For example, min, max, count, sum, and avg for the value columns per logical time series (identified by some time series key, e.g.
weather_station
in the example below).This would immensely speed up aggregation queries like
SELECT MIN(temperature) FROM weather_readings WHERE weather_station = 'MyLocalWeatherstation' AND time >= ... AND time < ...
over large ranges of time.I would reckon that these statistics can be calculated with minimal overhead when compressing chunks.
However, I cannot estimate how complicated it would be to make the query planner aware of them and handle all the edge cases. So in the end, it's just an idea.
For reference, IoTDB is doing something similar (they are calling it "Summary Info" [1]). EDIT: For further reference: BTrDB [2] and Timon [3] directly store (BTrDB) or reference (Timon) materialized aggregations at their tree indexes.
Best regards, Max
[1] https://www.vldb.org/pvldb/vol13/p2901-wang.pdf [2] https://www.usenix.org/system/files/conference/fast16/fast16-papers-andersen.pdf [3] https://dl.acm.org/doi/pdf/10.1145/3318464.3386136