timescale / timescaledb

An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
https://www.timescale.com/
Other
16.79k stars 852 forks source link

[Help needed]: Downsampling of all tables in a database? #6967

Open DrTron opened 1 month ago

DrTron commented 1 month ago

What type of bug is this?

Other

What subsystems and features are affected?

Continuous aggregate

What happened?

I've asked this on the Timescale forum before, but got zero answers, so maybe I can get this issue resolved here: I’m not a power user, simply using a database (currently influx1.8) for archival of smarthome data. So I’m a bit surprised that after reading about every article and forum post on timescaledb and downsampling, I haven’t found what I’m looking for.

I want to gradually downsample data over time, as probably a lot of people do: real-time data for a day, 10-minute-averages for a week, 30-minute-averages for 30 days, and so on. I currently have four continuous queries that do this.

I learned that for this I need:

materialized views for every timeframe (let’s call them 1d_view, 7d_view etc) continuous aggregates on top of continuous aggregates, aggregating data off the previous one retention policies for the raw data and the materialized views (1d_view, 7d_view) and I can then query those materialized views for the data I want. Which is pretty much the same way I currently do it with influxdb (continuous queries and retention policies). Now I found that my smarthome system (openhab) can insert data into a postgresql/timescale database (and telegraf can, too). The way that works is that every measurement (temperature, humidity, etc.) is stored as it’s own table, with time/value pairs as the rows (currently the same with influx).

Every example I have found (here, for instance for continuous aggregates from a single table (FROM tablename). My data is currently stretched over 200+ tables, so writing four continuous aggregated for downsampling per table is just not possible.

How can I create an aggregate that runs on every table in the database and on every value in that table? Can I use a wildcard in the query such as “SELECT FROM ”? Especially since openhab creates a table with time/value whenever a new measurement is implemented, it would have to be some aggregate that would include new tables. I.e., the whole database.

Influxdb does it this way:

CREATE CONTINUOUS QUERY cq_10m_for_7d ON openhab 
BEGIN SELECT mean(*) INTO openhab."10m_for_7d".:MEASUREMENT FROM openhab."1m_for_24h"./.*/ GROUP BY time(10m), * END

Like I said, I’m surprised I couldn’t find an example for this scenario, which I assume is rather common.

Thanks!

TimescaleDB version affected

2.15

PostgreSQL version used

16.0

What operating system did you use?

Debian bookworm

What installation method did you use?

Deb/Apt

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

No response

How can we reproduce the bug?

See above.
antekresic commented 3 weeks ago

Hi there.

I'm afraid we don't have any kind of automation that would be similar to what you are asking for.

You can create continuous aggregates for all your hypertables using a function but that's pretty involved as well and not a simple one liner which you are suggesting to.

Sorry I couldn't be of much more help.