timescale / timescaledb

An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
https://www.timescale.com/
Other
17.94k stars 883 forks source link

[Bug]: Real Time aggregation breaks on the month level #7326

Open PS1TD opened 1 month ago

PS1TD commented 1 month ago

What type of bug is this?

Incorrect result

What subsystems and features are affected?

Continuous aggregate

What happened?

I am trying to set up a layered real time aggregation structure similar to how it is described in the docs. Link I take readings from multiple sensors every minute. I need to aggregate them into time buckets of hour, day, week, month, and year. Minute hypertable -> real time hourly real time hourly -> real time daily real time daily -> real time weekly real time daily -> real time monthly real time monthly -> real time yearly

On the initial creation all aggregations seem to do what they are supposed to. I get all views properly populated with aggregated data. However as soon as data changes only the hour, day and week aggregations get updated. Month and Year aggregates do not behave like real time aggregates in this case. Trying to manually refresh them does not help and creating an update policy does not help either.

TimescaleDB version affected

2.16.1

PostgreSQL version used

16

What operating system did you use?

Docker Container on Linux (Ubuntu)

What installation method did you use?

Docker

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

No response

How can we reproduce the bug?

I have provided all the steps need to reproduce such a situation on a fresh timescaledb-ha:pg16 container
Step 1:
Create Table with sensor readings

CREATE TABLE
    "SensorReadings" (
        "id" VARCHAR NOT NULL,
        "time" timestamptz NOT NULL,
        "temp" NUMERIC NOT NULL
    );

CREATE UNIQUE INDEX "SensorReadings_id_time_key" ON "SensorReadings" ("id", "time");

Step 2: Fill it with data

INSERT INTO
    public."SensorReadings" ("id", "time", "temp")
VALUES
    (
        GENERATE_SERIES(1, 10),
        (NOW()::date - INTERVAL '3 minutes'),
        TRUNC(RANDOM() * 100)
    );

INSERT INTO
    public."SensorReadings" ("id", "time", "temp")
VALUES
    (
        GENERATE_SERIES(1, 10),
        (NOW()::date - INTERVAL '2 minutes'),
        TRUNC(RANDOM() * 100)
    );

Step 3: Convert to hypertable

SELECT
    public.create_hypertable (
        '"SensorReadings"',
        public.by_range ('time', INTERVAL '7 days'),
        migrate_data => TRUE
    );

Step 4: Create hourly real time aggregate

CREATE MATERIALIZED VIEW
    "SensorReadingsHourly"
WITH
    (
        timescaledb.continuous,
        timescaledb.materialized_only = FALSE
    ) AS
SELECT
    public.time_bucket ('1 hour', "time") AS "time1h",
    "id",
    AVG("temp") AS "tempAvg",
    public.stats_agg ("temp") AS "tempAgg"
FROM
    "SensorReadings"
GROUP BY
    "time1h",
    "id"
ORDER BY
    "time1h" DESC;

Step 5: Create daily real time aggregate

CREATE MATERIALIZED VIEW
    "SensorReadingsDaily"
WITH
    (
        timescaledb.continuous,
        timescaledb.materialized_only = FALSE
    ) AS
SELECT
    public.time_bucket ('1 day', "time1h", 'Europe/Moscow') AS "time1d",
    "id",
    public.average (public.rollup ("tempAgg")) AS "tempAvg",
    public.rollup ("tempAgg") AS "tempAgg"
FROM
    "SensorReadingsHourly"
GROUP BY
    "time1d",
    "id"
ORDER BY
    "time1d" DESC;

Step 6: Create weekly real time aggregate

CREATE MATERIALIZED VIEW
    "SensorReadingsWeekly"
WITH
    (
        timescaledb.continuous,
        timescaledb.materialized_only = FALSE
    ) AS
SELECT
    public.time_bucket ('1 week', "time1d", 'Europe/Moscow') AS "time1w",
    "id",
    public.average (public.rollup ("tempAgg")) AS "tempAvg",
    public.rollup ("tempAgg") AS "tempAgg"
FROM
    "SensorReadingsDaily"
GROUP BY
    "time1w",
    "id"
ORDER BY
    "time1w" DESC;

Step 7: Create monthly real time aggregate

CREATE MATERIALIZED VIEW
    "SensorReadingsMonthly"
WITH
    (
        timescaledb.continuous,
        timescaledb.materialized_only = FALSE
    ) AS
SELECT
    public.time_bucket ('1 month', "time1d", 'Europe/Moscow') AS "time1m",
    "id",
    public.average (public.rollup ("tempAgg")) AS "tempAvg",
    public.rollup ("tempAgg") AS "tempAgg"
FROM
    "SensorReadingsDaily"
GROUP BY
    "time1m",
    "id"
ORDER BY
    "time1m" DESC;

Step 8: Create yearly real time aggregate

CREATE MATERIALIZED VIEW
    "SensorReadingsYearly"
WITH
    (
        timescaledb.continuous,
        timescaledb.materialized_only = FALSE
    ) AS
SELECT
    public.time_bucket ('1 year', "time1m", 'Europe/Moscow') AS "time1y",
    "id",
    public.average (public.rollup ("tempAgg")) AS "tempAvg",
    public.rollup ("tempAgg") AS "tempAgg"
FROM
    "SensorReadingsMonthly"
GROUP BY
    "time1y",
    "id"
ORDER BY
    "time1y" DESC;

Step 9: Add new data (Notice in my case 20 new sensors were added and the previous 10 recieved new data)

INSERT INTO
    public."SensorReadings" (id, TIME, TEMP)
VALUES
    (
        GENERATE_SERIES(1, 30),
        (NOW()::date),
        TRUNC(RANDOM() * 100)
    );

After all these steps SensorReadingsHourly, Daily & Weekly will have new sensor, Monthly and Yearly won't

Things I have tried to refresh the not working aggregates

Manually refresh aggregate. Returns that aggregate is already up to date.

CALL public.refresh_continuous_aggregate (
    '"SensorReadingsMonthly"',
    NULL,
    LOCALTIMESTAMP - INTERVAL '1 week'
);

Create short schedule refresh policies. They run but nothing changes.

SELECT
    add_continuous_aggregate_policy (
        '"SensorReadingsHourly"',
        start_offset => INTERVAL '1 week',
        end_offset => INTERVAL '1 hour',
        schedule_interval => INTERVAL '1 second'
    );

SELECT
    add_continuous_aggregate_policy (
        '"SensorReadingsMonthly"',
        start_offset => INTERVAL '3 months',
        end_offset => INTERVAL '1 month',
        schedule_interval => INTERVAL '1 minute'
    );

Re-enabling real time aggregation in case it did not apply first time

ALTER MATERIALIZED VIEW "SensorReadingsMonthly"
SET
    (timescaledb.materialized_only = FALSE);

Help Wanted

It might totally be a user error and I maybe have misunderstood something in the documentation but the setup seems reasonable. Any help with this case will be much appreciated <3

PS1TD commented 1 month ago

Just tried reproducing again but this time doing (NOW()::date - INTERVAL '1 minute') on time in Step 9 to keep all readings in 1 day and got even weirder results where none of the aggregates updated. Creating readings 1 month ahead however forces the Monthly aggregate to update.

PS1TD commented 1 month ago

Also would like to point out the the underlying SELECT queries do in fact return correct and up-to-date data, its just the view that is not updated

PS1TD commented 1 month ago

I think I have figured out what the problem was. It has to do with how the watermark behaves when you create continuous aggregations WITH DATA. Appending to WITH NO DATA has returned the behavior to expected. I feel like this should be mentioned somewhere more visible than the troubleshooting section. Linking similar issues https://github.com/timescale/timescaledb/issues/5775 https://github.com/timescale/timescaledb/issues/5379

PS1TD commented 1 month ago

Ill keep the issue open so that it is noticed, feel free to close it