timescale / timescaledb

An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
https://www.timescale.com/
Other
17.79k stars 884 forks source link

[Bug]: timescaledb_experimental.policies contains duplicate entries for continuous aggregate policies #5492

Closed srieding closed 1 year ago

srieding commented 1 year ago

What type of bug is this?

Incorrect result

What subsystems and features are affected?

Continuous aggregate

What happened?

A single continuous aggregate update policy can show up as multiple records in timescaledb_experimental.policies I expect it to show up as a single record.

On a db with two continuous aggregate update policies I have this result: image for query select * from timescaledb_experimental.policies;

And I expect this result image

TimescaleDB version affected

2.8.1

PostgreSQL version used

14.6

What operating system did you use?

windows 10 x64

What installation method did you use?

Docker

What platform did you run on?

Other

Relevant log output and stack trace

No response

How can we reproduce the bug?

-- On an empty database, I create two hypertables and two continuous aggregates.
-- I then add two policies to the continuous aggregates.

CREATE TABLE measurements (
  generated_at timestamptz not null,
  value        int4        not null
);

SELECT create_hypertable(
         'measurements',
         'generated_at',
         chunk_time_interval => INTERVAL '7 days',
         if_not_exists => TRUE,
         create_default_indexes => FALSE
         );

CREATE MATERIALIZED VIEW measurements_summary_month
    WITH (timescaledb.continuous)
AS SELECT
     time_bucket('1 month', generated_at) AS time,
     lttb(generated_at, value, 600) AS lttb
   FROM measurements
   GROUP BY time_bucket('1 month', generated_at)
WITH NO DATA;

SELECT add_continuous_aggregate_policy('measurements_summary_month',
                                       start_offset => INTERVAL '3 months',
                                       end_offset => INTERVAL '1 hour',
                                       schedule_interval => INTERVAL '24 hours');

CREATE TABLE measurements_2 (
  generated_at timestamptz not null,
  value        int4        not null
);

SELECT create_hypertable(
         'measurements_2',
         'generated_at',
         chunk_time_interval => INTERVAL '7 days',
         if_not_exists => TRUE,
         create_default_indexes => FALSE
         );

CREATE MATERIALIZED VIEW measurements_summary_month_2
    WITH (timescaledb.continuous)
AS SELECT
     time_bucket('1 month', generated_at) AS time,
     lttb(generated_at, value, 600) AS lttb
   FROM measurements_2
   GROUP BY time_bucket('1 month', generated_at)
WITH NO DATA;

SELECT add_continuous_aggregate_policy('measurements_summary_month_2',
                                       start_offset => INTERVAL '5 months',
                                       end_offset => INTERVAL '4 hour',
                                       schedule_interval => INTERVAL '48 hours');

-- I expect and have two continuous aggregates.
select * from _timescaledb_catalog.continuous_agg;
-- I expect to have two policies. But I see 4.
select * from timescaledb_experimental.policies;
-- This de-duplicates the result to what I expected to see.
select timescaledb_experimental.policies.* from timescaledb_experimental.policies join _timescaledb_catalog.continuous_agg on _timescaledb_catalog.continuous_agg.mat_hypertable_id = ((config ->> 'mat_hypertable_id')::int) where user_view_name = relation_name;
RafiaSabih commented 1 year ago

Thanks @srieding for reporting this case. However, in order to proceed further I need to reproduce it and for that I need the definition of the function you have used here -- lttb(generated_at, value, 600) AS lttb.

Also, at a first look at the output you have added, the schedule_interval and config of the 2 policies are different so they do not look just the duplicates here.

srieding commented 1 year ago

I need the definition of the function you have used here -- lttb(generated_at, value, 600) AS lttb.

lttb is documented here: https://docs.timescale.com/api/latest/hyperfunctions/downsampling/ but I doubt you need to use that particular one to get a reproduction. A simple sum/count or any other aggregation will probably do.

Also, at a first look at the output you have added, the schedule_interval and config of the 2 policies are different so they do not look just the duplicates here.

There are two policies in the script. They can have the same or a different interval. Making them the same or different does not matter much to get a reproduction of the problem. For the reproduction script I chose different values for interval, start_offset and end_offset so they can be distinguished in the screenshot of the bug report. But you would get the same result if you choose the interval, start_offset and end_offset to be the same. The important point is: image There is one policy for aggregate measurements_summary_month with an interval of 24 hours and start_offset 3 moths so row 1 makes sense. There is no policy for aggregate measurements_summary_month with an interval of 48 hours and start_offset 5 moths so row 2 makes no sense. There is no policy for aggregate measurements_summary_month_2 with an interval of 24 hours and start_offset 3 months so row 3 makes no sense. There is one policy for aggregate measurements_summary_month_2 with an interval of 48 hours and start_offset 5 moths so row 4 makes sense.