Open nieyuan1980 opened 5 months ago
Made a simplified repro for this:
drop materialized view if exists metrics_weekly;
drop materialized view if exists metrics_daily;
drop materialized view if exists metrics_hourly;
drop table if exists metrics;
create table metrics(time timestamptz, value float);
select create_hypertable('metrics', 'time');
create materialized view metrics_hourly with (timescaledb.continuous, timescaledb.materialized_only = false) as select time_bucket('1 hour', time) time, max(value) value from metrics group by 1;
create materialized view metrics_daily with (timescaledb.continuous, timescaledb.materialized_only = false) as select time_bucket('1 day', time) time, max(value) value from metrics_hourly group by 1;
create materialized view metrics_weekly with (timescaledb.continuous, timescaledb.materialized_only = false) as select time_bucket('1 week', time) time, max(value) value from metrics_daily group by 1;
insert into metrics values ('2021-01-01', 1);
call refresh_continuous_aggregate('metrics_hourly', null, null);
call refresh_continuous_aggregate('metrics_daily', null, null);
call refresh_continuous_aggregate('metrics_weekly', null, null);
select max(value) from metrics_hourly;
select max(value) from metrics_daily;
select max(value) from metrics_weekly;
insert into metrics values ('2021-01-02', 2);
call refresh_continuous_aggregate('metrics_hourly', null, null);
call refresh_continuous_aggregate('metrics_daily', null, null);
select max(value) from metrics_hourly;
select max(value) from metrics_daily;
select max(value) from metrics_weekly;
insert into metrics values ('2021-01-03', 3);
call refresh_continuous_aggregate('metrics_hourly', null, null);
select max(value) from metrics_hourly;
select max(value) from metrics_daily;
select max(value) from metrics_weekly;
insert into metrics values ('2021-01-04', 4);
select max(value) from metrics_hourly;
select max(value) from metrics_daily;
select max(value) from metrics_weekly;
The third-level real-time aggregate (weekly) gives the wrong result when it's not updated explicitly.
@nieyuan1980 @akuzm actually this is not a bug and instead a limitation of the current Hierarchical Continuous Aggregate architecture. You need to always refresh all the caggs in the chain in order to properly move the watermark to get the correct results.
What type of bug is this?
Incorrect result
What subsystems and features are affected?
Continuous aggregate, Query executor
What happened?
There is a transformer and every 15 minutes the collector automatically records its readings. Its readings are downsampled to the maximum hourly, daily, monthly, and yearly, respectively. But it has an incredible number of data on continuous aggregation.
Step 1, the original table
We see that the current maximum readings are: 6339985.166
STEP 2, CREATE HOURLY DOWNSAMPLING (HERE I USED THE TIMESCALEDB.MATERIALIZED_ONLY = FALSE PARAMETER)
We see that its data is correct:6339985.166
Step 3, create a built-in daily downsampling (create a continuous aggregate on top of another continuous aggregate)
Its data is also correct:6339985.166
Step 3, Create a Built-in Monthly Downsampling (Create a Continuous Aggregate on Top of Another Continuous Aggregate)
The data here confuses me:6329992.613。 I think it should be:6339985.166
Excuse me, what is the reason for this? How should it be resolved?
TimescaleDB version affected
2.15.0
PostgreSQL version used
PostgreSQL 16.3
What operating system did you use?
(Ubuntu 16.3-1.pgdg22.04+1
What installation method did you use?
Deb/Apt
What platform did you run on?
Other
Relevant log output and stack trace
No response
How can we reproduce the bug?