Open Streamlinesx opened 1 year ago
Hello @Streamlinesx,
Thanks for the detailed reproduction steps. I was able to reproduce the issue with the current development version (9c7ae3e8a983ff1a19645c3d2dc0508ae8c69550) of TimescaleDB. The problem seems to be related to #5379.
For the first CAGG (data_aggregation_hour_min_max), the materialization watermark is set to the end of the current time bucket. For the second CAGG (data_aggregation_hour_min_max_2), the watermark is set to the min value. It appears that the more recent watermark of the first CAGG is disabling real-time aggregation for this CAGG bucket.
test2=# select user_view_name,
          _timescaledb_internal.to_timestamp(
            _timescaledb_internal.cagg_watermark(mat_hypertable_id)
          )
        from _timescaledb_catalog.continuous_agg;
         user_view_name          |          to_timestamp
---------------------------------+---------------------------------
 data_aggregation_hour_min_max   | 2023-06-13 14:00:00+02
 data_aggregation_hour_min_max_2 | 4714-11-24 00:53:28+00:53:28 BC
(2 rows)
@jnidzwetzki
Thank you for looking into this.
I have investigated a bit further and it seems related to this troubleshooting step:
continuous-aggregate-watermark-is-in-the-future
When the first cagg is created WITH NO DATA, it works as expected, but it then requires a call to the refresh_continuous_aggregate() function.
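A minimal sketch of that sequence, for reference. The base table sensor_data, its columns time and value, and the refresh window are assumptions made for illustration; only the view name comes from the output above.

```sql
-- Hypothetical base hypertable: sensor_data(time timestamptz, value double precision).
CREATE MATERIALIZED VIEW data_aggregation_hour_min_max
WITH (timescaledb.continuous) AS
SELECT time_bucket(INTERVAL '1 hour', time) AS bucket,
       min(value) AS min_value,
       max(value) AS max_value
FROM sensor_data
GROUP BY bucket
WITH NO DATA;  -- skip the initial materialization at creation time

-- Materialize the existing data explicitly. The window end is an assumption,
-- chosen so that the most recent hour is left to real-time aggregation.
CALL refresh_continuous_aggregate(
    'data_aggregation_hour_min_max',
    NULL,                        -- no lower bound
    now() - INTERVAL '1 hour'    -- upper bound of the refresh window
);
```

The watermark query shown earlier in this thread can be rerun afterwards to check where the watermark ends up.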
What type of bug is this?
Other
What subsystems and features are affected?
Continuous aggregate
What happened?
After creating a data table that contains time-series data from a sensor, I get two different behaviors depending on whether I insert data into the table before or after creating an hourly continuous aggregate on it.
If I insert data before creating the continuous aggregate, the first bucket will not have real-time aggregates on data that is inserted after the creation of the cagg. Only the second bucket will have real-time aggregates on new inserts.
If I create the continuous aggregate first, and only then start inserting data, the first bucket will have real-time aggregates, as expected and required.
In both cases, the bucket should not be materialized yet since the start_offset is set to one hour.
For my application, a workaround is to rename the base table, create a new table with the original name of the old table, create the continuous aggregate on the new table, and only then insert the data from the old table into the new one. However, this is not ideal, since the table I want to work with, and on which I first encountered this behavior, is rather large (60m+ entries).
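The workaround can be sketched roughly as follows. Every table, column, and view name here is a hypothetical placeholder, and it assumes that any continuous aggregate on the old table has been dropped or uses a different name.

```sql
-- 1. Move the existing data out of the way.
--    Note: renaming a table does not rename its indexes.
ALTER TABLE sensor_data RENAME TO sensor_data_old;

-- 2. Recreate an empty hypertable under the original name.
CREATE TABLE sensor_data (LIKE sensor_data_old INCLUDING DEFAULTS);
SELECT create_hypertable('sensor_data', 'time');

-- 3. Create the continuous aggregate while the table is still empty.
CREATE MATERIALIZED VIEW sensor_data_hourly
WITH (timescaledb.continuous) AS
SELECT time_bucket(INTERVAL '1 hour', time) AS bucket,
       min(value) AS min_value,
       max(value) AS max_value
FROM sensor_data
GROUP BY bucket;

-- 4. Copy the data back only after the aggregate exists.
INSERT INTO sensor_data
SELECT * FROM sensor_data_old;
```

Step 4 is the expensive part on a table of this size, which is why this is only a stopgap.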
TimescaleDB version affected
2.9.3
PostgreSQL version used
12
What operating system did you use?
Windows 10 x64
What installation method did you use?
Docker
What platform did you run on?
Other, Not applicable
Relevant log output and stack trace
No response
How can we reproduce the bug?