timescale / timescaledb

An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
https://www.timescale.com/

[Bug]: Continuous aggregates with time_bucket hourly and timezone fail to update #6902

Open · nofalx opened 1 month ago

nofalx commented 1 month ago

What type of bug is this?

Incorrect result, Unexpected error

What subsystems and features are affected?

Continuous aggregate

What happened?

I have a number of continuous aggregates that use hourly buckets:

 time_bucket(interval '1 hour', hypertable_name.time, 'Asia/Dubai')  AS bucket

I noticed that I stopped seeing new results after recreating the hypertable. After hours of debugging, I pinned it down: the hourly buckets don't refresh when used with a timezone.

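The failure is also visible in the job statistics. A quick check, as a sketch (these columns come from the timescaledb_information.job_stats view that the full reproduction query below also reads):

-- the cagg refresh job keeps failing while raw inserts succeed
SELECT job_id, last_run_status, total_failures
FROM timescaledb_information.job_stats;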

TimescaleDB version affected

2.14.2 and 2.15.0

PostgreSQL version used

15.6

What operating system did you use?

Arch Linux and Ubuntu Linux

What installation method did you use?

Deb/Apt

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

May 09 11:40:20 ahmad-82nd postgres[245522]: 2024-05-09 11:40:20.361 +04 [245522] DEBUG:  launching job 1000 "Refresh Continuous Aggregate Policy [1000]"
May 09 11:40:20 ahmad-82nd postgres[246102]: 2024-05-09 11:40:20.368 +04 [246102] DEBUG:  extension state changed: unknown to created
May 09 11:40:20 ahmad-82nd postgres[246102]: 2024-05-09 11:40:20.368 +04 [246102] DEBUG:  Executing policy_refresh_continuous_aggregate with parameters {"end_offset": null, "start_offset": "3 mons", "mat_hypertable_id": 3}
May 09 11:40:20 ahmad-82nd postgres[246102]: 2024-05-09 11:40:20.369 +04 [246102] DEBUG:  rehashing catalog cache id 35 for pg_namespace; 9 tups, 4 buckets
May 09 11:40:20 ahmad-82nd postgres[246102]: 2024-05-09 11:40:20.370 +04 [246102] LOG:  job 1000 threw an error
May 09 11:40:20 ahmad-82nd postgres[246102]: 2024-05-09 11:40:20.370 +04 [246102] ERROR:  timestamp out of range
May 09 11:40:20 ahmad-82nd postgres[180293]: 2024-05-09 11:40:20.375 +04 [180293] LOG:  background worker "Refresh Continuous Aggregate Policy [1000]" (PID 246102) exited with exit code 1

How can we reproduce the bug?

CREATE TABLE metrics (
    sensor_id INTEGER NOT NULL,
    value DOUBLE PRECISION NOT NULL,
    timestamp TIMESTAMPTZ NOT NULL
);
SELECT create_hypertable('metrics', 'timestamp');

-- insert data into the hypertable
INSERT INTO metrics
SELECT
    s.sensor_id,
    random()*50 + 10,
    timestamp
FROM
    generate_series((now() - interval '6 months')::date, now() - interval '5 days', INTERVAL '1 hour') AS timestamp
        CROSS JOIN (SELECT generate_series(1, 200) as sensor_id) as s;

-- hourly continuous aggregate with timezone
CREATE MATERIALIZED VIEW datalake_hourly
WITH (timescaledb.continuous, timescaledb.materialized_only = true) AS
SELECT
    time_bucket(INTERVAL '1 hour', metrics.timestamp, 'Asia/Dubai') AS bucket,
    sensor_id,
    count(*) AS count_items
FROM metrics
GROUP BY bucket, sensor_id;

SELECT add_continuous_aggregate_policy(
    'datalake_hourly',
    start_offset => INTERVAL '3 months',
    end_offset => NULL,
    schedule_interval => INTERVAL '15 seconds'
);

-- insert a few more rows
INSERT INTO metrics
SELECT
    s.sensor_id,
    random()*50 + 10,
    timestamp
FROM
    generate_series((now() - interval '5 day'), now(), INTERVAL '1 hour') AS timestamp
        CROSS JOIN (SELECT generate_series(1, 200) as sensor_id) as s;

-- wait and check the continuous aggregate
SELECT
    j.job_id,
    j.application_name,
    j.initial_start,
    j.next_start,
    j.config,
    j.schedule_interval,
    j.max_runtime,
    j.max_retries,
    j.retry_period,
    j.scheduled,
    j.fixed_schedule,
    j.hypertable_name,
    ca.view_name,
    js.last_run_started_at,
    js.job_status,
    js.last_run_status,
    js.last_successful_finish,
    js.total_runs,
    js.total_failures,
    js.total_successes
FROM timescaledb_information.jobs AS j
LEFT JOIN timescaledb_information.job_stats AS js
    ON j.job_id = js.job_id
LEFT JOIN timescaledb_information.continuous_aggregates AS ca
    ON ca.materialization_hypertable_name = j.hypertable_name
ORDER BY j.job_id;
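A simpler spot check (a sketch using the objects created above): because the view is materialized_only, the failing refresh leaves the newest bucket stuck roughly five days behind the newest raw row.

-- newest raw row vs. newest materialized bucket
SELECT max(timestamp) AS newest_raw FROM metrics;
SELECT max(bucket) AS newest_bucket FROM datalake_hourly;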
akuzm commented 1 month ago

Thanks for the script, it reproduces for me. The error is related to the NULL value of end_offset, so as a workaround you can probably set it to something far in the future rather than NULL.

@fabriziomello should the end point calculation follow the same logic as cagg_get_time_min? Currently the null end point is returned as max time and not +infinity even for variable-bucket caggs, so the calculations in ts_compute_inscribed_bucketed_refresh_window_variable fail.

nofalx commented 1 month ago

Hi @akuzm, thank you for the reply. I believe end_offset only accepts an interval, so I set it to '-10 years'; that worked for me, and after recreating the job the data refreshed correctly from today onward.
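For reference, a sketch of the recreated policy (values taken from the repro script above; the negative end_offset resolves to a point far in the future rather than NULL):

-- drop the failing policy and recreate it with a non-NULL end_offset
SELECT remove_continuous_aggregate_policy('datalake_hourly');
SELECT add_continuous_aggregate_policy(
    'datalake_hourly',
    start_offset => INTERVAL '3 months',
    end_offset => INTERVAL '-10 years',  -- far in the future instead of NULL
    schedule_interval => INTERVAL '15 seconds'
);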

However, I believe we should add a warning or improve the handling of NULL here, as we have run into a number of issues with it in the past.
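In the meantime, a sketch of a sanity check for existing deployments (assuming the standard timescaledb_information.jobs view; config is JSONB, so a JSON null end_offset reads back as SQL NULL through ->>):

-- find cagg refresh policies configured with a NULL end_offset
SELECT job_id, hypertable_name, config
FROM timescaledb_information.jobs
WHERE proc_name = 'policy_refresh_continuous_aggregate'
  AND config->>'end_offset' IS NULL;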