[Bug]: Data Outside Refresh Window Crashes Refresh

tonygunter commented 3 weeks ago

What type of bug is this?

Unexpected error

What subsystems and features are affected?

Continuous aggregate

What happened?

Inserting data outside the refresh window results in a refresh that runs for several hours.

TimescaleDB version affected

2.8.1

PostgreSQL version used

14.6

What operating system did you use?

(Ubuntu 14.6-1.pgdg20.04+1) on x86_64-pc-linux-gnu

What installation method did you use?

Docker

What platform did you run on?

Amazon Web Services (AWS)

Relevant log output and stack trace

reading=> SELECT datname, pid, state, query, age(clock_timestamp(), query_start) AS age 
FROM pg_stat_activity
WHERE state <> 'idle' 
    AND query NOT LIKE '% FROM pg_stat_activity %' 
ORDER BY age; select * from rz_refresh_status;select j.job_id,j.proc_name,s.next_start  from timescaledb_information.jobs j INNER JOIN timescaledb_information.job_stats s on j.job_id=s.job_id and j.proc_name like '%refresh%'; select current_timestamp; select bucket from rz_minute_downsample order by bucket desc limit 1; select bucket from rz_hour_downsample order by bucket desc limit 1;\
 datname |  pid   | state  |                              query                               |       age       
---------+--------+--------+------------------------------------------------------------------+-----------------
 reading | 470672 | active | CALL _timescaledb_internal.policy_refresh_continuous_aggregate() | 00:42:02.644494
 reading | 462016 | active | CALL _timescaledb_internal.policy_refresh_continuous_aggregate() | 01:09:10.763181

How can we reproduce the bug?

I have a nearly empty database with one hypertable and two continuous aggregates.  

   Column   |            Type             | Collation | Nullable | Default | Storage  | Compression | Stats target | Description 
------------+-----------------------------+-----------+----------+---------+----------+-------------+--------------+-------------
 client_id  | character varying(255)      |           | not null |         | extended |             |              | 
 name       | character varying(255)      |           | not null |         | extended |             |              | 
 profile_id | character varying(255)      |           | not null |         | extended |             |              | 
 time       | timestamp without time zone |           | not null |         | plain    |             |              | 
 exc        | integer                     |           |          |         | plain    |             |              | 
 unit       | character varying(255)      |           |          |         | extended |             |              | 
 val        | double precision            |           |          |         | plain    |             |              | 
 valid      | boolean                     |           |          |         | plain    |             |              | 
 sensor_id  | character varying(255)      |           |          |         | extended |             |              | 
 ordinal    | integer                     |           |          |         | plain    |             |              | 
 ord        | integer                     |           |          |         | plain    |             |              | 
Indexes:
    "reading_parameter_pkey" PRIMARY KEY, btree (client_id, profile_id, name, "time")
    "reading_parameter_temp_time_idx" btree ("time" DESC)
Triggers:
    ts_cagg_invalidation_trigger AFTER INSERT OR DELETE OR UPDATE ON reading_parameter FOR EACH ROW EXECUTE FUNCTION _timescaledb_internal.continuous_agg_invalidation_trigger('1')
    ts_insert_blocker BEFORE INSERT ON reading_parameter FOR EACH ROW EXECUTE FUNCTION _timescaledb_internal.insert_blocker()
Child tables: _timescaledb_internal._hyper_1_236_chunk,
              _timescaledb_internal._hyper_1_237_chunk,
              _timescaledb_internal._hyper_1_238_chunk,
              _timescaledb_internal._hyper_1_239_chunk,
              _timescaledb_internal._hyper_1_240_chunk,
              _timescaledb_internal._hyper_1_241_chunk,
              _timescaledb_internal._hyper_1_242_chunk,
              _timescaledb_internal._hyper_1_243_chunk,
              _timescaledb_internal._hyper_1_244_chunk,
              _timescaledb_internal._hyper_1_245_chunk,
              _timescaledb_internal._hyper_1_246_chunk,
              _timescaledb_internal._hyper_1_247_chunk,
              _timescaledb_internal._hyper_1_248_chunk,
              _timescaledb_internal._hyper_1_249_chunk,
              _timescaledb_internal._hyper_1_250_chunk,
              _timescaledb_internal._hyper_1_251_chunk,
              _timescaledb_internal._hyper_1_252_chunk,
              _timescaledb_internal._hyper_1_253_chunk
Access method: heap

            CREATE MATERIALIZED VIEW rz_minute_downsample
            WITH (timescaledb.continuous, timescaledb.materialized_only=true, timescaledb.create_group_indexes=false) AS
            SELECT client_id as product_id,
                   profile_id,
                   name,
                   min(unit) as uom,
                   sensor_id as device_id,
                   time_bucket('1 minutes', time) as bucket,
                   min(val) as min_value,
                   first(time, val) as min_value_time,
                   max(val) as max_value,
                   last(time, val) as max_value_time,
                   avg(val) as average,
                   bit_or(exc) as exc,
                   count(val) as sample_count
            from reading_parameter
            where valid=true
            group by client_id, profile_id, sensor_id, name, bucket
            WITH NO DATA;

            CREATE MATERIALIZED VIEW rz_hour_downsample
            WITH (timescaledb.continuous, timescaledb.materialized_only=true, timescaledb.create_group_indexes=false) AS
            SELECT client_id as product_id,
                   profile_id, name,
                   min(unit) as uom,
                   sensor_id as device_id,
                   time_bucket('1 hour', time) as bucket,
                   min(val) as min_value,
                   first(time, val) as min_value_time,
                   max(val) as max_value,
                   last(time, val) as max_value_time,
                   avg(val) as average,
                   bit_or(exc) as exc,
                   count(val) as sample_count
            from reading_parameter
            where valid=true
            group by client_id, profile_id, sensor_id, name, bucket
            WITH NO DATA;

I set refresh polices for one month of data.

SELECT add_continuous_aggregate_policy('rz_minute_downsample',  start_offset => '1 month',  end_offset => '1 seconds',  schedule_interval => '10 seconds');

SELECT add_continuous_aggregate_policy('rz_hour_downsample',  start_offset => '1 month',  end_offset => '1 seconds',  schedule_interval => '10 seconds');

I insert 100000 record three weeks old and check the runtime for the refresh policy.  Refresh finishes almost immediately.

create or replace procedure test() language plpgsql
as $$
declare
begin
    FOR i IN 1..100000 LOOP
        INSERT INTO reading_parameter(client_id, profile_id, name, time, sensor_id, val, unit, valid, exc, ord) values
        ('TEST', 'TEST', 'TEST', current_timestamp-'3 weeks'::INTERVAL+i*'1 second'::interval, 'TEST', i, 'TEST', true, 0, 0);
        COMMIT;
        PERFORM pg_sleep(0.01);
    END LOOP;
end;
$$;

call test();

I then insert 100,000 records 7 months old. Data will not show up in continuous aggregate (desired behavior) but runtime is extremely lengthy. Next refresh will run for hours, despite the fact that the data is outside the refresh window and irrelevant to the refresh.

create or replace procedure test() language plpgsql
as $$
declare
begin
    FOR i IN 1..100000 LOOP
        INSERT INTO reading_parameter(client_id, profile_id, name, time, sensor_id, val, unit, valid, exc, ord) values
        ('TEST', 'TEST', 'TEST', current_timestamp-'7 months'::INTERVAL+i*'1 second'::interval, 'TEST', i, 'TEST', true, 0, 0);
        COMMIT;
        PERFORM pg_sleep(0.01);
    END LOOP;
end;
$$;

call test();

tonygunter commented 3 weeks ago

I'm creating aggregates with no index because we have a single query that hits the data by product_id, profile_id, name, and bucket. The indexes are created after I populate the data in the views to gain a little speed on the initial inserts.

CREATE INDEX rz_minute_index ON rz_minute_downsample(product_id, profile_id, name, bucket);
CREATE INDEX rz_hour_index ON rz_hour_downsample(product_id, profile_id, name, bucket);

tonygunter commented 3 weeks ago

I can't seem to reproduce this behavior on version 2.16.1

antekresic commented 3 weeks ago

Yeah, I was going to suggest trying it out on the later versions, 2.8.1 is pretty old at this point.

Definitely suggest upgrading as there have been major updates in the last 9 minor version.

timescale / timescaledb