timescale / timescaledb

An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
https://www.timescale.com/

[Bug]: Performance issue, refresh_continuous_aggregate vs original cagg query #4662


michael-sayapin commented 2 years ago

What type of bug is this?

Performance issue

What subsystems and features are affected?

Continuous aggregate

What happened?

CREATE MATERIALIZED VIEW controller_log_over_1d (
    created,
    control_group_controller_id,
    point_id,
    last_user_state_id,
    last_real_state_id,
    user_state_id,
    real_state_id,
    first_user_value,
    first_real_value,
    last_user_value,
    last_real_value,
    user_value,
    real_value,
    sum_user_value,
    sum_real_value,
    min_user_value,
    min_real_value,
    max_user_value,
    max_real_value
)
WITH (timescaledb.continuous, timescaledb.materialized_only = true) AS
SELECT
    time_bucket('24h', created, timezone => 'Asia/Shanghai') AS created,
    control_group_controller_id,
    point_id,
    last(user_state_id, created) AS last_user_state_id,
    last(real_state_id, created) AS last_real_state_id,
    last(user_state_id, created) AS user_state_id,
    last(real_state_id, created) AS real_state_id,
    first(user_value, created) AS first_user_value,
    first(real_value, created) AS first_real_value,
    last(user_value, created) AS last_user_value,
    last(real_value, created) AS last_real_value,
    AVG(user_value) AS user_value,
    AVG(real_value) AS real_value,
    SUM(user_value) AS sum_user_value,
    SUM(real_value) AS sum_real_value,
    MIN(user_value) AS min_user_value,
    MIN(real_value) AS min_real_value,
    MAX(user_value) AS max_user_value,
    MAX(real_value) AS max_real_value
FROM
    controller_log
GROUP BY 1, 2, 3
WITH NO DATA

If I call refresh_continuous_aggregate('controller_log_over_1d', '2022-08-30', '2022-09-01'), it hogs all the memory, the CPU sits at 100%, and it hangs for 30+ minutes, after which I kill it.
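
For reference, the refresh described above is a procedure invocation; the arguments are the ones from the description:

-- The refresh over the two-day window described above; it is this call that
-- hangs, while the equivalent plain query below finishes in a few seconds.
CALL refresh_continuous_aggregate('controller_log_over_1d', '2022-08-30', '2022-09-01');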

If I just run the SQL query of the aggregate like this:

# explain analyze SELECT
    time_bucket('1d', created, timezone => 'Asia/Shanghai') AS created,
    control_group_controller_id,
    point_id,
    last(user_state_id, created) AS last_user_state_id,
    last(real_state_id, created) AS last_real_state_id,
    null AS user_state_id,
    null AS real_state_id,
    first(user_value, created) AS first_user_value,
    first(real_value, created) AS first_real_value,
    last(user_value, created) AS last_user_value,
    last(real_value, created) AS last_real_value,
    AVG(user_value) AS user_value,
    AVG(real_value) AS real_value,
    SUM(user_value) AS sum_user_value,
    SUM(real_value) AS sum_real_value,
    MIN(user_value) AS min_user_value,
    MIN(real_value) AS min_real_value,
    MAX(user_value) AS max_user_value,
    MAX(real_value) AS max_real_value
FROM
    controller_log
WHERE created BETWEEN '2022-08-30' AND '2022-09-01'
GROUP BY 1, 2, 3;
                                                                                                                     QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=319375.00..572716.20 rows=452395 width=184) (actual time=3131.157..3253.559 rows=4323 loops=1)
   Group Key: (time_bucket('1 day'::interval, _hyper_54_2393_chunk.created, 'Asia/Shanghai'::text, NULL::timestamp with time zone, NULL::interval)), _hyper_54_2393_chunk.control_group_controller_id, _hyper_54_2393_chunk.point_id
   ->  Gather Merge  (cost=319375.00..479975.23 rows=1357185 width=320) (actual time=3130.933..3158.811 rows=14252 loops=1)
         Workers Planned: 3
         Workers Launched: 3
         ->  Sort  (cost=318374.96..319505.95 rows=452395 width=320) (actual time=3031.670..3032.792 rows=3563 loops=4)
               Sort Key: (time_bucket('1 day'::interval, _hyper_54_2393_chunk.created, 'Asia/Shanghai'::text, NULL::timestamp with time zone, NULL::interval)), _hyper_54_2393_chunk.control_group_controller_id, _hyper_54_2393_chunk.point_id
               Sort Method: quicksort  Memory: 3708kB
               Worker 0:  Sort Method: quicksort  Memory: 2489kB
               Worker 1:  Sort Method: quicksort  Memory: 2471kB
               Worker 2:  Sort Method: quicksort  Memory: 3745kB
               ->  Partial HashAggregate  (cost=198839.53..236932.84 rows=452395 width=320) (actual time=2646.627..3025.562 rows=3563 loops=4)
                     Group Key: time_bucket('1 day'::interval, _hyper_54_2393_chunk.created, 'Asia/Shanghai'::text, NULL::timestamp with time zone, NULL::interval), _hyper_54_2393_chunk.control_group_controller_id, _hyper_54_2393_chunk.point_id
                     Planned Partitions: 512  Batches: 512  Memory Usage: 6209kB  Disk Usage: 45128kB
                     Worker 0:  Batches: 493  Memory Usage: 6209kB  Disk Usage: 45192kB
                     Worker 1:  Batches: 489  Memory Usage: 6209kB  Disk Usage: 42248kB
                     Worker 2:  Batches: 511  Memory Usage: 6209kB  Disk Usage: 48520kB
                     ->  Result  (cost=0.00..79413.15 rows=1459339 width=48) (actual time=0.459..1558.214 rows=1130851 loops=4)
                           ->  Parallel Append  (cost=0.00..61171.42 rows=1459339 width=40) (actual time=0.345..541.095 rows=1130851 loops=4)
                                 ->  Parallel Index Scan using _hyper_54_2393_chunk_unique_cgc_point_per_ms on _hyper_54_2393_chunk  (cost=0.42..2.64 rows=1 width=40) (actual time=1.265..1.265 rows=0 loops=1)
                                       Index Cond: ((created >= '2022-08-30 00:00:00+00'::timestamp with time zone) AND (created <= '2022-09-01 00:00:00+00'::timestamp with time zone))
                                 ->  Parallel Seq Scan on _hyper_54_2386_chunk  (cost=0.00..27082.75 rows=733316 width=40) (actual time=0.029..226.865 rows=568182 loops=4)
                                       Filter: ((created >= '2022-08-30 00:00:00+00'::timestamp with time zone) AND (created <= '2022-09-01 00:00:00+00'::timestamp with time zone))
                                 ->  Parallel Seq Scan on _hyper_54_2384_chunk  (cost=0.00..26789.33 rows=726022 width=40) (actual time=0.022..399.225 rows=1125338 loops=2)
                                       Filter: ((created >= '2022-08-30 00:00:00+00'::timestamp with time zone) AND (created <= '2022-09-01 00:00:00+00'::timestamp with time zone))
 Planning Time: 13.203 ms
 Execution Time: 3261.423 ms
(27 rows)

It takes a very reasonable amount of time.

I tried with timescaledb.finalized = false as well, with the same results.
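
One way to confirm which variant a given continuous aggregate ended up with is the information view (a sketch; the finalized column should be available from 2.7 onward):

-- Check the flags of the continuous aggregate defined above.
SELECT view_name, materialized_only, finalized
FROM timescaledb_information.continuous_aggregates
WHERE view_name = 'controller_log_over_1d';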

TimescaleDB version affected

2.8.0

PostgreSQL version used

13.7

What operating system did you use?

Ubuntu 20.04

What installation method did you use?

Docker

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

No response

How can we reproduce the bug?

See the description.
michael-sayapin commented 2 years ago

Without the timezone argument on time_bucket, it works instantly.
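
A minimal sketch of the fast variant, using the plain two-argument time_bucket and a reduced column list for brevity (the view name here is hypothetical):

CREATE MATERIALIZED VIEW controller_log_over_1d_utc
WITH (timescaledb.continuous, timescaledb.materialized_only = true) AS
SELECT
    time_bucket('24h', created) AS created,   -- no timezone argument
    control_group_controller_id,
    point_id,
    AVG(real_value) AS real_value
FROM controller_log
GROUP BY 1, 2, 3
WITH NO DATA;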

michael-sayapin commented 2 years ago

On further digging, it seems that the 2.8.0 time_bucket (with timezone) behaves exactly like the pre-2.8 time_bucket_ng: the continuous aggregate refresh code does a full hypertable scan. If I set materialized_only = false, the EXPLAIN plan shows all chunks being scanned (even with WHERE created >= '2022-09-01', which should only touch a single chunk), and compressed chunks being decompressed via a CustomScan. Needless to say, it is not usable at all.
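
Roughly how the real-time plan was inspected (a sketch using the view defined above):

-- Switch the cagg to real-time aggregation, then look at the plan of a query
-- that should only need the most recent chunk:
ALTER MATERIALIZED VIEW controller_log_over_1d
    SET (timescaledb.materialized_only = false);

EXPLAIN
SELECT * FROM controller_log_over_1d
WHERE created >= '2022-09-01';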

mfundul commented 2 years ago

Possibly related? https://github.com/timescale/timescaledb/issues/4547

michael-sayapin commented 2 years ago

> Possibly related? #4547

Yes, I think it's the same root cause. I had high hopes for 2.8.0, since I was under the impression that only time_bucket_ng was affected. However, the upgraded time_bucket with timezone support seems to have the same problem.

LuxCore commented 1 year ago

Hello! TimescaleDB version affected: 2.8.0. PostgreSQL version used: 12.8.

I faced a similar problem, but without a timezone. My steps:

  1. Data for 6 years was recalculated in 6 refreshes, one per year, each in its own separate window.
  2. Next, yesterday's data was loaded into the hypertable and a window covering the whole period was used: window_start => NULL, window_end => 'TODAY'::TIMESTAMP (sketched below). This window produced an error: SQL Error [22008]: ERROR: timestamp out of range.
  3. I replaced NULL with '2017-04-01'. The refresh should have been relatively quick, because there was not much data to recalculate, but it actually ran for 76 minutes.
  4. Then I set window_start to '2017-01-01' and the recalculation took 19 minutes.
  5. Next I checked the period from 2018 and got the expected behaviour: continuous aggregate is already up-to-date.
  6. Then I re-ran the period from 2017 ('2017-01-01') and the behaviour was unexpected: the recalculation took 18 minutes even though the data was already calculated and there was no new data.
  7. I found that any window_start from '2017-02-01' onward gives "continuous aggregate is already up-to-date", but window_start => '2017-01-01' performs a recalculation every time.

What could be the problem?
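
A sketch of the refresh calls from the steps above (the continuous aggregate name is hypothetical; the window values are the ones mentioned in the steps):

CALL refresh_continuous_aggregate('my_cagg', NULL, 'TODAY'::timestamp);           -- step 2: ERROR: timestamp out of range
CALL refresh_continuous_aggregate('my_cagg', '2017-04-01', 'TODAY'::timestamp);   -- step 3: ran for 76 minutes
CALL refresh_continuous_aggregate('my_cagg', '2017-01-01', 'TODAY'::timestamp);   -- step 4: took 19 minutes
CALL refresh_continuous_aggregate('my_cagg', '2017-02-01', 'TODAY'::timestamp);   -- step 7: "already up-to-date"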

LuxCore commented 1 year ago

Hello! When is this bug planned to be fixed, approximately? I use bucket = '1 month'. For about 6,000,000 records, refresh_continuous_aggregate() takes 13 minutes, while the plain SQL query takes only 12 seconds. I do not use a timezone for time_bucket().
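
For comparison, the two operations described above would look roughly like this (the hypertable, cagg, column names, and window are hypothetical, and the direct query assumes a time_bucket variant that accepts month-sized buckets):

-- Refresh of the monthly cagg over the full period: ~13 minutes.
CALL refresh_continuous_aggregate('readings_monthly', '2017-01-01', 'TODAY'::timestamp);

-- The equivalent plain aggregate over the same ~6,000,000 rows: ~12 seconds.
EXPLAIN ANALYZE
SELECT time_bucket('1 month', created) AS bucket, device_id, AVG(value) AS value
FROM readings
WHERE created >= '2017-01-01' AND created < 'TODAY'::timestamp
GROUP BY 1, 2;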

VMM-Mtech commented 1 year ago

Hi!

I'm not sure if our issue is exactly the same as yours, but there are some similarities...

EDIT: We're using timescale 2.8.1 on PG 12.

We have ~100M+ rows in our hypertable, spanning over 8 years (majority of data is from last 3 years), with default 7 day chunk size.

We created a continuous aggregate that precalculates some aggregates, creating daily time buckets for each device. Filling this from scratch took about 15 minutes (and used lots of CPU, memory and disk IO). We made sure the WAL size was big enough so that the timed checkpoint could do the actual writes to disk.
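
The WAL sizing mentioned above amounts to raising max_wal_size so checkpoints are not forced mid-refresh (a sketch; the value is illustrative, not the one actually used):

-- Give the timed checkpoint enough headroom during the initial cagg fill.
ALTER SYSTEM SET max_wal_size = '8GB';
SELECT pg_reload_conf();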

If I change some data (say 100k rows) in a single chunk (basically 1 day) and then refresh the continuous aggregate, the refresh is super fast, taking less than a second.

But if I change approximately the same total amount of data across all chunks (a few rows from each chunk, totalling 100k-200k rows), the refresh becomes as slow as creating the aggregate from scratch. It takes around 15 minutes, sometimes even more.

I also did a test where the updates cover only certain key fields (which are indexed in the hypertable and are used as group by columns in the continuous aggregate). The result was the same; as the changes affected all chunks in the hypertable, the refresh was slow.
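
The two update patterns compared above, sketched with hypothetical table and column names:

-- (a) ~100k rows confined to a single chunk (one day): the subsequent
--     refresh finishes in under a second.
UPDATE measurements
SET value = value * 1.01
WHERE ts >= '2023-05-01' AND ts < '2023-05-02';

-- (b) a similar total number of rows, but spread across every chunk (here via
--     an indexed key column used in the cagg's GROUP BY): the refresh becomes
--     as slow as a full rebuild.
UPDATE measurements
SET value = value * 1.01
WHERE device_id = 123;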

I don't know if it's a bug or a feature, but to me it seems that the invalidation engine only uses timestamps to determine what data needs to be recalculated, although it could also check the key columns. I'd imagine the logic could be described verbally as: "continuous aggregate CG_1 uses column X from hypertable HT_1. Now there was an update to HT_1 for X=123 and timestamp Y, so the CG_1 refresh should recalculate rows where timestamp = Y and X = 123".
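
That matches what the catalog appears to store: the hypertable invalidation log records only modified time ranges per hypertable, not key values. A sketch of inspecting it (internal table and function names are as of 2.x and may differ across versions):

-- Modified time ranges pending materialization, converted from the internal
-- bigint representation to timestamps.
SELECT hypertable_id,
       _timescaledb_internal.to_timestamp(lowest_modified_value)   AS lowest_modified,
       _timescaledb_internal.to_timestamp(greatest_modified_value) AS greatest_modified
FROM _timescaledb_catalog.continuous_aggs_hypertable_invalidation_log;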

VMM-Mtech commented 1 year ago

Hi again.

I'm continuing with different test patterns.

Now I updated ~30k rows from one year. The update took about 20 seconds, and afterwards the refresh_continuous_aggregate (covering all history) took only 1 minute.

Then I updated the same 30k rows (now taking about 5 seconds), plus another, non-consecutive 10-month range covering 54k rows, which took 4 minutes to update. refresh_continuous_aggregate then took 6 minutes.

VMM-Mtech commented 1 year ago

Hi,

just a little update on what we've figured out.

We browsed through the code more thoroughly and found some good comments there. For example, refresh.c describes how it merges refresh buckets based on timescaledb.materializations_per_refresh_window, which was also mentioned in this issue. This helped us understand the internal logic better.
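
For reference, that setting can be raised per session before a wide manual refresh (a sketch; the default is 10, and the cagg name and window here are hypothetical):

SET timescaledb.materializations_per_refresh_window = 25;
CALL refresh_continuous_aggregate('daily_device_cagg', '2015-01-01', 'TODAY'::timestamp);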

We also noticed that our actual VM (a Linux box in the cloud) is much more reasonable on resource usage than our Windows laptops running Docker and WSL. On Windows, the refresh seems to suck up all the memory it can get. On actual Linux, the refresh does just fine with only a couple of gigabytes of RAM and still finishes in pretty much the same time.