michael-sayapin opened 2 years ago
Without `timezone` on `time_bucket`, it works instantly.
On further digging, it seems that the 2.8.0 `time_bucket` (with `timezone`) behaves exactly like the pre-2.8 `time_bucket_ng`: the continuous aggregate refresh code does a full hypertable scan. If I set `materialized_only = false`, I can see an EXPLAIN plan that scans all chunks (despite `WHERE created >= '2022-09-01'`, which should only touch a single chunk), and that also decompresses compressed chunks via CustomScan. Needless to say, it is not usable at all.
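For anyone hitting the same symptom, a quick way to check whether chunk exclusion is happening is to EXPLAIN the raw aggregate query directly against the hypertable. This is only a sketch: the table name `controller_log` and column `created` are assumptions based on the filter and aggregate name mentioned in this issue.

```sql
-- Hypothetical names based on this report; adjust to your schema.
-- With chunk exclusion working, only one chunk should appear in the plan:
EXPLAIN (ANALYZE, BUFFERS)
SELECT time_bucket('1 day', created, 'UTC') AS bucket, count(*)
FROM controller_log
WHERE created >= '2022-09-01'
GROUP BY bucket;
```

If all chunks show up in the plan here too, the problem is in planning for the bucketing function itself, not only in the refresh code path.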
Possibly related? https://github.com/timescale/timescaledb/issues/4547
Yes, I think it's the same root cause. I had high hopes for 2.8.0 since I was under an impression that only time_bucket_ng was affected. However, yeah, upgraded time_bucket with timezone support seems to have the same problem.
Hello! TimescaleDB version affected: 2.8.0. PostgreSQL version used: 12.8.
I faced a similar problem, but without `timezone`. My actions:
What could be the problem?
Hello! When is this bug planned to be fixed, approximately? I use bucket = '1 month'. For about 6,000,000 records, refresh_continuous_aggregate() takes 13 minutes, while the equivalent SQL query takes only 12 seconds. I do not use timezone for time_bucket().
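Until the underlying issue is fixed, one possible mitigation (not suggested in this thread, just a sketch) is to refresh one bucket at a time rather than the whole history in a single call, so each refresh only materializes a single month. The aggregate name is hypothetical:

```sql
-- Hypothetical continuous aggregate name; refresh a single 1-month
-- bucket window instead of the entire history in one call.
CALL refresh_continuous_aggregate('my_cagg_monthly',
                                  '2022-08-01', '2022-09-01');
```

This does not make any single bucket faster, but it bounds the work per call and avoids one long memory-hungry refresh.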
Hi!
I'm not sure if our issue is exactly the same as yours, but there are some similarities...
EDIT: We're using timescale 2.8.1 on PG 12.
We have ~100M+ rows in our hypertable, spanning over 8 years (majority of data is from last 3 years), with default 7 day chunk size.
We created a continuous aggregate that precalculates some aggregates, creating daily time buckets for each device. Filling this from scratch took about 15 minutes (and used lots of cpu, memory and disk IO). We made sure that wal size was big enough so that the timed checkpoint could do the actual writes to disk.
If I change some data (say 100k rows) from a single chunk (basically 1 day) and then refresh the continuous aggregate, the refresh process is super fast, takes like less than a second.
But if I change approximately the same total amount of data from all chunks (changing a few rows from each chunk totalling 100k-200k rows), the refresh process becomes as slow as creating it from scratch. So it takes like 15 minutes, sometimes even more.
I also did a test where the updates cover only certain key fields (which are indexed in the hypertable and are used as group by columns in the continuous aggregate). The result was the same; as the changes affected all chunks in the hypertable, the refresh was slow.
I don't know if it's a bug or a feature, but it seems that the invalidation engine only uses timestamps to determine what data needs to be recalculated, although it could also check the key columns. I'd imagine the logic could be described as: "continuous aggregate CG_1 uses column X from hypertable HT_1. There was an update to HT_1 for X=123 at timestamp Y, so the CG_1 refresh should recalculate rows where timestamp = Y and X = 123."
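The pending invalidation ranges are visible in the TimescaleDB catalog, which makes it possible to verify that only time ranges (and no key columns) are tracked. A sketch for inspecting them; exact catalog layout may differ between versions:

```sql
-- Each invalidation entry is a (hypertable_id, lowest_modified_value,
-- greatest_modified_value) time range; no key/group-by columns are recorded.
SELECT *
FROM _timescaledb_catalog.continuous_aggs_hypertable_invalidation_log;

SELECT *
FROM _timescaledb_catalog.continuous_aggs_materialization_invalidation_log;
```

This matches the behavior described above: updates scattered across many chunks produce wide (or many) invalidation ranges, so the refresh ends up recomputing nearly everything.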
Hi again.
I'm continuing with different test patterns.
Now I updated ~30k rows from one year. The update took about 20 seconds, and afterwards refresh_continuous_aggregate (covering all history) took only 1 minute.
Then I updated the same 30k rows (now taking about 5 seconds), plus another 10-month range (not consecutive) covering 54k rows, which took 4 minutes to update. refresh_continuous_aggregate now took 6 minutes.
Hi,
just a little update on what we've figured out.
We browsed through the code more thoroughly and found some helpful comments there. For example, refresh.c describes how buckets are merged based on `timescaledb.materializations_per_refresh_window`, which was also mentioned in this issue. This helped us understand the internal logic better.
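For anyone else experimenting with this, the setting mentioned above can be changed per session. This is just a sketch; check your version's docs for the default and exact semantics:

```sql
-- When the number of invalidated ranges in the refresh window exceeds
-- this value, ranges are merged into fewer, larger ones before
-- materialization.
SET timescaledb.materializations_per_refresh_window = 10;
```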
We also noticed that our actual VM (a Linux box in the cloud) is much more reasonable on resource usage than our Windows laptops running Docker and WSL. On Windows, the refresh seems to suck up all the memory it can get. On actual Linux, the refresh does just fine with only a couple of gigabytes of RAM and still finishes in pretty much the same duration.
What type of bug is this?
Performance issue
What subsystems and features are affected?
Continuous aggregate
What happened?
If I try to call `refresh_continuous_aggregate('controller_log_over_1d', '2022-08-30', '2022-09-01')`, it hogs all the memory, CPU sits at 100%, and it hangs for 30+ minutes, after which I kill it. If I just run the SQL query of the aggregate directly, it takes a very reasonable amount of time.
Tried with `timescaledb.finalized = false` as well, same results.

TimescaleDB version affected
2.8.0
PostgreSQL version used
13.7
What operating system did you use?
Ubuntu 20.04
What installation method did you use?
Docker
What platform did you run on?
On prem/Self-hosted
Relevant log output and stack trace
No response
How can we reproduce the bug?
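No reproduction steps were attached. A minimal sketch of the setup described in this report (all table, column, and view names are assumptions based on the aggregate name above) would be something like:

```sql
-- Hypothetical schema matching the description in this issue.
CREATE TABLE controller_log (
    created   timestamptz NOT NULL,
    device_id int NOT NULL,
    value     double precision
);
SELECT create_hypertable('controller_log', 'created',
                         chunk_time_interval => interval '7 days');

-- Daily continuous aggregate using the 2.8 time_bucket with timezone:
CREATE MATERIALIZED VIEW controller_log_over_1d
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 day', created, 'UTC') AS bucket,
       device_id,
       count(*) AS n
FROM controller_log
GROUP BY bucket, device_id;

-- After loading data, observe the slow refresh over a 2-day window:
CALL refresh_continuous_aggregate('controller_log_over_1d',
                                  '2022-08-30', '2022-09-01');
```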