Open Bykiev opened 6 days ago
@Bykiev
your query touches ALL the chunks and almost all the rows in them because of your WHERE clause. Since nearly all the data has to be read anyway, an index scan is not attractive to the planner.
where "Dt" <= '2024-05-23 04:24:04'
That is why [parallel] sequential scans are used for the query. If you tweak the query as below, you will see the index scan being used as you expect.
EXPLAIN SELECT "ParameterId", max("Dt")
FROM "Data"
where "Dt" > '2024-05-23 04:24:04'
group by "ParameterId";
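The trade-off the planner is making can be illustrated with a toy cost model in Python. The constants below are made up for illustration and are not PostgreSQL's actual costing formulas; the point is only the crossover: an index scan pays a per-row random-access penalty, so it loses once a predicate matches nearly every row.

```python
# Toy planner cost model (illustrative constants, not PostgreSQL's real
# costing): why an index scan stops paying off when a predicate matches
# nearly all rows.
def seq_scan_cost(total_rows, row_cost=1.0):
    # A sequential scan always reads everything, but with cheap sequential I/O.
    return total_rows * row_cost

def index_scan_cost(matching_rows, random_io_penalty=4.0, row_cost=1.0):
    # An index scan reads only matching rows, but each fetch is a random
    # heap access, modeled here as an extra per-row penalty.
    return matching_rows * (row_cost + random_io_penalty)

TOTAL = 1_000_000
# "Dt" <= <late timestamp>: ~99% of rows match, so the sequential scan wins.
assert seq_scan_cost(TOTAL) < index_scan_cost(0.99 * TOTAL)
# "Dt" > <late timestamp>: ~0.1% of rows match, so the index scan wins.
assert index_scan_cost(0.001 * TOTAL) < seq_scan_cost(TOTAL)
```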
Thanks, this makes sense. But because of this, the only solution I can see is to create multiple tables instead of partitioning and search by index in each table. It seems TimescaleDB is not a good fit for cases where we need to get the latest value for each group up to a specific point in time.
I found some other strange things: I've created a new hypertable where the min date is "2022-01-01 00:00:00.387116+03" and the max date is "2023-12-31 23:59:58.739976+03". Why did it create 3 chunks for a hypertable partitioned with by_range('Dt', INTERVAL '1 year')
(I think there should be 2 chunks)? And when I execute a query with the condition where "Dt" >= '2022-01-01 00:00:00' and "Dt" <= '2022-11-30 04:24:04',
EXPLAIN shows that it will scan 2 chunks sequentially. Why does this happen with TimescaleDB?
To get the max over a large arbitrary time range efficiently with TimescaleDB, wouldn't the best way be to create hierarchical continuous max aggregates, and select a max from among the aggregate buckets covering your time range?

For example, if you had 1 second, 1 minute, 1 hour, and 1 day aggregates, and you want to get the max where "Dt" <= '2024-05-23 01:01:01', you could take the max of:

- 1 day aggregates before 2024-05-23
- 1 hour aggregates from 2024-05-23 00:00 to 2024-05-23 01:00
- 1 minute aggregates from 2024-05-23 01:00 to 2024-05-23 01:01
- 1 second aggregates from 2024-05-23 01:01:00 to 2024-05-23 01:01:01

I've been prototyping something similar to this to compute integrals over arbitrarily large time ranges efficiently.
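The max-of-maxes decomposition described above can be sketched in plain Python. This is not TimescaleDB's continuous-aggregate API; the dictionaries standing in for the 1-minute/1-hour/1-day materializations, the bucket layout, and the greedy coarse-to-fine walk are all illustrative. The sketch verifies that consuming whole day buckets first, then hour, minute, and second buckets, gives the same answer as scanning every row.

```python
# Sketch of hierarchical max aggregates (illustrative, not Timescale's API):
# answer "max of all values before a cutoff" by reading a handful of
# coarse buckets instead of every raw row.
import random

random.seed(0)
DAY, HOUR, MINUTE = 86400, 3600, 60
# Raw data: one value per second over three days (timestamps are seconds).
data = {s: random.random() for s in range(3 * DAY)}

def rollup(src, width):
    """Bucket a series into fixed-width windows, keeping the max per bucket."""
    agg = {}
    for t, v in src.items():
        b = t - t % width
        agg[b] = max(agg.get(b, float("-inf")), v)
    return agg

minute_agg = rollup(data, MINUTE)
hour_agg = rollup(data, HOUR)
day_agg = rollup(data, DAY)

def max_until(cutoff):
    """Max of all values with timestamp < cutoff, coarsest buckets first."""
    best, t = float("-inf"), 0
    for width, agg in ((DAY, day_agg), (HOUR, hour_agg), (MINUTE, minute_agg)):
        while t + width <= cutoff:          # bucket fully inside the range
            best = max(best, agg[t])
            t += width
    while t < cutoff:                       # leftover seconds from raw data
        best = max(best, data[t])
        t += 1
    return best

cutoff = DAY + HOUR + 61  # i.e. day 1, 01:01:01 -> 1 day + 1 hour + 1 min + 1 sec
assert max_until(cutoff) == max(v for t, v in data.items() if t < cutoff)
```

Only four bucket reads (one per granularity) replace a scan of ~90,000 raw rows here, which is the same saving the hierarchical continuous aggregates would provide.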
Thanks, this looks promising, though additional disk space is needed for the aggregates.
What does by_range('Dt', INTERVAL '1 year')
mean? If I understand correctly, there should be 2 chunks, one for 2022 and one for 2023, shouldn't there?
@Bykiev we don't support calendar-based partitioning yet. So when the first value is added to a chunk, the chunk will have that value "X" as its starting value and "X + 1 year" as its end value.
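A rough Python sketch of why two calendar years of data can land in three chunks. The assumptions here are illustrative and not checked against TimescaleDB internals: chunks are fixed-width, laid out from a fixed origin (the Unix epoch is used below purely for illustration), and INTERVAL '1 year' is flattened to 12 * 30 = 360 days rather than tracking calendar years. Under those assumptions the 360-day grid drifts off the calendar, so both observations from this thread fall out: the full dataset spans 3 chunks, and the narrower 2022-01-01..2022-11-30 query still crosses 2.

```python
# Illustrative model of fixed-width (non-calendar) chunking. Assumptions:
# epoch-aligned boundaries and '1 year' treated as 360 days - neither is
# verified against TimescaleDB, but they show how calendar drift happens.
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
CHUNK_WIDTH = timedelta(days=360)  # assumed flattening of INTERVAL '1 year'

def chunk_index(ts):
    return (ts - EPOCH) // CHUNK_WIDTH

tz3 = timezone(timedelta(hours=3))
lo = datetime(2022, 1, 1, 0, 0, 0, 387116, tzinfo=tz3)       # min "Dt"
hi = datetime(2023, 12, 31, 23, 59, 58, 739976, tzinfo=tz3)  # max "Dt"
q_end = datetime(2022, 11, 30, 4, 24, 4, tzinfo=tz3)         # query upper bound

print(chunk_index(hi) - chunk_index(lo) + 1)     # 3: whole dataset spans 3 chunks
print(chunk_index(q_end) - chunk_index(lo) + 1)  # 2: Jan-Nov 2022 still crosses 2
```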
Btw, it seems things are working here as intended. Please feel free to close this issue in that case. Thanks.
What type of bug is this?
Performance issue
What subsystems and features are affected?
Query planner
What happened?
TimescaleDB always performs a sequential scan on chunks instead of an index scan
TimescaleDB version affected
2.15.2
PostgreSQL version used
14.3
What operating system did you use?
Windows 10
What installation method did you use?
Other
What platform did you run on?
On prem/Self-hosted
Relevant log output and stack trace
No response
How can we reproduce the bug?