BloodWorkXGaming opened this issue 8 months ago
Hello @BloodWorkXGaming,
Thanks for reaching out. I was able to reproduce the behavior in my local environment with TimescaleDB 2.13.0. The current version of time_bucket_gapfill has a few limitations, and it seems you ran into one of them with the given query. This is the query plan for the statement:
test2=# explain select * from (
select time_bucket_gapfill('01:00:00'::interval, time, 'Europe/Berlin', '2023-10-10 06:00:00+00'::timestamptz, '2023-10-10 10:00:00+00'::timestamptz) AS tb,
count(value) as count
from (values
('2023-10-10 06:00:00+00'::timestamptz, 1), ('2023-10-10 06:01:00+00', 2)
) as t(time, value)
group by tb) t
where count > 0;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Custom Scan (GapFill) (cost=0.04..0.08 rows=1 width=0)
   -> GroupAggregate (cost=0.04..0.08 rows=1 width=0)
         Group Key: (time_bucket_gapfill('01:00:00'::interval, "*VALUES*".column1, 'Europe/Berlin'::text, '2023-10-10 08:00:00+02'::timestamp with time zone, '2023-10-10 12:00:00+02'::timestamp with time zone))
         Filter: (count("*VALUES*".column2) > 0)
         -> Sort (cost=0.04..0.05 rows=2 width=12)
               Sort Key: (time_bucket_gapfill('01:00:00'::interval, "*VALUES*".column1, 'Europe/Berlin'::text, '2023-10-10 08:00:00+02'::timestamp with time zone, '2023-10-10 12:00:00+02'::timestamp with time zone))
               -> Values Scan on "*VALUES*" (cost=0.00..0.03 rows=2 width=12)
(7 rows)
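Even though the query uses a subquery, PostgreSQL optimizes the statement before execution. It pushes the outer where count > 0 condition down into the subquery, where it becomes a filter on the aggregate (effectively a HAVING clause), and the subquery disappears from the plan. As you can see in the query plan, this Filter sits in the GroupAggregate node, below the GapFill node. Therefore, it is applied before time_bucket_gapfill fills in the missing buckets, and the gap-filled rows (whose count is NULL) are never filtered out.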
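To see why the position of the filter matters, here is a minimal sketch that emulates gap filling in plain PostgreSQL 12+ (no TimescaleDB). The buckets come from generate_series, and a LEFT JOIN re-inserts the empty buckets with a NULL count. This is only an illustration, not how the GapFill node is implemented. The view name filter_then_fill is hypothetical. date_trunc('hour', ..., 'UTC') stands in for time_bucket_gapfill, and it matches the 'Europe/Berlin' buckets here only because both offsets are whole hours. The generate_series end (09:00) is inclusive, while the gapfill end (10:00) is exclusive. As in the pushed-down plan, the filter runs in the aggregation step, before the empty buckets are added.

```sql
-- Hypothetical emulation of "filter first, then gap fill" (plain PostgreSQL 12+).
CREATE OR REPLACE TEMP VIEW filter_then_fill AS
WITH data(time, value) AS (
  VALUES ('2023-10-10 06:00:00+00'::timestamptz, 1),
         ('2023-10-10 06:01:00+00'::timestamptz, 2)
),
agg AS (
  -- Aggregate per hour and filter immediately, like the Filter on GroupAggregate.
  SELECT date_trunc('hour', time, 'UTC') AS tb, count(value) AS count
  FROM data
  GROUP BY 1
  HAVING count(value) > 0
)
-- Re-insert the missing buckets afterwards; they get a NULL count
-- and are never seen by the filter above.
SELECT b.tb, a.count
FROM generate_series('2023-10-10 06:00:00+00'::timestamptz,
                     '2023-10-10 09:00:00+00'::timestamptz,
                     '1 hour'::interval) AS b(tb)
LEFT JOIN agg AS a USING (tb)
ORDER BY b.tb;

SELECT * FROM filter_then_fill;  -- four buckets, three of them with a NULL count
```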
As an alternative to the subquery, you can use a Common Table Expression (CTE):
explain WITH cte AS (
select time_bucket_gapfill('01:00:00'::interval, time, 'Europe/Berlin', '2023-10-10 06:00:00+00'::timestamptz, '2023-10-10 10:00:00+00'::timestamptz) AS tb,
count(value) as count
from (values
('2023-10-10 06:00:00+00'::timestamptz, 1), ('2023-10-10 06:01:00+00', 2)
) as t(time, value)
group by tb)
SELECT * FROM cte where count > 0;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 CTE Scan on cte (cost=0.08..0.12 rows=1 width=16)
   Filter: (count > 0)
   CTE cte
     -> Custom Scan (GapFill) (cost=0.04..0.08 rows=2 width=0)
           -> GroupAggregate (cost=0.04..0.08 rows=2 width=0)
                 Group Key: (time_bucket_gapfill('01:00:00'::interval, "*VALUES*".column1, 'Europe/Berlin'::text, '2023-10-10 08:00:00+02'::timestamp with time zone, '2023-10-10 12:00:00+02'::timestamp with time zone))
                 -> Sort (cost=0.04..0.05 rows=2 width=12)
                       Sort Key: (time_bucket_gapfill('01:00:00'::interval, "*VALUES*".column1, 'Europe/Berlin'::text, '2023-10-10 08:00:00+02'::timestamp with time zone, '2023-10-10 12:00:00+02'::timestamp with time zone))
                       -> Values Scan on "*VALUES*" (cost=0.00..0.03 rows=2 width=12)
(9 rows)
test2=# WITH cte AS (
select time_bucket_gapfill('01:00:00'::interval, time, 'Europe/Berlin', '2023-10-10 06:00:00+00'::timestamptz, '2023-10-10 10:00:00+00'::timestamptz) AS tb,
count(value) as count
from (values
('2023-10-10 06:00:00+00'::timestamptz, 1), ('2023-10-10 06:01:00+00', 2)
) as t(time, value)
group by tb)
SELECT * FROM cte where count > 0;
tb | count
------------------------+-------
2023-10-10 08:00:00+02 | 2
(1 row)
As you can see in this query plan, the Filter: (count > 0) operation now sits on the CTE Scan and is performed after GapFill is executed. The query output is filtered as expected.
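The same plain-PostgreSQL emulation with the order reversed mirrors the CTE workaround: fill first, then filter the result. It is a minimal sketch that assumes PostgreSQL 12+ and uses the hypothetical view name fill_then_filter, with date_trunc again standing in for time_bucket_gapfill. AS MATERIALIZED is standard PostgreSQL 12+ syntax. Since version 12, the planner may inline a non-recursive CTE that is referenced only once, and MATERIALIZED makes the separate evaluation step explicit. The plan above still shows a CTE Scan, so the CTE was not inlined there. Writing the TimescaleDB query as WITH cte AS MATERIALIZED (...) is a plausible way to pin that behavior down, but treat it as an untested suggestion.

```sql
-- Hypothetical emulation of "gap fill first, then filter" (plain PostgreSQL 12+).
CREATE OR REPLACE TEMP VIEW fill_then_filter AS
WITH data(time, value) AS (
  VALUES ('2023-10-10 06:00:00+00'::timestamptz, 1),
         ('2023-10-10 06:01:00+00'::timestamptz, 2)
),
filled AS MATERIALIZED (
  -- Gap fill first: every hourly bucket, with a NULL count where there is no data.
  -- MATERIALIZED evaluates this CTE as its own step, like "CTE cte" in the plan above.
  SELECT b.tb, a.count
  FROM generate_series('2023-10-10 06:00:00+00'::timestamptz,
                       '2023-10-10 09:00:00+00'::timestamptz,
                       '1 hour'::interval) AS b(tb)
  LEFT JOIN (SELECT date_trunc('hour', time, 'UTC') AS tb, count(value) AS count
             FROM data
             GROUP BY 1) AS a USING (tb)
)
-- Filter afterwards, so the NULL buckets are removed.
SELECT * FROM filled WHERE count > 0;

SELECT * FROM fill_then_filter;  -- one row: the 06:00 UTC bucket with count 2
```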
Hi @jnidzwetzki, thanks for the quick answer! :)
I can confirm that this workaround works perfectly fine, thanks for that :) I'll leave it up to you whether to keep this issue open for further investigation or to close it now that a 'fix' has been found.
What type of bug is this?
Incorrect result
What subsystems and features are affected?
Gapfill
What happened?
The WHERE clause on a count(*) of the group size in a gapfill query seems to be ignored. I hope I am not doing something very wrong here:
Minimal example:
Null values are still present. There are no changes to the result when using is not null or using HAVING, and neither does count = 2 work. Any ideas what could cause this behavior?
Thanks!
TimescaleDB version affected
2.10.2
PostgreSQL version used
14.7
What operating system did you use?
WSL
What installation method did you use?
Docker
What platform did you run on?
On prem/Self-hosted
Relevant log output and stack trace
How can we reproduce the bug?