tl;dr: When using partition functions to create dimensions, timescale fails to return some data (ie. fails to see that e.g. 201 > 200 in the reproducer).
I'm testing the migration of an existing postgres table (which I can't easily change the schema of). This table contains data from multiple systems, which themselves contain multiple units. The table doesn't contain a column for systems, however it can be derived from the unit id as it is prefixed by it (e.g unit id 101 is in system 1 and unit id 201 is in system 2, or generally int(unitid / 100) == systemid). As this data is always queried one system at a time, I'd wish to partition the hypertable by time and system.
Using a partition function to separate my chunks by system works, but gives interesting results when searching by unit id.
From a naive perspective on the problem, it looks like the query planner, when looking for which chunk to scan on, replaces the query's unitid > 200 by unitid_dimension(unitid) > unitid_dimension(200)? Which of course, gives different results than e.g a plain postgres table would have. I did not see this documented as a "feature", although I might have missed it.
Thanks! And if you happen to see an alternative way to partition this table that would yield similar chunking but also correct results, feel free to let me know!
TimescaleDB version affected
2.14.2
PostgreSQL version used
15
What operating system did you use?
Debian 12
What installation method did you use?
Source
What platform did you run on?
On prem/Self-hosted
Relevant log output and stack trace
No response
How can we reproduce the bug?
CREATE DATABASE timescale_greater_bug;
\connect timescale_greater_bug;
CREATE TABLE telemetrydata (unitid integer NOT NULL, value real NOT NULL, timestamp_acq TIMESTAMP WITH TIME ZONE NOT NULL);
CREATE EXTENSION IF NOT EXISTS timescaledb;
SELECT create_hypertable('telemetrydata', by_range('timestamp_acq', INTERVAL '1 day'));
CREATE FUNCTION unitid_dimension(integer)
RETURNS integer
LANGUAGE SQL
IMMUTABLE AS
'SELECT $1 / 100';
SELECT add_dimension('telemetrydata', by_range('unitid', 1, partition_func => 'unitid_dimension'));
INSERT INTO telemetrydata (unitid, value, timestamp_acq) VALUES
(101, 3.1416, CURRENT_TIMESTAMP),
(201, 42, CURRENT_TIMESTAMP + interval '30 seconds');
SELECT * from telemetrydata WHERE unitid > 200;
unitid | value | timestamp_acq
--------+-------+---------------
(0 rows)
SELECT * from telemetrydata WHERE unitid > 100;
unitid | value | timestamp_acq
--------+-------+-------------------------------
201 | 42 | 2024-06-28 17:42:18.103536-04
(1 row)
SELECT * from telemetrydata WHERE unitid > 99;
unitid | value | timestamp_acq
--------+--------+-------------------------------
101 | 3.1416 | 2024-06-28 17:41:48.103536-04
201 | 42 | 2024-06-28 17:42:18.103536-04
(2 rows)
SELECT * from telemetrydata WHERE unitid >= 200;
unitid | value | timestamp_acq
--------+-------+-------------------------------
201 | 42 | 2024-06-28 17:42:18.103536-04
(1 row)
SELECT * from telemetrydata WHERE unitid >= 100;
unitid | value | timestamp_acq
--------+--------+-------------------------------
101 | 3.1416 | 2024-06-28 17:41:48.103536-04
201 | 42 | 2024-06-28 17:42:18.103536-04
(2 rows)
SELECT * from telemetrydata WHERE unitid < 202;
unitid | value | timestamp_acq
--------+--------+-------------------------------
101 | 3.1416 | 2024-06-28 17:41:48.103536-04
(1 row)
SELECT * from telemetrydata WHERE unitid BETWEEN 199 AND 202;
unitid | value | timestamp_acq
--------+-------+-------------------------------
201 | 42 | 2024-06-28 17:42:18.103536-04
(1 row)
EXPLAIN SELECT * from telemetrydata WHERE unitid > 200;
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.00 rows=0 width=0)
One-Time Filter: false
(2 rows)
EXPLAIN SELECT * from telemetrydata WHERE unitid > 100;
QUERY PLAN
--------------------------------------------------------------------
Seq Scan on _hyper_1_2_chunk (cost=0.00..33.12 rows=617 width=16)
Filter: (unitid > 100)
(2 rows)
What type of bug is this?
Incorrect result
What subsystems and features are affected?
Query planner
What happened?
tl;dr: When using partition functions to create dimensions, timescale fails to return some data (ie. fails to see that e.g. 201 > 200 in the reproducer).
I'm testing the migration of an existing postgres table (which I can't easily change the schema of). This table contains data from multiple systems, which themselves contain multiple units. The table doesn't contain a column for systems, however it can be derived from the unit id as it is prefixed by it (e.g unit id 101 is in system 1 and unit id 201 is in system 2, or generally
int(unitid / 100) == systemid
). As this data is always queried one system at a time, I'd wish to partition the hypertable by time and system.Using a partition function to separate my chunks by system works, but gives interesting results when searching by unit id.
From a naive perspective on the problem, it looks like the query planner, when looking for which chunk to scan on, replaces the query's
unitid > 200
byunitid_dimension(unitid) > unitid_dimension(200)
? Which of course, gives different results than e.g a plain postgres table would have. I did not see this documented as a "feature", although I might have missed it.Thanks! And if you happen to see an alternative way to partition this table that would yield similar chunking but also correct results, feel free to let me know!
TimescaleDB version affected
2.14.2
PostgreSQL version used
15
What operating system did you use?
Debian 12
What installation method did you use?
Source
What platform did you run on?
On prem/Self-hosted
Relevant log output and stack trace
No response
How can we reproduce the bug?