timescale / timescaledb

An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
https://www.timescale.com/
Other
16.8k stars 852 forks source link

[Bug]: Some data not returned in queries when using partition functions in second dimension #7079

Open GreenArchon opened 3 days ago

GreenArchon commented 3 days ago

What type of bug is this?

Incorrect result

What subsystems and features are affected?

Query planner

What happened?

tl;dr: When using partition functions to create dimensions, timescale fails to return some data (ie. fails to see that e.g. 201 > 200 in the reproducer).


I'm testing the migration of an existing postgres table (which I can't easily change the schema of). This table contains data from multiple systems, which themselves contain multiple units. The table doesn't contain a column for systems, however it can be derived from the unit id as it is prefixed by it (e.g unit id 101 is in system 1 and unit id 201 is in system 2, or generally int(unitid / 100) == systemid). As this data is always queried one system at a time, I'd wish to partition the hypertable by time and system.

Using a partition function to separate my chunks by system works, but gives interesting results when searching by unit id.

From a naive perspective on the problem, it looks like the query planner, when looking for which chunk to scan on, replaces the query's unitid > 200 by unitid_dimension(unitid) > unitid_dimension(200)? Which of course, gives different results than e.g a plain postgres table would have. I did not see this documented as a "feature", although I might have missed it.

Thanks! And if you happen to see an alternative way to partition this table that would yield similar chunking but also correct results, feel free to let me know!

TimescaleDB version affected

2.14.2

PostgreSQL version used

15

What operating system did you use?

Debian 12

What installation method did you use?

Source

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

No response

How can we reproduce the bug?

CREATE DATABASE timescale_greater_bug;
\connect timescale_greater_bug;

CREATE TABLE telemetrydata (unitid integer NOT NULL, value real NOT NULL, timestamp_acq TIMESTAMP WITH TIME ZONE NOT NULL);

CREATE EXTENSION IF NOT EXISTS timescaledb;
SELECT create_hypertable('telemetrydata', by_range('timestamp_acq', INTERVAL '1 day'));
CREATE FUNCTION unitid_dimension(integer)                                                                                                                                                                                            
  RETURNS integer
  LANGUAGE SQL
  IMMUTABLE AS
  'SELECT $1 / 100';
SELECT add_dimension('telemetrydata', by_range('unitid', 1, partition_func => 'unitid_dimension'));

INSERT INTO telemetrydata (unitid, value, timestamp_acq) VALUES
    (101, 3.1416, CURRENT_TIMESTAMP),
    (201, 42, CURRENT_TIMESTAMP + interval '30 seconds'); 

SELECT * from telemetrydata WHERE unitid > 200;
 unitid | value | timestamp_acq 
--------+-------+---------------
(0 rows)

SELECT * from telemetrydata WHERE unitid > 100;
 unitid | value |         timestamp_acq         
--------+-------+-------------------------------
    201 |    42 | 2024-06-28 17:42:18.103536-04
(1 row)

SELECT * from telemetrydata WHERE unitid > 99;
 unitid | value  |         timestamp_acq         
--------+--------+-------------------------------
    101 | 3.1416 | 2024-06-28 17:41:48.103536-04
    201 |     42 | 2024-06-28 17:42:18.103536-04
(2 rows)

SELECT * from telemetrydata WHERE unitid >= 200;
 unitid | value |         timestamp_acq         
--------+-------+-------------------------------
    201 |    42 | 2024-06-28 17:42:18.103536-04
(1 row)

SELECT * from telemetrydata WHERE unitid >= 100;
 unitid | value  |         timestamp_acq         
--------+--------+-------------------------------
    101 | 3.1416 | 2024-06-28 17:41:48.103536-04
    201 |     42 | 2024-06-28 17:42:18.103536-04
(2 rows)

SELECT * from telemetrydata WHERE unitid < 202;
 unitid | value  |         timestamp_acq         
--------+--------+-------------------------------
    101 | 3.1416 | 2024-06-28 17:41:48.103536-04
(1 row)

SELECT * from telemetrydata WHERE unitid BETWEEN 199 AND 202;
 unitid | value |         timestamp_acq         
--------+-------+-------------------------------
    201 |    42 | 2024-06-28 17:42:18.103536-04
(1 row)

EXPLAIN SELECT * from telemetrydata WHERE unitid > 200;
                QUERY PLAN                
------------------------------------------
 Result  (cost=0.00..0.00 rows=0 width=0)
   One-Time Filter: false
(2 rows)

EXPLAIN SELECT * from telemetrydata WHERE unitid > 100;
                             QUERY PLAN                             
--------------------------------------------------------------------
 Seq Scan on _hyper_1_2_chunk  (cost=0.00..33.12 rows=617 width=16)
   Filter: (unitid > 100)
(2 rows)