timescale / timescaledb

An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
https://www.timescale.com/
Other
16.83k stars 852 forks source link

[Bug]: mergejoin input data is out of order #7097

Open jledentu opened 3 days ago

jledentu commented 3 days ago

What type of bug is this?

Unexpected error

What subsystems and features are affected?

Query executor

What happened?

Some SELECT requests that merge join CTEs generate such errors:

ERROR:  mergejoin input data is out of order 

SQL state: XX000

It happens when data is queried from hypertables using indexes. I tried to VACUUM/ANALYZE/REINDEX all tables, nothing helps.

TimescaleDB version affected

2.15.3

PostgreSQL version used

15.7

What operating system did you use?

Ubuntu 22.04

What installation method did you use?

Docker

What platform did you run on?

On prem/Self-hosted, Not applicable

Relevant log output and stack trace

No response

How can we reproduce the bug?

  1. Run docker run -p 5432:5432 -e POSTGRES_PASSWORD=password timescale/timescaledb-ha:pg15.7-ts2.15.3
  2. Connect to the database
  3. Execute the queries below:
CREATE TABLE test1 (
    date timestamp with time zone NOT NULL,
    quantity double precision,
    "isText" boolean
);

CREATE TABLE test2 (
    date timestamp with time zone NOT NULL,
    quantity double precision,
    "isText" boolean
);

CREATE INDEX IF NOT EXISTS test_idx ON test1 (date) WHERE "isText" IS NULL;
SELECT create_hypertable('test1', 'date', chunk_time_interval => interval 'P7D', create_default_indexes => false);

CREATE INDEX IF NOT EXISTS test2_idx ON test2 (date) WHERE "isText" IS NULL;
SELECT create_hypertable('test2', 'date', chunk_time_interval => interval 'P7D', create_default_indexes => false);

INSERT INTO test1(date, quantity)
SELECT date, round((random() * (100-3) + 3)::NUMERIC) AS quantity
FROM generate_series('2023-01-01T00:00:00+01:00', '2023-05-01T00:00:00+01:00', interval 'PT10M') AS t(date);

INSERT INTO test2(date, quantity)
SELECT date, round((random() * (100-3) + 3)::NUMERIC) AS quantity
FROM generate_series('2023-01-01T00:00:00+01:00', '2023-05-01T00:00:00+01:00', interval 'PT10M') AS t(date);

SET LOCAL TIME ZONE 'Europe/Paris';
WITH
"cte1" AS (SELECT date + interval 'P1Y' AS date, avg(quantity) AS quantity FROM test1 WHERE date >= '2024-03-31T00:00:00+01:00'::timestamp with time zone - interval 'P1Y' AND date < '2024-03-31T23:59:59+02:00'::timestamp with time zone + (- interval 'P1Y') AND "isText" IS NULL GROUP BY 1 ORDER BY 1 ASC),
"cte2" AS (SELECT date + interval 'P1Y' AS date, avg(quantity) AS quantity FROM test2 WHERE date >= '2024-03-31T00:00:00+01:00'::timestamp with time zone - interval 'P1Y' AND date < '2024-03-31T23:59:59+02:00'::timestamp with time zone + (- interval 'P1Y') AND "isText" IS NULL GROUP BY 1 ORDER BY 1 ASC)
SELECT date, cte1.quantity + cte2.quantity FROM cte1 FULL OUTER JOIN cte2 USING (date);