timescale / timescaledb

An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
https://www.timescale.com/
Other
17.84k stars 884 forks source link

[Bug]: Extreme planning time when WHERE clause contains subquery and FALSE condition #6095

Open raymalt opened 1 year ago

raymalt commented 1 year ago

What type of bug is this?

Performance issue

What subsystems and features are affected?

Query planner

What happened?

This blog post contains solutions for combining several continuous aggregates in a visualization, where the selection is made based on the time resolution of the plotting tool (e.g. Grafana). This means that only one query should return anything, and the others should have a FALSE condition - causing them to return nothing.

It seems that combining this with a condition based on a subquery in the WHERE clause causes the query planning to take a very long time - a time which is probably dependent on the number of chunks in the hypertable.

A stripped-down query is shown below, the first one had 0.13 ms planning time for me and the second one had 480 ms. The only difference is that the device_id is filtered from list of values vs. a SELECT statement. The hypertable has 4000+ chunks. Another hypertable with 800+ had the same issue, but the planning time went down to about 50 ms instead - still a 400x increase in planning time.

N.B.: While the below examples might not seem very useful and practical, they are actually taken (and stripped down) from a real use case, where the UNION ALL method is combined with a dynamic query based on a user setting.

Note that the queries are intentionally written in a way so that they return zero rows (time > now()).

The outputs from EXPLAIN (ANALYZE, VERBOSE, BUFFERS are included in the log output section below. I assume that Buffers: shared hit=<very large number> is relevant.

Query with 0.16 ms planning time

SELECT
    time,
    device_id,
    float_value
FROM
    metrics m
WHERE 
    time > now()
    AND device_id IN (4)
    AND FALSE

Query with 577 ms planning time

SELECT
    time,
    device_id,
    float_value
FROM
    metrics m
WHERE 
    time > now()
    AND device_id IN (SELECT 4)
    AND FALSE

TimescaleDB version affected

2.11.2

PostgreSQL version used

15.4

What operating system did you use?

Timescale Cloud

What installation method did you use?

Not applicable

What platform did you run on?

Timescale Cloud

Relevant log output and stack trace

__Query 1__

Result  (cost=0.00..0.00 rows=0 width=0) (actual time=0.001..0.002 rows=0 loops=1)
  Output: "time", device_id, float_value
  One-Time Filter: false
Query Identifier: -5475597576986489885
Planning:
  Buffers: shared hit=4
Planning Time: 0.129 ms
Execution Time: 0.042 ms

Query 2

Result  (cost=0.00..0.00 rows=0 width=0) (actual time=0.002..0.002 rows=0 loops=1)
  Output: m."time", m.device_id, m.float_value
  One-Time Filter: false
Query Identifier: -5361642898173797665
Planning:
  Buffers: shared hit=105052
Planning Time: 481.548 ms
Execution Time: 0.130 ms

### How can we reproduce the bug?

```bash
Create a hypertable with a lot of chunks and then run the examples above.
jnidzwetzki commented 1 year ago

Hello @raymalt,

Thanks for letting us know about this issue. I was able to reproduce it on my local system with TimescaleDB 2.12.0-dev:

Prepare Environment

CREATE TABLE sensor_data(
time timestamp NOT NULL,
sensor_id integer NOT NULL,
cpu double precision,
temperature double precision);

SELECT FROM create_hypertable(relation=>'sensor_data', time_column_name=> 'time');

INSERT INTO sensor_data
SELECT time + (INTERVAL '1 minute' * random()) AS time,
sensor_id, 
random() AS cpu, 
random()* 100 AS temperature
FROM
generate_series('1990-01-01'::date, '2000-01-01', INTERVAL '1 week') AS g1(time),
generate_series(1, 100, 1 ) AS g2(sensor_id)
ORDER BY time;

Query 1 (constant values)

explain (verbose, analyze) SELECT
    time,
    sensor_id,
    temperature
FROM
    sensor_data
WHERE 
    time > now()
    AND sensor_id IN (4)
    AND FALSE;

                                     QUERY PLAN                                     
------------------------------------------------------------------------------------
 Result  (cost=0.00..0.00 rows=0 width=0) (actual time=0.004..0.005 rows=0 loops=1)
   Output: "time", sensor_id, temperature
   One-Time Filter: false
 Planning Time: 1.307 ms
 Execution Time: 0.038 ms
(5 rows)

Query 2 (subquery)

explain (verbose, analyze) SELECT
    time,
    sensor_id,
    temperature
FROM
    sensor_data
WHERE 
    time > now()
    AND sensor_id IN (SELECT 4)
    AND FALSE;

                                     QUERY PLAN                                     
------------------------------------------------------------------------------------
 Result  (cost=0.00..0.00 rows=0 width=0) (actual time=0.004..0.005 rows=0 loops=1)
   Output: sensor_data."time", sensor_data.sensor_id, sensor_data.temperature
   One-Time Filter: false
 Planning Time: 105.857 ms
 Execution Time: 0.051 ms
(5 rows)

So, query 2 needs around 104 ms more planning time.

Flamegraphs

Query 1

old

Query 2

new

Difference

diff

Analysis

The hypertable expansion function (ts_plan_expand_hypertable_chunks) appears to be causing longer planning times. This function is not called for query 1 (which contains a const), but for query 2 (which includes a subquery). It is called once and takes approximately 101 ms to execute.

# Query 1
$ sudo funclatency-bpfcc /home/jan/postgresql-sandbox/bin/REL_14_9_RELEASE/lib/timescaledb-2.12.0-dev.so:ts_plan_expand_hypertable_chunks
[...]

# Query 2
$ sudo funclatency-bpfcc /home/jan/postgresql-sandbox/bin/REL_14_9_RELEASE/lib/timescaledb-2.12.0-dev.so:ts_plan_expand_hypertable_chunks
[...]
Function = b'ts_plan_expand_hypertable_chunks' [3333927]
     nsecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 0        |                                        |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 0        |                                        |
       512 -> 1023       : 0        |                                        |
      1024 -> 2047       : 0        |                                        |
      2048 -> 4095       : 0        |                                        |
      4096 -> 8191       : 0        |                                        |
      8192 -> 16383      : 0        |                                        |
     16384 -> 32767      : 0        |                                        |
     32768 -> 65535      : 0        |                                        |
     65536 -> 131071     : 0        |                                        |
    131072 -> 262143     : 0        |                                        |
    262144 -> 524287     : 0        |                                        |
    524288 -> 1048575    : 0        |                                        |
   1048576 -> 2097151    : 0        |                                        |
   2097152 -> 4194303    : 0        |                                        |
   4194304 -> 8388607    : 0        |                                        |
   8388608 -> 16777215   : 0        |                                        |
  16777216 -> 33554431   : 0        |                                        |
  33554432 -> 67108863   : 0        |                                        |
  67108864 -> 134217727  : 1        |****************************************|

avg = 101887997 nsecs, total: 101887997 nsecs, count: 1