timescale / timescaledb

An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
https://www.timescale.com/

[Bug]: Out of shared memory when combining join with always false condition #4758

Open · julianbieber opened 1 year ago

julianbieber commented 1 year ago

What type of bug is this?

Unexpected error

What subsystems and features are affected?

Query planner

What happened?

Hi,

We have observed an out-of-shared-memory error for a query that, given the time range passed to it, should only access a few chunks.

The issue only occurs when a join is combined with a WHERE clause that is trivially false.

The error seems to occur at planning time, since it is also present when merely explaining the query (plain EXPLAIN, not EXPLAIN ANALYZE).

There is a workaround: including the time range in the join's ON clause reduces the number of locks being acquired.

We ran into the issue because our ORM generates the "AND false" condition as a replacement for an IN clause over an empty sequence.
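
For illustration, the shapes involved look roughly like this (a sketch only; the "id" column and the values are hypothetical, the real schema is in the reproduction below):

-- What the ORM emits for a non-empty IN list (hypothetical "id" column):
select * from timeseries join other_table on true
where t >= '2020-01-01' and t <= '2020-01-02' and id in (1, 2, 3);

-- For an empty list, "in ()" is not valid SQL, so the ORM substitutes a
-- constant-false predicate; this is the failing shape:
select * from timeseries join other_table on true
where t >= '2020-01-01' and t <= '2020-01-02' and false;

-- Workaround: repeat the time range in the ON clause:
select * from timeseries
join other_table on t >= '2020-01-01' and t <= '2020-01-02'
where t >= '2020-01-01' and t <= '2020-01-02' and false;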

TimescaleDB version affected

2.8.0, 2.3.1

PostgreSQL version used

14.5, 13.3

What operating system did you use?

Alpine 11.2.1 x64

What installation method did you use?

Docker

What platform did you run on?

Amazon Web Services (AWS)

Relevant log output and stack trace

SQL Error [53200]: ERROR: out of shared memory
  Hint: You might need to increase max_locks_per_transaction.
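
As the hint suggests, raising max_locks_per_transaction is one way to mitigate the error until the planner behavior changes; a minimal sketch (the new value takes effect only after a server restart):

-- Raise the lock table budget; requires a server restart to take effect.
ALTER SYSTEM SET max_locks_per_transaction = 1024;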

How can we reproduce the bug?

1. Start the database with a small value for max_locks_per_transaction (to make the reproduction easier). The total number of locks allowed should then be 1000 = 10 (max_locks_per_transaction) * 100 (max_connections):

docker run -p 5431:5432 -e POSTGRES_PASSWORD=password timescale/timescaledb:latest-pg14 -c max_locks_per_transaction=10
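
-- Once connected (e.g. via psql), the effective limits can be double-checked:
show max_locks_per_transaction;  -- expected: 10
show max_connections;            -- default: 100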
2. Create a TimescaleDB hypertable and a regular table to join on:

create table timeseries (
   t timestamp with time zone not null
);

create table other_table ();

SELECT public.create_hypertable('public.timeseries', 't', chunk_time_interval => interval '1 day');
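
-- Optional sanity check (column names as in the timescaledb_information
-- views of 2.x, assumed here): the new hypertable should be listed.
select hypertable_name, num_chunks from timescaledb_information.hypertables;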
3. Populate the time-series table with more than 1000 chunks, i.e. more than the maximum number of locks allowed:
insert into timeseries (t) select generate_series ('2020-01-01 00:00:00', '2021-01-01 00:00:00', INTERVAL '1 DAY');
insert into timeseries (t) select generate_series ('2021-01-01 00:00:00', '2022-01-01 00:00:00', INTERVAL '1 DAY');
insert into timeseries (t) select generate_series ('2022-01-01 00:00:00', '2023-01-01 00:00:00', INTERVAL '1 DAY');
insert into timeseries (t) select generate_series ('2023-01-01 00:00:00', '2024-01-01 00:00:00', INTERVAL '1 DAY');
insert into timeseries (t) select generate_series ('2024-01-01 00:00:00', '2025-01-01 00:00:00', INTERVAL '1 DAY');
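
-- Optional: confirm that the chunk count now exceeds the lock budget of 1000.
select count(*) from show_chunks('timeseries');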
4. Plan the failing query:
explain 
select * from timeseries join other_table on true where t >= '2020-01-01 00:00:00' and t <= '2020-01-02 00:00:00' and false;
5. For contrast, here are some example queries on the same dataset that do not trigger the error (a way to observe the lock counts directly follows this list):
-- this query succeeds
explain 
select * from timeseries join other_table on true where t >= '2020-01-01 00:00:00' and t <= '2020-01-02 00:00:00';
-- this query succeeds
explain 
select * from timeseries where t >= '2020-01-01 00:00:00' and t <= '2020-01-01 00:00:00' and false;
-- this query succeeds
explain 
select * from timeseries join other_table on t >= '2020-01-01 00:00:00' and t <= '2020-01-02 00:00:00' where t >= '2020-01-01 00:00:00' and t <= '2020-01-02 00:00:00' and false;
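
Because the AccessShareLocks taken during planning are held until the end of the transaction, the lock footprint of a plan can also be observed directly; a sketch, to be run once for a succeeding variant and once for the failing query:

begin;
explain
select * from timeseries join other_table on true where t >= '2020-01-01 00:00:00' and t <= '2020-01-02 00:00:00';
-- Count the locks this backend now holds; compare against the failing query.
select count(*) from pg_locks where pid = pg_backend_pid();
rollback;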
RafiaSabih commented 1 year ago

@julianbieber Thanks for reporting this; we are able to reproduce it with the above-mentioned code.

akuzm commented 1 year ago

Some technical details: it seems to be a general limitation of Postgres. If we look at distribute_qual_to_rels, it tries to evaluate variable-free pseudoconstant clauses as high as possible in the join tree. So this false clause ends up in RelOptInfo->joinrestrictinfo rather than baserestrictinfo, which is why we can't mark the hypertable relation as dummy and instead try to expand all the partitions. Maybe we could add some special handling for such clauses, if this locking problem turns out to be a frequent occurrence.
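
A minimal SQL illustration of that placement difference, using the schema from the reproduction above (my reading of the explanation, not a confirmed test):

-- A false qual on the bare hypertable ends up in baserestrictinfo, the
-- relation can be marked dummy, and planning succeeds without opening chunks.
explain select * from timeseries where false;

-- Above a join, the same variable-free clause is kept at the join level
-- (joinrestrictinfo), so every chunk is still expanded and locked during
-- planning, and the lock table can overflow.
explain select * from timeseries join other_table on true where false;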