timescale / timescaledb

An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
https://www.timescale.com/

INSERT INTO hypertable using a PL/pgSQL function results in high RAM usage #7053

Closed Kazmirchuk closed 5 hours ago

Kazmirchuk commented 1 week ago

What type of bug is this?

Performance issue

What subsystems and features are affected?

Data ingestion, Query executor, Query planner

What happened?

A query of the form

INSERT INTO hypertable (...)
SELECT some_plpgsql_function(column)
FROM postgres_table

performs badly and causes a very visible spike in RAM usage that can end with the Linux OOM killer terminating Postgres.

The bug does not manifest if:

- the target is a regular PostgreSQL table instead of a hypertable;
- the conversion function is written in pure SQL instead of PL/pgSQL.

htop screenshots (images "idle" and "running_query" attached to the original issue) show memory usage when my system is idle and while the query is running.

TimescaleDB version affected

2.15.2

PostgreSQL version used

14.8

What operating system did you use?

Red Hat 8.7

What installation method did you use?

RPM

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

No response

How can we reproduce the bug?

1. Create a regular Postgres table:
CREATE TABLE import_param
(
    pk serial PRIMARY KEY,
    obt timestamptz NOT NULL,  -- obt = onboard time
    device_id int,
    raw_value text,
    eng_value text
);

2. Populate it with random values, 700K rows, so that roughly 50% of raw_value and eng_value are valid float8 (as text) and others are just text strings like 'foobar'.
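For example, the table can be populated like this (an illustrative generator, not the exact data from the report; any similar distribution works):

```sql
-- ~50% of raw_value/eng_value parse as float8, the rest are plain text.
-- obt spans roughly one hour, so a 15m chunk_time_interval in step 3
-- yields about 4 chunks.
INSERT INTO import_param (obt, device_id, raw_value, eng_value)
SELECT now() - i * interval '5 milliseconds',
       (random() * 100)::int,
       CASE WHEN random() < 0.5 THEN random()::text ELSE 'foobar' END,
       CASE WHEN random() < 0.5 THEN random()::text ELSE 'foobar' END
FROM generate_series(1, 700000) AS i;
```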

3. Create a hypertable:
CREATE TABLE tm_param
(
    obt timestamptz NOT NULL,
    device_id int,
    raw_value float8,
    eng_value float8
);
SELECT * FROM create_hypertable('tm_param', 'obt', 
chunk_time_interval => INTERVAL '15m', 
create_default_indexes => FALSE);
Adjust the interval, so that importing data from import_param will result in a reasonable number of chunks - I had 4 in my tests.
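The resulting chunk count can be checked after the import with TimescaleDB's show_chunks:

```sql
-- One row per chunk of the hypertable; count should be small (~4 here).
SELECT count(*) FROM show_chunks('tm_param');
```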

4. Create a function to convert text to float8, because PostgreSQL doesn't have try_cast:
CREATE OR REPLACE FUNCTION to_double(_in text, INOUT _out double precision DEFAULT NULL)
    LANGUAGE plpgsql
    IMMUTABLE
    PARALLEL SAFE
AS $$
BEGIN
    _out := CAST(_in AS double precision);
EXCEPTION WHEN others THEN
    -- do nothing: _out keeps its NULL default
END;
$$;
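A quick sanity check of the function (passing the INOUT parameter explicitly, so it works with or without a default):

```sql
-- Valid numeric text converts; anything else falls into the exception
-- handler and yields the initial NULL.
SELECT to_double('3.14', NULL) AS ok,
       to_double('foobar', NULL) AS bad;
-- ok = 3.14, bad = NULL
```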

5. Copy all data from import_param to tm_param:

INSERT INTO tm_param (obt, device_id, raw_value, eng_value)
SELECT obt, device_id,
to_double(raw_value), 
to_double(eng_value)
FROM import_param;

6. Monitor RAM consumption while the query is running.
Kazmirchuk commented 1 week ago

btw I'm not talking about a huge amount of data here:

SELECT pg_size_pretty(table_bytes) FROM hypertable_detailed_size('tm_param');

=> 39 MB

Kazmirchuk commented 1 week ago

I have managed to solve my specific problem by rewriting my function to pure SQL, but the general question of using PL/pgSQL in such context remains. Unfortunately, adding "parallel safe" made no difference.
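For reference, a pure-SQL version can be approximated by guarding the cast with a regex instead of an exception handler (a sketch; the name to_double_sql and the regex are my assumptions, and the pattern may not cover every literal float8 accepts, e.g. 'NaN' or 'Infinity'):

```sql
-- Hypothetical pure-SQL try_cast: returns NULL when the text does not
-- look like a float, so no exception handler (and no SPI call) is needed.
CREATE OR REPLACE FUNCTION to_double_sql(_in text)
RETURNS double precision
LANGUAGE sql IMMUTABLE PARALLEL SAFE
AS $$
    SELECT CASE
        WHEN _in ~ '^[+-]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][+-]?[0-9]+)?$'
        THEN _in::double precision
    END;
$$;
```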

nikkhils commented 1 week ago

@Kazmirchuk not sure if this indicates a real problem. The Linux OS does memory cleanup as and when needed.

Kazmirchuk commented 1 week ago

> @Kazmirchuk not sure if this indicates a real problem. The Linux OS does memory cleanup as and when needed.

PostgreSQL crashed every time. IMHO this is a real problem that does not occur when using standard PostgreSQL tables.

nikkhils commented 1 day ago

@Kazmirchuk thanks for the repro steps. I can confirm that there's a memory leak when plpgsql functions are involved in the INSERT queries. We will continue to investigate this.

nikkhils commented 1 day ago

When I dump the memory contexts I can see multiple entries for CurTransactionContext, which is weird. And there's an additional entry every time the plpgsql function gets invoked for a tuple. So basically an 8KB leak every time, which can quickly add up when many function calls are involved.

TopTransactionContext: 8192 total in 1 blocks; 6944 free (3 chunks); 1248 used
  CurTransactionContext: 8192 total in 1 blocks; 7928 free (1 chunks); 264 used
  SPI Exec: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
  SPI Proc: 8192 total in 1 blocks; 7216 free (0 chunks); 976 used
    PLpgSQL per-statement data: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
  CurTransactionContext: 8192 total in 1 blocks; 7928 free (1 chunks); 264 used
  CurTransactionContext: 8192 total in 1 blocks; 7928 free (1 chunks); 264 used
  ExecutorState: 8192 total in 1 blocks; 7392 free (3 chunks); 800 used
Grand total: 65536 bytes in 8 blocks; 61192 free (9 chunks); 4344 used
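For anyone reproducing this: on PostgreSQL 14+ a running backend's memory contexts can be dumped to the server log without attaching a debugger, e.g.:

```sql
-- Dump the memory contexts of every backend currently running the
-- problematic INSERT; the dump appears in the server log.
SELECT pg_log_backend_memory_contexts(pid)
FROM pg_stat_activity
WHERE query LIKE 'INSERT INTO tm_param%';
```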
nikkhils commented 1 day ago

This happens in SPI_finish when we delete the SPI Proc memory context. It has nextchild and prevchild entries set up, and the CurTransactionContext sitting in prevchild gets chained back into TopTransactionContext, leading to the leak.

Kazmirchuk commented 2 hours ago

many thanks for the quick fix! 👍