timescale / timescaledb

An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
https://www.timescale.com/

[Bug]: querying data in large compressed chunks blows up disk usage #7177

Open msherman13 opened 3 months ago

msherman13 commented 3 months ago

What type of bug is this?

Performance issue

What subsystems and features are affected?

Compression

What happened?

Querying large compressed chunks causes disk usage to blow up. In normal circumstances, our 256GB volume is around 25% full. When we query some compressed chunks, disk usage gradually increases to around 90%.

Why is timescaledb using disk for this instead of doing streaming decompression in memory? Is this a configuration issue or a bug?

TimescaleDB version affected

2.15.p

PostgreSQL version used

16.2

What operating system did you use?

Alpine 3.20 (or whatever the official docker image is based on)

What installation method did you use?

Docker

What platform did you run on?

Amazon Web Services (AWS)

Relevant log output and stack trace

No response

How can we reproduce the bug?

Create a hypertable with large compressed chunks (10GB decompressed, 1.5GB compressed) and query the data in compressed chunks.
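Since the report doesn't include a schema, a minimal reproduction sketch might look like the following (table and column names are hypothetical; chunk interval and data volume would need tuning to reach ~10GB uncompressed per chunk):

```sql
-- Hypothetical schema; the actual table from the report is not shown.
CREATE TABLE metrics (
    ts      timestamptz NOT NULL,
    device  int         NOT NULL,
    value   double precision
);
SELECT create_hypertable('metrics', 'ts', chunk_time_interval => INTERVAL '7 days');

-- Enable compression, segmenting by device and ordering by time.
ALTER TABLE metrics SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'device',
    timescaledb.compress_orderby   = 'ts DESC'
);

-- After loading enough data, compress all chunks:
SELECT compress_chunk(c) FROM show_chunks('metrics') c;

-- Then query the compressed data while watching disk usage:
SELECT device, avg(value)
FROM metrics
WHERE ts > now() - INTERVAL '30 days'
GROUP BY device;
```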
fabriziomello commented 3 months ago

@msherman13 Postgres spills data to disk if there isn't enough work_mem when doing sort operations. What is the current value you're using? The default is very low (4MB).

To check if this is the problem you can connect to your Postgres and in the session set the following GUCs and then execute your query:

SET trace_sort TO on;
SET client_min_messages TO log;
SELECT ...  -- Your query that decompress data
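If the sort is spilling, trace_sort will log messages about external sorts using temp files. A session-level check and workaround (the work_mem value here is illustrative, not a recommendation) could look like:

```sql
-- Raise the per-sort-operation memory budget for this session only (illustrative value).
SET work_mem TO '256MB';

-- Cumulative temp-file usage for the current database since the last stats reset;
-- a large temp_bytes confirms sorts/hashes have been spilling to disk.
SELECT temp_files, pg_size_pretty(temp_bytes) AS temp_size
FROM pg_stat_database
WHERE datname = current_database();
```

Note that work_mem is per sort/hash operation, not per query, so a plan with several such nodes can use a multiple of this value.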
msherman13 commented 3 months ago

@fabriziomello this is super helpful feedback, I'll dig into this query to see whether it's ordering the data