timescale / timescaledb

An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
https://www.timescale.com/

[Bug]: Can't compress chunk #7113

Open hisahin opened 2 months ago

hisahin commented 2 months ago

What type of bug is this?

Unexpected error

What subsystems and features are affected?

Compression

What happened?

I have a hypertable and I compress the chunks attached to it, but some of them fail to compress with a memory error:

select compress_chunk('_timescaledb_internal._hyper_27_1508_chunk');
ERROR:  invalid memory alloc request size 1367701419

Some chunks compress fine while others give this error; for example, other chunks of 5 GB, 30 GB, or 40 GB compressed without issue.

My chunk size is 5GB.

Here are the columns of my table:

column1  | uuid
column2  | uuid
column3  | timestamp(0) with time zone
column4  | timestamp with time zone
column5  | bigint
column6  | text
column7  | log.request_method
column8  | inet
column9  | inet
column10 | jsonb
column11 | jsonb
column12 | double precision

I tried changing work_mem, maintenance_work_mem, and shared_buffers, and I tried changing the timescaledb.compress_segmentby columns, but it didn't work.
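For reference, a rough sketch of the kind of changes that were tried; the table name, the settings values, and the segmentby column here are illustrative, not the real ones:

-- illustrative values only
SET work_mem = '1GB';
SET maintenance_work_mem = '2GB';

-- illustrative table and segmentby column
ALTER TABLE mytable SET (timescaledb.compress = true, timescaledb.compress_segmentby = 'column1');

SELECT compress_chunk('_timescaledb_internal._hyper_27_1508_chunk');
-- still fails with: ERROR:  invalid memory alloc request size 1367701419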

TimescaleDB version affected

2.11.0

PostgreSQL version used

15.3

What operating system did you use?

Rocky Linux 8.8

What installation method did you use?

RPM

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

select compress_chunk('_timescaledb_internal._hyper_27_1508_chunk');
ERROR:  invalid memory alloc request size 1367701419

How can we reproduce the bug?

You need my data to simulate the error.

svenklemm commented 1 month ago

2.11.0 is a very old version; can you reproduce this on the most recent version?

hisahin commented 1 month ago

I tried upgrading the version to 2.15.3 and the error persists.

akuzm commented 1 month ago

It would be good if we could reproduce this to debug the problem. Can you provide a small sample of data that shows it?

hisahin commented 1 month ago

You can simulate the situation with these steps:

create table hisahin(column1 serial, column2 jsonb, time timestamp with time zone default now());

select create_hypertable('hisahin','time');

-- each inserted row carries a ~19 MB jsonb value (the 32-character md5 repeated 600,000 times)
DO $$
BEGIN
    FOR i IN 1..5000 LOOP
        INSERT INTO hisahin(column2)
        SELECT to_jsonb(repeat(substr(md5(random()::text), 1, 100), 600000));
    END LOOP;
END $$;

ALTER TABLE hisahin SET (timescaledb.compress=true);

\dt+ _timescaledb_internal._hyper_45_1652_chunk
Size: 1093 MB

select compress_chunk('_timescaledb_internal._hyper_45_1652_chunk');
ERROR:  invalid memory alloc request size 1209600756

akuzm commented 1 month ago

Thanks for the script, I can reproduce the problem. This is unfortunately a current limitation of the compression. The rows are organized in batches of 1000 rows that are then compressed together. The combined size can be no more than 1 GB, because this is a Postgres TOAST storage limitation. In your example, the size of the jsonb column for 1000 rows is much larger, at 18 GB:

select pg_size_pretty(length(column2::text) * 1000::bigint) from hisahin limit 1;
 pg_size_pretty 
────────────────
 18 GB

The max batch size is currently set at 1000 rows and is not configurable. You could try using a segmentby column that splits the data into smaller batches, e.g. some column that has about 50 rows per unique value.
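As a minimal sketch of that idea, assuming a hypothetical tenant_id column with on the order of 50 rows per distinct value (not a column from the table above):

-- hypothetical segmentby column; each compressed batch then only holds rows for one tenant_id value
ALTER TABLE hisahin SET (timescaledb.compress = true, timescaledb.compress_segmentby = 'tenant_id');
SELECT compress_chunk('_timescaledb_internal._hyper_45_1652_chunk');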

Another approach would be to put the large jsonb messages into a separate hypertable that is not compressed with TimescaleDB compression, only with regular Postgres TOAST compression.
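A rough sketch of that layout, with hypothetical table and column names: the wide jsonb payloads live in a plain hypertable that relies only on TOAST, and only the narrow table gets TimescaleDB compression.

-- hypothetical schema, for illustration only
CREATE TABLE events (
    event_id uuid NOT NULL,
    time timestamptz NOT NULL DEFAULT now(),
    value double precision
);
SELECT create_hypertable('events', 'time');
ALTER TABLE events SET (timescaledb.compress = true);

CREATE TABLE event_payloads (
    event_id uuid NOT NULL,
    time timestamptz NOT NULL,
    payload jsonb   -- large values stay in regular Postgres TOAST storage
);
SELECT create_hypertable('event_payloads', 'time');
-- note: no timescaledb.compress on event_payloads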

hisahin commented 1 month ago

I was hoping for a parameter like "set max_alloc_mem to '4GB'". It doesn't seem like something I can overcome through configuration.

fabriziomello commented 1 month ago

I was hoping for a parameter like "set max_alloc_mem to '4GB'". It doesn't seem like something I can overcome through configuration.

Unfortunately there's no such parameter, and as @akuzm mentioned, this is a current Postgres TOAST storage limitation.