timescale / timescaledb

An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
https://www.timescale.com/

compress_chunk() blocks other queries on the table for a long time #2732

Open phemmer opened 3 years ago

phemmer commented 3 years ago

Relevant system information:

Describe the bug
When compressing a chunk with compress_chunk(), after a certain point in the process, any query that calls chunks_detailed_size() blocks (possibly other metadata queries as well; not sure).

To Reproduce
Steps to reproduce the behavior:

  1. Have a multi-node cluster (not sure if necessary, but that's what I'm using)
  2. On access node: select compress_chunk('mychunk');
  3. On access node: While running, periodically do select * from chunks_detailed_size('mytable')
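
A minimal SQL sketch of steps 2 and 3, using the placeholder names from above ('mychunk' stands in for a real chunk name, which show_chunks('mytable') would list):

    -- Session 1 (access node): compress one chunk of the hypertable.
    SELECT compress_chunk('mychunk');

    -- Session 2 (access node), while the above is still running: this call
    -- ends up blocked on the data node (see the log output below).
    SELECT * FROM chunks_detailed_size('mytable');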

Expected behavior
A successful response in a short amount of time.

Actual behavior
On the data node:

Dec 10 03:48:49 ded4077 postgres[29867]: [38-1] pid=29867 db=edgestats user=postgres rhost=1.1.1.70 app=timescaledb tid=4/3050 sid=5fcfdb10.74ab LOG:  process 29867 still waiting for AccessShareLock on relation 3424460 of database 16386 after 1000.110 ms
Dec 10 03:48:49 ded4077 postgres[29867]: [38-2] pid=29867 db=edgestats user=postgres rhost=1.1.1.70 app=timescaledb tid=4/3050 sid=5fcfdb10.74ab DETAIL:  Process holding the lock: 14771. Wait queue: 29867.
Dec 10 03:48:49 ded4077 postgres[29867]: [38-3] pid=29867 db=edgestats user=postgres rhost=1.1.1.70 app=timescaledb tid=4/3050 sid=5fcfdb10.74ab CONTEXT:  PL/pgSQL function _timescaledb_internal.chunks_local_size(name,name) line 3 at RETURN QUERY
Dec 10 03:48:49 ded4077 postgres[29867]: [38-4] pid=29867 db=edgestats user=postgres rhost=1.1.1.70 app=timescaledb tid=4/3050 sid=5fcfdb10.74ab STATEMENT:  SELECT * from _timescaledb_internal.chunks_local_size( 'public', 'mytable'  );

...eventually timing out due to statement_timeout (in my case, 5 minutes).
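
In case it helps anyone reproducing this, the backend holding the conflicting lock can be identified with plain PostgreSQL catalogs (nothing TimescaleDB-specific; pg_blocking_pids() needs PostgreSQL 9.6+):

    -- Run on the data node: list sessions currently blocked on a lock,
    -- together with the PIDs of the sessions blocking them.
    SELECT pid                   AS blocked_pid,
           query                 AS blocked_query,
           pg_blocking_pids(pid) AS blocking_pids
    FROM pg_stat_activity
    WHERE cardinality(pg_blocking_pids(pid)) > 0;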

Additional context
While I doubt it would be possible to avoid blocking entirely, I think the blocking time should be reduced to a few seconds at most.

phemmer commented 3 years ago

Ok, so this issue is actually much worse than originally described: the blocking also affects normal read queries on the table.

Looks like #2669 might be related to this, though that one is about decompression rather than compression.

ebreijo commented 2 years ago

I'm also having this issue when using the backfill script on a large data set. The compress/decompress operations lock out read queries on the hypertable that contains the compressed chunks.

Currently running TimescaleDB 2.4.2 on PostgreSQL 13.

Is there a way to prevent this lock on read queries somehow?
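
(For what it's worth, plain PostgreSQL lock_timeout can at least bound how long the compression call waits to acquire its locks, so it fails fast instead of queueing ahead of readers; it does not shorten how long locks are held once acquired. The chunk name below is only an example.)

    -- Give up on compressing if the locks cannot be acquired within 5 seconds,
    -- so readers do not pile up behind a waiting compress_chunk() call.
    SET lock_timeout = '5s';
    SELECT compress_chunk('_timescaledb_internal._hyper_1_1_chunk');
    RESET lock_timeout;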

akaashp commented 3 months ago

Having this issue when my policy-defined compression job runs. It blocks all writes to the table until the job is done, even though the chunk being compressed is an old one that is no longer being written to. This makes compression unusable for high-throughput workloads...
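
For reference, the setup is the usual policy-based one along these lines (table name, segmentby column, and interval are illustrative):

    -- Enable compression on the hypertable and schedule a background job
    -- that compresses chunks once they are older than 7 days.
    ALTER TABLE metrics SET (
        timescaledb.compress,
        timescaledb.compress_segmentby = 'device_id'
    );
    SELECT add_compression_policy('metrics', INTERVAL '7 days');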

sseveran commented 1 month ago

I am experiencing the same thing. How is compression deployed in the real world? Is concurrently compressing a chunk and being able to read from the table a paid feature?

phemmer commented 1 week ago

At this point, I might recommend Timescale change their official recommendation, and the default (7 days), for chunk size. A few reasons for this:

  1. This issue. Bigger chunks mean longer times where all operations are frozen waiting for the chunk to compress. On high volume inserts, if the ingestion is blocked long enough, it can cause the data submitters to start timing out and dropping data.
  2. The reasoning against smaller chunks is that they result in more planning time, but that does not actually seem significant. I keep 45 days of data and typically use 4h chunks, and planning time is insignificant.
  3. With the option compress_chunk_time_interval, multiple smaller chunks can be rolled into larger ones (see the sketch after this list).
  4. It's easy to migrate from small chunks to large ones, but impossible to migrate from large chunks to smaller ones.
  5. With issues like #7068, large uncompressed chunks cause a major performance hit to queries.
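
A sketch of what points 3 and 4 look like in practice (hypertable name and intervals are only examples; set_chunk_time_interval() affects newly created chunks only, and compress_chunk_time_interval needs a reasonably recent TimescaleDB):

    -- Switch the hypertable to small 4-hour chunks; this only affects
    -- chunks created from now on.
    SELECT set_chunk_time_interval('mytable', INTERVAL '4 hours');

    -- Let compression roll several small chunks up into one larger
    -- compressed chunk covering up to 24 hours.
    ALTER TABLE mytable SET (
        timescaledb.compress,
        timescaledb.compress_chunk_time_interval = '24 hours'
    );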

So it seems like the recommended/default chunk size was established back when Timescale was new, and has not kept up with the state of things.

And honestly for me, due to all of the above, I'm considering shrinking our chunks down to 15 minutes.