reorg / pg_repack

Reorganize tables in PostgreSQL databases with minimal locks

Timeout when repacking a LARGE table #96

Closed · northrup closed this issue 3 years ago

northrup commented 8 years ago

I have a table that contains 25m rows, which I cannot trim. Repacking this table times out with:

ERROR: query failed: ERROR:  canceling statement due to statement timeout
DETAIL: query was: CREATE TABLE repack.table_16430 WITH (oids=false) TABLESPACE pg_default AS SELECT id,target_type,target_id,title,data,project_id,created_at,updated_at,action,author_id FROM ONLY events

ostensibly because pg_repack cannot select all 25m rows into the repack table before the timeout fires. The problem is that I have already set the timeout option on the pg_repack command to a high value, yet no matter what, this transaction times out after 5 minutes of wall time. How can I successfully get this table repacked?

schmiddy commented 8 years ago

This sounds like you have statement_timeout set either globally or for the particular database/user you are running pg_repack as. Can you try using psql to log into the same database, as the same user you used with pg_repack, and check what:

SHOW statement_timeout;

says?
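
If that reports a non-zero value, the setting may be attached to the database or the role rather than to postgresql.conf. As a rough illustration (the role name below is made up; only the database name comes from this issue), such overrides are usually created and listed like this:

-- hypothetical ways a per-database or per-role timeout gets set
ALTER DATABASE gitlabhq_production SET statement_timeout = '5min';
ALTER ROLE gitlab SET statement_timeout = '5min';

-- list any per-database/per-role overrides present in the cluster
SELECT d.datname, r.rolname, s.setconfig
FROM pg_db_role_setting s
LEFT JOIN pg_database d ON d.oid = s.setdatabase
LEFT JOIN pg_roles r ON r.oid = s.setrole;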

northrup commented 8 years ago

@schmiddy here's what you were asking for:

gitlabhq_production=# SHOW statement_timeout;
 statement_timeout
-------------------
 0
(1 row)

northrup commented 8 years ago

@schmiddy further though in the code for the preliminary_checks function you specifically establish a statement_timeout of 0 : https://github.com/reorg/pg_repack/blob/master/bin/pg_repack.c#L508

Is there anywhere else I could look on this?

schmiddy commented 8 years ago

Err, yeah, this is a bit of a head-scratcher. The error message you posted:

ERROR: canceling statement due to statement timeout

can only be coming from Postgres here.

In theory we intend to be setting statement_timeout to 0 in preliminary_checks as you posted, but I wonder if maybe we confuse ourselves in lock_access_share, where we first set then reset statement_timeout (and same deal in lock_exclusive).

Either way, I think those dubious RESETs would set statement_timeout back to whatever the default is for that connection/user/database/global setting. You showed earlier that this appears to be 0 (disabled) by default, but perhaps that error was really hit with a different user or database? Or the unlucky user had some other way of turning on statement_timeout, e.g. passing libpq connection options like:

PGOPTIONS="-c statement_timeout=5min" pg_repack ...
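
For context, the set-then-reset sequence being described in lock_access_share boils down to something like this (a paraphrase of the idea, not the verbatim pg_repack source; the table name is the one from this issue):

SET statement_timeout = 100;             -- wait only briefly for the lock
LOCK TABLE events IN ACCESS SHARE MODE;  -- cancelled by the timeout if the lock cannot be granted in time
RESET statement_timeout;                 -- falls back to the connection's default, which is not necessarily 0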

danbst commented 7 years ago

I had no problems repacking tables up to 750M rows as of pg_repack 1.4.0 on Amazon RDS

bitglue commented 5 years ago

Is the problem that the -k option (required to use pg_repack on RDS) bypasses not only the superuser checks, but also essential setup such as disabling the statement timeout?

A larger issue is the consequence of that. I'm only just starting to understand how pg_repack works, but in some scenarios (a large table with a high rate of updates), trying to select all the rows from one table and insert them into another in a single transaction is bound to cause problems. Although it may avoid the exclusive lock that VACUUM FULL would take, the long-running transaction also prevents vacuum from removing dead tuples, and the resulting accumulation of dead tuples may degrade performance until Bad Things happen.
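
Whether that is happening is visible from ordinary statistics views; for example (generic monitoring queries, not something pg_repack provides):

-- long-running transactions that can hold back vacuum
SELECT pid, usename, xact_start, state, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start
LIMIT 10;

-- tables accumulating dead tuples
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;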

shiwangini commented 4 years ago

statement_timeout is a good thing to set before running any Postgres DDL operations (from the documentation: https://www.postgresql.org/docs/9.4/runtime-config-client.html).

Here, before locking the table (for the drop/swap), pg_repack waits for existing operations/transactions on the table to finish. That's why it sets a timeout before the LOCK TABLE statement. If the ongoing transactions on the table don't finish in time, you receive the message: canceling statement due to statement timeout

Next time it tries again with a larger statement_timeout, and it keeps repeating the same process until it is able to lock the table. Once it has locked the table, it performs the swap.

If we didn't have a statement_timeout, or had statement_timeout=0, we might end up missing some updates/operations on the tables.
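
The retry loop described above amounts to something like the following sketch (an illustration of the idea, not the actual pg_repack code; the table name is the one from this issue):

SET statement_timeout = 100;                   -- short timeout for the first attempt
LOCK TABLE events IN ACCESS EXCLUSIVE MODE;    -- needed for the swap; cancelled if blockers outlast the timeout
RESET statement_timeout;
-- on a timeout, pg_repack rolls back and retries with a larger statement_timeout,
-- repeating until the lock is acquired and the swap can be performed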