reorg / pg_repack

Reorganize tables in PostgreSQL databases with minimal locks
BSD 3-Clause "New" or "Revised" License
1.85k stars 170 forks source link

Notices in pg_repack1.4.5 logs for AWS Aurora PostgreSQL 12.8 indicate waiting for transactions to finish with specific PIDs unrelated to the table being repacked. #381

Open Cdkhanna opened 6 months ago

Cdkhanna commented 6 months ago

Hi, In the pg_repack1.4.5 logs for AWS Aurora PostgreSQL 12.8, there are notices indicating a wait for transactions to finish, mentioning specific PIDs that are not associated with the table undergoing repacking. This inconsistency needs to be addressed for clarity and efficiency in the repacking process.

Cdkhanna commented 6 months ago

In the pg_stat_activity, it's observed that the pg_Repack session is executing the query "LOCK TABLE table IN ACCESS SHARE MODE" and remains in an idle in transaction state.

za-arthur commented 6 months ago

This is expected behavior. pg_repack opens an additional connection which holds ACCESS SHARE lock. Here is quote from the documentation:

You will not be able to perform DDL commands of the target table(s) except VACUUM or ANALYZE while pg_repack is working. pg_repack will hold an ACCESS SHARE lock on the target table during a full-table repack, to enforce this restriction.

and

pg_repack will only hold an ACCESS EXCLUSIVE lock for a short period during initial setup (steps 1 and 2 above) and during the final swap-and-drop phase (steps 6 and 7). For the rest of its time, pg_repack only needs to hold an ACCESS SHARE lock on the original table, meaning INSERTs, UPDATEs, and DELETEs may proceed as usual.

andreasscherbaum commented 6 months ago

This inconsistency needs to be addressed for clarity and efficiency in the repacking process.

Which tables are seen in the log?

Cdkhanna commented 6 months ago

Thanks for the response, @andreasscherbaum & @za-arthur

It seems I didn't explain the situation thoroughly. The PID that pg_repack is waiting for isn't interacting with the target table at all. Each time I run pg_repack, it gets stuck waiting for a completely unrelated session, In this particular case session is executing a different SQL statement on a different table.

This target table is newly created solely to reproduce the issue and isn't used anywhere else in the database.

Regarding my comment on the pg_repack session in pg_stat_Activity, it's executing "LOCK TABLE table IN ACCESS SHARE MODE" and remains idle in a transaction state.

andreasscherbaum commented 6 months ago

Can you create a case where you can reproduce this on vanilla PostgreSQL?

Cdkhanna commented 6 months ago

Quoting @schmiddy from another issue - Issue

So that "Waiting for x transactions to finish" message comes from here, and as you can see there is no respect paid to the -T / --wait-timeout option there.

I think the code has been like that forever, and I guess the reasoning goes something like: it's ok to kill other backends which are directly holding locks on our target table, because they really should know better than to try to hold locks on the target table during a repack. But it may be less obvious that other long-running transactions which don't touch the target table at all are holding up the repack, and that they deserve to be killed.

We have already gotten some complaints about pg_repack's default behavior of killing conflicting backends (see e.g. https://github.com/reorg/pg_repack/issues/76 and https://github.com/reorg/pg_repack/issues/90). So I'm pretty hesitant to introduce any more backend killing by default. We might think about adding a non-default option to enable killing of old transactions, though I'm also leery of over-burdening the client with dozens of options.

za-arthur commented 6 months ago

This might be related to this issue https://github.com/reorg/pg_repack/issues/86.

Cdkhanna commented 6 months ago

@za-arthur I am sorry I couldn't find any solution in the issue #86

My pg_repack runs keep waiting like this. I did use -T 30 so that these sessions can be terminated for pg_repack. What else I can do in pg_repack to make sure repacking is not waiting for these locks?

NOTICE: Waiting for 5 transactions to finish. First PID: 9677 NOTICE: Waiting for 5 transactions to finish. First PID: 9677 NOTICE: Waiting for 5 transactions to finish. First PID: 9677 NOTICE: Waiting for 5 transactions to finish. First PID: 9677 NOTICE: Waiting for 5 transactions to finish. First PID: 9677 NOTICE: Waiting for 5 transactions to finish. First PID: 9677 NOTICE: Waiting for 5 transactions to finish. First PID: 9677 NOTICE: Waiting for 5 transactions to finish. First PID: 9677 NOTICE: Waiting for 5 transactions to finish. First PID: 9677 NOTICE: Waiting for 5 transactions to finish. First PID: 9677 NOTICE: Waiting for 5 transactions to finish. First PID: 9677 NOTICE: Waiting for 5 transactions to finish. First PID: 9677 NOTICE: Waiting for 4 transactions to finish. First PID: 9677 NOTICE: Waiting for 4 transactions to finish. First PID: 9677 NOTICE: Waiting for 4 transactions to finish. First PID: 9677 NOTICE: Waiting for 4 transactions to finish. First PID: 9677 NOTICE: Waiting for 4 transactions to finish. First PID: 9677 NOTICE: Waiting for 4 transactions to finish. First PID: 9677 NOTICE: Waiting for 4 transactions to finish. First PID: 9677 NOTICE: Waiting for 4 transactions to finish. First PID: 9677 NOTICE: Waiting for 4 transactions to finish. First PID: 9677 NOTICE: Waiting for 4 transactions to finish. First PID: 9677 NOTICE: Waiting for 4 transactions to finish. First PID: 9677 NOTICE: Waiting for 4 transactions to finish. First PID: 9677 NOTICE: Waiting for 4 transactions to finish. First PID: 9677 NOTICE: Waiting for 4 transactions to finish. First PID: 9677 NOTICE: Waiting for 4 transactions to finish. First PID: 9677

MaximBoguk commented 6 months ago

Not much you can do. You do not have (and never will) a control over amazon RDS or Aurora instances on superuser level required to cancel transactions of other users (including amazon internal stuff). This request should be addressed to amazon support actually. This issue not with pg_repack, but with limited conrol and permissions you have over DB as a service instances in general. I am actually very surprised that the pg_repack work on Aurora at all, given how seriously Aurora internals differ from vanilla PostgreSQL.

haggrip commented 6 months ago

Can you create a case where you can reproduce this on vanilla PostgreSQL?

This can be reproduced with bitnami's Postgresql 15 with pg_repack:

  1. Create a new database
  2. Create a table
  3. Execute any moderately long transaction, for example, executing pg_sleep prevents pg_repack from running with the same message @Cdkhanna mentioned.
  4. Attempt to repack any table. Pg_repack won't execute until transaction is done

Maybe we are missing something from the docs, but our database always has running transactions. Is pg_repack impossible to execute under these conditions? bitnami-pg-repack.Dockerfile.txt

MaximBoguk commented 6 months ago

It's expected behavior. Just let it wait until old transactions finished. PS: long running TS very very detrimental to PostgreSQL performance in general.

haggrip commented 6 months ago

It's expected behavior. Just let it wait until old transactions finished. PS: long running TS very very detrimental to PostgreSQL performance in general.

Thank you @MaximBoguk. We may need to address the long running transactions. I still don't quite get this behavior. Could anybody give more details on this? Are any previously running transactions considered potential interference for pg_repack?

andreasscherbaum commented 6 months ago

In general, PostgreSQL can't let go of old data which may or may not be accessed by older transactions. It's possible that a long-running transaction will also access this table, even though that did not happen, or not yet happen. The database can't know that.