puppetlabs / puppetdb

Centralized Puppet Storage
http://docs.puppetlabs.com/puppetdb
Apache License 2.0
301 stars 226 forks source link

PuppetDB garbage collection may not always remove partitions #4013

Closed austb closed 1 month ago

austb commented 1 month ago

Describe the Bug

Two failure cases exist in the garbage collection logic.

In one case, a partition can get into a state Postgres describes as "pending" detachment. The process of detaching a partition is a two transaction process. When the first transaction succeeds, the partition is now "pending". The second transaction needs a more restrictive lock on the partition and can sometimes fail. Users will see errors with a hint to finalize the process, and no subsequent garbage collections will be able to make progress because Postgres only allows a single partition to be pending at a given time.

ERROR:  partition "reports_20240915z" already pending detach in partitioned table "public.reports_historical"
HINT:  Use ALTER TABLE ... DETACH PARTITION ... FINALIZE to complete the pending detach operation.

In the second case, the partition was fully detached from the parent, but was not successfully dropped from the database. In this case, subsequent garbage collections will continue to work, but PuppetDB will never remove that "stranded" partition.

Steps to Reproduce

I have been unable to reproduce the two bugs using just PuppetDB (due to some extra safeguards there), but manually I can construct operations that reproduce both situations.

The easiest way to reproduce the first case is to create a transaction, issue a select on the partition you will be dropping, but leave the transaction open. Then get another connection, shorten the statement_timeout and issue a detach command. Manually, this looks something like

BEGIN TRANSACTION;
select * from reports_20240914z ;
SET statement_timeout=1000;
ALTER TABLE reports_historical DETACH PARTITION reports_20240914z CONCURRENTLY;

The second case can be reproduced simply by running ALTER TABLE reports_historical DETACH PARTITION reports_20240914z CONCURRENTLY; on a partition manually.

austb commented 1 month ago

Released in 7.20.1 and 8.8.1.