hapostgres / pg_auto_failover

Postgres extension and service for automated failover and high-availability

Consider dropping data directory during recovery via pg_basebackup #857

Closed thanodnl closed 1 year ago

thanodnl commented 2 years ago

When a failover happens, pg_auto_failover does not always succeed in running pg_rewind on the old primary. Instead it has a fallback that recovers the node via pg_basebackup. This is great!
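The control flow described above can be sketched as follows. This is a simplification, not pg_auto_failover's actual code: `try_pg_rewind` and `run_pg_basebackup` are hypothetical stubs standing in for the real commands so the try-then-fall-back logic is visible.

```shell
# Sketch of the recovery fallback: attempt the cheap pg_rewind path first,
# and only take a full copy of the new primary when that fails.
# The stubs below simulate the real tools (here, rewind "fails").
try_pg_rewind() { return 1; }              # stand-in for pg_rewind
run_pg_basebackup() { echo "basebackup"; } # stand-in for pg_basebackup

recover_old_primary() {
  if try_pg_rewind; then
    echo "rewind"
  else
    # rewind failed: fall back to a full base backup
    run_pg_basebackup
  fi
}
```

With the failing stub above, `recover_old_primary` prints `basebackup`, mirroring the fallback path this issue is about.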

However, once the database grows larger than roughly 50% of the available disk space (give or take; inode exhaustion could cause other issues), a pg_basebackup might not succeed without operator intervention.

Instead it would be great if pg_auto_failover had an option where, rather than retaining the old database directory, it deletes that directory to ensure enough space is available on the node before initiating a pg_basebackup.
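A minimal sketch of the proposed decision, under the assumption that keeping the old directory means the node needs free room for a complete second copy of the database. The helper name and the KB-based interface are mine for illustration, not anything in pg_auto_failover:

```shell
# Hypothetical helper: decide whether PGDATA must be dropped before
# pg_basebackup has a chance to succeed. Assumes that retaining the old
# directory requires free space for a full second copy of the database.
drop_pgdata_first() {
  local db_size_kb=$1   # size of the database to copy, in KB
  local free_kb=$2      # free space currently available on the node, in KB
  if [ "$free_kb" -lt "$db_size_kb" ]; then
    echo "drop"         # second copy does not fit: remove PGDATA first
  else
    echo "keep"         # both copies fit: keep old PGDATA as a safety net
  fi
}
```

For the >50% case from above: with a 60 GB database and 40 GB free, `drop_pgdata_first 62914560 41943040` prints `drop`.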

Alternatively, we could go as far as designing a tristate for this setting:

DimCitus commented 2 years ago

See also #853, which led us to using the pg_basebackup tar format (maybe even tar.gz) when fetching the data, prior to swapping it into PGDATA. In a way this makes the reasoning about necessary disk space more complex, because now we might need to have both the “download” area and the “production” area in use at the same time for a while.
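To make that disk-space reasoning concrete: while the downloaded tar(.gz) archive is being extracted, the compressed copy and the uncompressed production copy coexist on disk. A back-of-the-envelope helper, where the function name and the compression-ratio parameter are assumptions for illustration only:

```shell
# Rough peak disk usage under the tar.gz flow: during extraction the
# archive in the "download" area and the extracted data directory in the
# "production" area exist at the same time.
peak_usage_kb() {
  local data_kb=$1   # uncompressed database size, in KB
  local gzip_pct=$2  # assumed compressed size, as a percentage of data_kb
  echo $(( data_kb + data_kb * gzip_pct / 100 ))
}
```

For example, `peak_usage_kb 1000 30` prints `1300`: a database of 1000 KB that compresses to 30% of its size needs roughly 1300 KB at peak, on top of whatever the old PGDATA still occupies.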

DimCitus commented 1 year ago

Given the following in our pg_basebackup function, https://github.com/citusdata/pg_auto_failover/blob/d7997ffc3f1209483a37fe7e8ed49fe7a000f664/src/bin/pg_autoctl/pgctl.c#L1280, I would say that https://github.com/citusdata/pg_auto_failover/pull/870 indeed fixed this.