hapostgres / pg_auto_failover

Postgres extension and service for automated failover and high-availability

Implement support for WAL-G #819

Closed DimCitus closed 2 years ago

DimCitus commented 3 years ago

Implementing HA requires both automated failover and Disaster Recovery to keep the data available. With Postgres, that means archiving. Archiving intersects with auto-failover in several ways: creating a standby node from the archives, using restore_command to improve the reliability of the whole system, allowing standby/secondary nodes to archive WAL files with archive_mode = 'always', and continuing to maintain the archives during and after a failover.
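As a rough illustration of the Postgres settings involved, the sketch below renders archive_mode, archive_command, and restore_command as postgresql.conf lines. It assumes WAL-G is called directly through its wal-push and wal-fetch subcommands; the PR routes these calls through pg_autoctl instead, and the exact arguments of those commands are not shown in this thread. The function names archiving_settings and render_conf are hypothetical.

```python
from typing import Dict


def archiving_settings(archive_mode: str = "always") -> Dict[str, str]:
    """Return the archiving-related settings for one node.

    Assumption: wal-g is invoked directly here. In the PR, the commands
    would be pg_autoctl wrappers around the archive method.
    """
    if archive_mode not in ("off", "on", "always"):
        raise ValueError(f"invalid archive_mode: {archive_mode!r}")
    return {
        "archive_mode": archive_mode,
        # %p expands to the path of the WAL file to archive
        "archive_command": "wal-g wal-push %p",
        # %f expands to the WAL file name, %p to the destination path
        "restore_command": "wal-g wal-fetch %f %p",
    }


def render_conf(settings: Dict[str, str]) -> str:
    """Format settings as postgresql.conf lines with single-quoted values."""
    lines = []
    for name, value in settings.items():
        escaped = value.replace("'", "''")  # double any embedded quote
        lines.append(f"{name} = '{escaped}'")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_conf(archiving_settings()), end="")
```

With archive_mode = 'always', a standby also runs archive_command for the WAL files it receives, which is why the same file can reach the archive from more than one node.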

This PR implements the following 3 commands as a starter-kit for WAL-G support/integration:

  1. pg_autoctl create archiver-policy
  2. pg_autoctl archive wal
  3. pg_autoctl restore wal

More is needed later, in particular:

  1. pg_autoctl archive pgdata
  2. pg_autoctl restore pgdata
  3. automated integration of restoring pgdata from the archives when creating a standby node
  4. integrated scheduler to archive new base backups and purge old ones following the retention policy

Given the size of the current PR, it is probably better to split this development into several stages. This PR focuses on WAL archiving; base backup archiving may be implemented later on top of it.

Finally, the design allows support for multiple archive methods, even though only WAL-G support is implemented at the moment. Each new method requires some wrapper work, which should be fairly easy. The main advantage of maintaining a wrapper is that it allows archive_mode = 'always', thanks to handling WAL file metadata on the monitor. Also, maintaining the configuration of the archiving method on the monitor makes it trivial to share it with all the nodes, even when the configuration needs updating.
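One way to picture this design is the minimal sketch below: a small wrapper interface with one implementation per archive method, plus an in-memory stand-in for the WAL metadata kept on the monitor. Everything here is an assumption made for illustration. The names ArchiveMethod, WalG, and MonitorWalRegistry are hypothetical, and the "first node to claim a WAL file archives it" rule is only one plausible way the monitor metadata could make archive_mode = 'always' safe; the PR may do this differently.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ArchiveMethod(ABC):
    """Hypothetical wrapper interface: one subclass per archive method."""

    @abstractmethod
    def archive_wal_command(self, wal_path: str) -> List[str]:
        """Command line that archives the WAL file at wal_path."""

    @abstractmethod
    def restore_wal_command(self, wal_name: str, dest_path: str) -> List[str]:
        """Command line that restores wal_name into dest_path."""


class WalG(ArchiveMethod):
    """WAL-G implementation, using its wal-push and wal-fetch subcommands."""

    def archive_wal_command(self, wal_path: str) -> List[str]:
        return ["wal-g", "wal-push", wal_path]

    def restore_wal_command(self, wal_name: str, dest_path: str) -> List[str]:
        return ["wal-g", "wal-fetch", wal_name, dest_path]


class MonitorWalRegistry:
    """In-memory stand-in for WAL file metadata stored on the monitor."""

    def __init__(self) -> None:
        self._owner: Dict[str, str] = {}

    def claim(self, wal_name: str, node: str) -> bool:
        """Return True if node should archive wal_name.

        Assumed rule: the first node to claim a WAL file owns it, and
        claims from any other node are refused.
        """
        owner = self._owner.setdefault(wal_name, node)
        return owner == node


if __name__ == "__main__":
    registry = MonitorWalRegistry()
    method = WalG()
    wal = "000000010000000000000003"
    for node in ("node_1", "node_2"):
        if registry.claim(wal, node):
            print(node, "archives with", method.archive_wal_command(f"pg_wal/{wal}"))
        else:
            print(node, "skips", wal)
```

Adding another archive method would then mean writing one more ArchiveMethod subclass, while the registry logic stays shared across methods.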

Tiago-Anastacio commented 2 years ago

I have not read the whole patch, but a caution about archive_mode = 'always': you may need separate backup repositories (one per PostgreSQL instance), because this bug may not be fixed: WAL files from standbys contain the same logical information as those from the primary, but their checksums may differ.

see: https://pgbackrest.org/configuration.html#section-backup/option-archive-mode-check
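To make the concern concrete, here is a small sketch of the two ideas involved: giving each node its own repository prefix, and detecting a WAL file that is already archived under the same name with different bytes. The helper names are hypothetical, and SHA-256 is an arbitrary choice for the illustration. With WAL-G on S3, a per-node prefix could be passed through the WALG_S3_PREFIX environment variable, but that wiring is an assumption and is not part of this PR as described.

```python
import hashlib
from typing import Dict


def node_prefix(base: str, node_name: str) -> str:
    """Build a per-node repository prefix under a shared base location."""
    return f"{base.rstrip('/')}/{node_name}"


def wal_checksum(content: bytes) -> str:
    """Checksum of a WAL file's bytes (SHA-256 chosen for illustration)."""
    return hashlib.sha256(content).hexdigest()


def conflicts(archived: Dict[str, str], wal_name: str, content: bytes) -> bool:
    """True when wal_name is already archived with different bytes.

    archived maps WAL file names to the checksum recorded at archive time.
    """
    known = archived.get(wal_name)
    return known is not None and known != wal_checksum(content)


if __name__ == "__main__":
    wal = "000000010000000000000003"
    archived = {wal: wal_checksum(b"bytes as written by the primary")}
    # Same WAL name, different bytes on the standby: a shared repository
    # would reject or overwrite it, a per-node prefix avoids the clash.
    print(conflicts(archived, wal, b"bytes as written by a standby"))
    print(node_prefix("s3://bucket/formation/", "node_2"))
```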

raivil commented 2 years ago

Hey @DimCitus, is the goal of integrating WAL-G with pg_auto_failover still valid?

I'm considering WAL-G for a new Citus 11 cluster that will also use pg_auto_failover, now that it supports Citus as of the 2.0 release.

Thanks for the great work on pg_auto_failover! Best,

DimCitus commented 2 years ago

Hi @raivil, yes, I still want to add support for archiving in pg_auto_failover. I have no idea when I will be able to get back to this work though, so if you want to contribute, please consider it!