hapostgres / pg_auto_failover

Postgres extension and service for automated failover and high-availability
Other
1.07k stars 113 forks source link

Feature request post failover command #878

Open Tiago-Anastacio opened 2 years ago

Tiago-Anastacio commented 2 years ago

Salut Dimitri,

Aim is to provide a parameter to execute a command (whatever command DBAs set) once a failover successfully happened

Use case is:

Because the only component which knows that a failover completely and succefullly happened is pg_autofailover.

Tiago

DimCitus commented 2 years ago

Hi Tiago!

The idea of executing a command when there is a state change has been discussed in the past already. See https://github.com/citusdata/pg_auto_failover/issues/440 and https://github.com/citusdata/pg_auto_failover/issues/490.

I think we should include a pg_autoctl service that LISTENs to our notification channels and allows users to register commands to execute at state change, with a setup that allows also targeting specific state changes (such as “a new primary node has now been elected“). For robustness reasons, this should not be included in the main loop, and should be implemented as a separate service.

Then it might be nice to offer that as its own pg_autoctl service/node too, so that it could be run on application nodes etc. When implementing that ability, we should also make it possible to run a pg_autoctl setup that does nothing, just to make it easy to run pg_autoctl watch and other client commands from anywhere.

Tiago-Anastacio commented 2 years ago

About: _I think we should include a pgautoctl service that LISTENs ... ... and should be implemented as a separate service.

OK but I would rather use a service that look for these informations inside pg_monitor DB. Because we may not catch that signal (through LISTEN), I may be wrong but this signal is sent only once, right ?

DimCitus commented 2 years ago

The trick with LISTEN/NOTIFY is that we would have to execute the command each time pg_autoctl is started and then each time there is a failover, or say a state change that is registered to trigger the command. So in the case of firing a command for a new primary node, it means each time you start the pg_autoctl service with the command hook installed, then the command is executed with the current primary, even if it didn't change from the previous command execution. The command itself should be able to manage that.

jnehlmeier commented 1 year ago
  • another example is indirection (a.k.a switch-ip) if client does not support native high availability (e.g. they can't set two hosts on connection strings, example ODBC or embedded drivers configuration ), we could use this parameter (for example to reconfigure pgbouncer)

You could use HAProxy (or similar) in front of your pg_auto_failover cluster and use select pg_is_in_recovery() to figure out who is primary and should receive requests.

s4ke commented 1 year ago
  • another example is indirection (a.k.a switch-ip) if client does not support native high availability (e.g. they can't set two hosts on connection strings, example ODBC or embedded drivers configuration ), we could use this parameter (for example to reconfigure pgbouncer)

You could use HAProxy (or similar) in front of your pg_auto_failover cluster and use select pg_is_in_recovery() to figure out who is primary and should receive requests.

+1

For an example HAProxy configuration you can look here:

https://github.com/neuroforgede/pg_auto_failover_ansible/wiki/HAProxy

vadius45 commented 1 year ago

+1 for that feature request : for example, one may want to ensure that cfg files / password files are synchronized on all hosts through an external script that is executed when a standby is promoted.