ghost opened this issue 3 years ago
Hello,
Good point, this needs to be documented. I'll summarize the tl;dr here and use it as a basis to write the documentation.
Imagine that the A bastion instance is the master, and the B and C instances are configured as slaves, synchronized from the A instance.

If B or C goes down for a short period of time (hours, days), there is nothing to do. If they go down and you don't plan to put them back up for any reason, you'll just have to remove their declaration from the master's configuration, so that the A instance stops trying to sync to the missing slaves (`remotehostlist` in `/etc/bastion/osh-sync-watcher.sh` on the A instance).
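As a sketch, removing a decommissioned slave boils down to editing that one variable on A (the IPs below are placeholders, not real values):

```shell
# /etc/bastion/osh-sync-watcher.sh on the A instance (placeholder IPs).
# Before, A was syncing to both B and C:
#   remotehostlist="192.0.2.11 192.0.2.12"
# After removing the decommissioned C instance:
remotehostlist="192.0.2.11"
```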
If the A instance goes down for a short period of time, and you can accept that your cluster denies all modifications (account creation/deletion, group membership changes, etc.) during that time, there is nothing to do: the B and C instances will keep working (accepting connections to remote machines, etc.).
If the A instance goes down for a longer period and you want to promote B or C (say B), here's what you need to do:

1. Remove A's public key from the `authorized_keys` of the `bastionsync` user on the B and C instances. This way, the A instance won't be able to push data even if it wakes up from the dead and tries to. You can do this in other ways, such as removing A's IPs from the B and C instances' firewalls, for example. What's important is that, in the end, A can no longer connect to B or C.
2. On B, set the `readOnlySlaveMode` option in `/etc/bastion/bastion.conf` to `0` instead of `1`. When this is done (you don't need to restart anything), this instance will start to accept modifying commands (account creation/deletion, etc.).
3. Push the `/root/.ssh/id_master2slave.pub` key of B to the `authorized_keys` of the `bastionsync` user on C.
4. Enable the `osh-sync-watcher` daemon on B (using systemd or sysVinit), and set `enabled=1` in the `/etc/bastion/osh-sync-watcher.sh` of B, making sure C's IP is in its `remotehostlist`.

You should then observe in the `/var/log/bastion/bastion-scripts.log` file of B (if you're using our provided syslog template) that `osh-sync-watcher` is now syncing to C successfully. You now have B as the master and C as a slave: your infra is stable again and service is fully up.
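The file edits behind steps 1-3 can be rehearsed on scratch copies before touching a live instance. This is only a sketch: the paths, key material, and the `"readOnlySlaveMode": 1` line layout are placeholders, not the real file formats.

```shell
# Rehearse the promotion steps on throwaway copies of the files (placeholders only).
set -eu
workdir=$(mktemp -d)
conf="$workdir/bastion.conf"        # stand-in for B's /etc/bastion/bastion.conf
auth="$workdir/authorized_keys"     # stand-in for bastionsync's authorized_keys on C

printf '"readOnlySlaveMode": 1\n' > "$conf"
printf 'ssh-ed25519 AAAA_PLACEHOLDER_KEY_OF_A root@a\n' > "$auth"

# Step 1: remove A's key so a resurrected A can no longer push.
sed -i '/KEY_OF_A/d' "$auth"

# Step 2: flip readOnlySlaveMode from 1 to 0 on the promoted instance.
sed -i 's/"readOnlySlaveMode": 1/"readOnlySlaveMode": 0/' "$conf"

# Step 3: grant the new master (B) push access on the remaining slave (C).
printf 'ssh-ed25519 AAAA_PLACEHOLDER_KEY_OF_B root@b\n' >> "$auth"

cat "$conf" "$auth"
```

Step 4 (enabling the daemon) is then a `systemctl enable --now` or sysVinit equivalent on B.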
If/when A comes back up, you can reintegrate it as a slave: set its `readOnlySlaveMode` back to `1`, disable the sync daemon on it, then push B's key to A's `bastionsync` user, add A's IP to B's `remotehostlist`, and reload the sync daemon on B; B will then sync its data to A.

There are a few configuration choices that can make these steps even shorter, such as having the sync configuration properly set on all nodes but the daemon enabled on only one, and the `bastionsync` keys shared between the nodes, with just a `from="IP.OF.INSTANCE.A"` in front of the declared key everywhere, so that this is the only thing to change when promoting another node. Or you can trade a bit of security to remove yet more steps: allow any node to connect to any other node from the beginning, so that you mainly have to enable the sync daemon on the new master (and STONITH the old one). It's a tradeoff depending on what you can accept in your environment. I'll document that too.
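For reference, `from=` is a standard sshd `authorized_keys` option restricting which source addresses may use a key, so such an entry for the `bastionsync` user could look like this (IP and key material are placeholders):

```shell
# ~bastionsync/.ssh/authorized_keys on each slave (placeholder IP and key):
# only the current master's IP may use this key; promotion = change this IP.
from="192.0.2.10" ssh-ed25519 AAAAC3_PLACEHOLDER_KEY bastionsync@master
```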
Imagine that you have your master bastion in region1 and your slave bastion in region2. Could I make the slave a master in case region1 goes offline for a longer period of time? Is there a way to roll back in case region1 comes online again?

I would like to avoid hosting multiple masters, as that adds a burden on the administration of users and keys.