ghost opened this issue 3 years ago
Hello,
Good point, this needs to be documented. I'll summarize the tl;dr here and use it as a basis to write the documentation.
Imagine that the A bastion instance is the master, and the B and C instances are configured as slaves, synchronized from the A instance.

If B or C goes down for a short period of time (hours, days), there is nothing to do. If they go down and you don't plan to put them back up for any reason, you'll just have to remove their declaration from the master's configuration, so that the A instance stops trying to sync to the missing slaves (`remotehostlist` in `/etc/bastion/osh-sync-watcher.sh` on the A instance).
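As a sketch, removing a decommissioned slave boils down to editing that one variable on A (the IPs below are placeholders, not real values):

```shell
# /etc/bastion/osh-sync-watcher.sh on the A instance (placeholder IPs).
# Before, A was syncing to both B and C:
#   remotehostlist="192.0.2.11 192.0.2.12"
# After removing the decommissioned C instance:
remotehostlist="192.0.2.11"
```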
If the A instance goes down for a short period of time, and you can accept that your cluster denies all modifications (account creation/deletion, group membership changes, etc.) during that time, there is nothing to do: the B and C instances will keep working (accepting connections to remote machines, etc.).
If the A instance goes down for a longer period and you want to promote B or C (say B), here's what you need to do:

1. Remove A's public key from the `authorized_keys` of the `bastionsync` user on the B and C instances. This way, the A instance won't be able to push data even if it wakes up from the dead and tries to. You can do this in other ways, such as removing A's IPs from the B and C instances' firewalls, for example. What's important is that, in the end, A can no longer connect to B or C.
2. On B, set the `readOnlySlaveMode` option in `/etc/bastion/bastion.conf` to `0` instead of `1`. When this is done (you don't need to restart anything), this instance will start to accept modifying commands (account creation/deletion, etc.).
3. Push the `/root/.ssh/id_master2slave.pub` key of B to the `authorized_keys` of the `bastionsync` user on C.
4. Enable the `osh-sync-watcher` daemon on B (using systemd or sysVinit), and set `enabled=1` in the `/etc/bastion/osh-sync-watcher.sh` of B, making sure C's IP is in its `remotehostlist`.

You should then observe in the `/var/log/bastion/bastion-scripts.log` file of B (if you're using our provided syslog template) that `osh-sync-watcher` is now syncing to C successfully. You now have B as the master and C as a slave: your infra is stable again and service is fully up.
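The file edits behind steps 1-3 can be rehearsed on scratch copies before touching a live instance. This is only a sketch: the paths, key material, and the `"readOnlySlaveMode": 1` line layout are placeholders, not the real file formats.

```shell
# Rehearse the promotion steps on throwaway copies of the files (placeholders only).
set -eu
workdir=$(mktemp -d)
conf="$workdir/bastion.conf"        # stand-in for B's /etc/bastion/bastion.conf
auth="$workdir/authorized_keys"     # stand-in for bastionsync's authorized_keys on C

printf '"readOnlySlaveMode": 1\n' > "$conf"
printf 'ssh-ed25519 AAAA_PLACEHOLDER_KEY_OF_A root@a\n' > "$auth"

# Step 1: remove A's key so a resurrected A can no longer push.
sed -i '/KEY_OF_A/d' "$auth"

# Step 2: flip readOnlySlaveMode from 1 to 0 on the promoted instance.
sed -i 's/"readOnlySlaveMode": 1/"readOnlySlaveMode": 0/' "$conf"

# Step 3: grant the new master (B) push access on the remaining slave (C).
printf 'ssh-ed25519 AAAA_PLACEHOLDER_KEY_OF_B root@b\n' >> "$auth"

cat "$conf" "$auth"
```

Step 4 (enabling the daemon) is then a `systemctl enable --now` or sysVinit equivalent on B.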
If/when A comes back up, you can reintegrate it as a slave: set its `readOnlySlaveMode` back to `1`, disable the sync daemon on it, then push B's key to A's `bastionsync` user, add A's IP to B's `remotehostlist`, and reload the sync daemon on B; B will then sync its data to A.

There are a few configuration choices that can make these steps even shorter, such as having the sync configuration properly set on all nodes but the daemon enabled on only one, and the `bastionsync` keys shared between the nodes, with just a `from="IP.OF.INSTANCE.A"` in front of the declared key everywhere, so that this is the only thing to change when promoting another node. Or you can trade a bit of security to remove yet more steps: allow any node to connect to any other node from the beginning, so that you mainly have to enable the sync daemon on the new master (and STONITH the old one). It's a tradeoff depending on what you can accept in your environment. I'll document that too.
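For reference, `from=` is a standard sshd `authorized_keys` option restricting which source addresses may use a key, so such an entry for the `bastionsync` user could look like this (IP and key material are placeholders):

```shell
# ~bastionsync/.ssh/authorized_keys on each slave (placeholder IP and key):
# only the current master's IP may use this key; promotion = change this IP.
from="192.0.2.10" ssh-ed25519 AAAAC3_PLACEHOLDER_KEY bastionsync@master
```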
Imagine that you have your master bastion in region1 and your slave bastion in region2. Could I make the slave a master in case region1 goes offline for a longer period of time? Is there a way to roll back in case region1 comes online again?

I would like to avoid hosting multiple masters, as that adds a burden on the administration of users and keys.