Closed. maxadamo closed this 1 year ago
ideally we could create a Bolt task to trigger the execution of the script
@sebastianrakel @attachmentgenie do you have some thoughts here? Maybe we should have a Bolt plan for this?
I also feel a Bolt task would be more appropriate. In a meltdown situation I don't see anyone changing and pushing hiera changes.
@attachmentgenie the idea is to create the file peers.json ahead of time, pulling the data from PuppetDB (and falling back to hiera only if you don't have PuppetDB), not only when you need it. The file will always be there, ready to be used.
It's gonna be the same with Bolt, but if you don't have PuppetDB it's even worse with Bolt: you'll need to input all the data while you are in a meltdown situation, and at that point it's easier to create peers.json manually.
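To make the discussion above concrete, here is a minimal sketch of how a pre-generated peers.json could be rendered from a server list, preferring PuppetDB data and falling back to a pre-filled hash. It is not the module's implementation. The function names, the sample node IDs and the addresses are hypothetical, and the entry layout (`id`, `address`, `non_voter`, RPC port 4647) assumes the Raft protocol version 3 format described in Nomad's outage recovery documentation.

```python
import json

# Hypothetical data: Nomad server node IDs mapped to their IP addresses.
# In the thread this would come from PuppetDB, or from a hash pre-filled
# in hiera when PuppetDB is not available.
PUPPETDB_SERVERS = {}  # empty here to simulate "no PuppetDB"
HIERA_SERVERS = {
    "00000000-0000-0000-0000-000000000001": "192.0.2.11",
    "00000000-0000-0000-0000-000000000002": "192.0.2.12",
    "00000000-0000-0000-0000-000000000003": "192.0.2.13",
}


def pick_servers(puppetdb_servers, hiera_servers):
    """Prefer PuppetDB data; fall back to the pre-filled hash if it is empty."""
    return puppetdb_servers if puppetdb_servers else hiera_servers


def render_peers_json(servers, rpc_port=4647):
    """Render the peers.json body (assumed Raft v3 layout) as a JSON string."""
    peers = [
        {"id": node_id, "address": f"{ip}:{rpc_port}", "non_voter": False}
        for node_id, ip in sorted(servers.items())
    ]
    return json.dumps(peers, indent=2)


if __name__ == "__main__":
    print(render_peers_json(pick_servers(PUPPETDB_SERVERS, HIERA_SERVERS)))
```

Rendering the file on every Puppet run, rather than during the outage, is the point made above: the data is collected while the infrastructure is still healthy.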
IMO the Bolt plan is, if anything, an addition to the Puppet manifests. And if you don't have PuppetDB I would recommend filling in the data in advance.
@attachmentgenie are you also good with the change, and is it clear how it works?
If all your servers are down, you just need to run /usr/local/bin/nomad-server-outage-recover.sh on your Nomad servers (not the agents).
I can merge it straight away, and I am asking because I already started working on the next PR to fix #84
Affected Puppet, Ruby, OS and module versions/distributions
n/a
How to reproduce
bring down the nomad daemon on all your nomad servers
What are you seeing
you won't be able to restart the daemon
What behaviour did you expect instead
have a procedure, a script, or a Bolt task
Output log
n/a
Proposed solution
Recovering from an outage is a time-consuming operation, but it can be partially automated. The manifest below creates a script which can be run on the servers to recover the cluster. If you have PuppetDB you can use nomad_server_regex; otherwise you need to pre-fill a hash and use nomad_server_hash.