nuagenetworks / nuage-metroae

Nuage Networks Metro Automation Engine
http://devops.nuagenetworks.net
Apache License 2.0

Add support to gracefully shut down the Nuage components #1238

Closed puppetninja closed 5 years ago

puppetninja commented 5 years ago

Hi Metro Team,

Can we add support for gracefully shutting down some of the deployed Nuage components, especially components like VSD clusters?

Thanks

ghost commented 5 years ago

Hello, @PuppetNinja! Could you please describe your use case? What is the purpose of shutting down the VMs? Would you also need the ability to spin them back up again?

It turns out we may have something like this already for VSD and ES. During the upgrade procedure on KVM and vCenter hypervisors, we gracefully shut down the old VSDs and ES nodes without fully undefining or destroying them. To support this, we have a variable called "preserve_vm". You can get the behavior you desire by setting an extra variable on the command line. For example:

./metroae vsd_destroy -e preserve_vm=true
./metroae vstat_destroy -e preserve_vm=true

We do not have the ability to spin them back up. We do not have the ability to gracefully shut down VSC or NSGv.
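
The two commands above could be wrapped in a small script so that VSD and ES are always handled with the same setting. This is only a minimal sketch: it assumes it is run from the directory that contains the metroae script, and the function name preserve_shutdown and the DRY_RUN variable are hypothetical names for illustration, not part of MetroAE. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical wrapper around the two commands quoted above.
# preserve_shutdown and DRY_RUN are illustrative names, not part of MetroAE.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only, 0 = actually run them

preserve_shutdown() {
    # Same order as in the comment: VSD first, then ES (vstat).
    for workflow in vsd_destroy vstat_destroy; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "./metroae $workflow -e preserve_vm=true"
        else
            # Stop at the first failing workflow.
            ./metroae "$workflow" -e preserve_vm=true || return 1
        fi
    done
}

preserve_shutdown
```

Running it with DRY_RUN=0 in the environment would execute the commands instead of printing them.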

puppetninja commented 5 years ago

Hi @bacastelli,

I am working on bringing up the Nuage lab in Westford, and I hit this issue with our lab VSD cluster. There was planned downtime for power maintenance, so we needed to shut down the VSD cluster. I shut down one of the VSD nodes without first doing a graceful shutdown using monit, because I was on a terminal connected to the wrong node.

The VSD cluster then failed to come back online after we powered up the rack.

I am not sure whether this is a VSD product problem or an ops problem, but I think automating that ops work would make it less error-prone.

I will try the solution you provided. Thanks!

mpiecuch-nuage commented 5 years ago

In our testing, we shut down all of the services on each of the VSDs before shutting down the VMs:

monit -g vsd-stats stop
monit -g vsd-core stop
monit -g vsd-common stop

Then, to bring them back up, we start them:

monit -g vsd-stats start
monit -g vsd-core start
monit -g vsd-common start

and then, on the primary VSD only, we set the master persona:

/opt/vsd/sysmon/bootPercona.py --force

There is a "graceful shutdown" procedure somewhere in the Nuage documentation which explains this.
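
The per-node stop and start sequence described above could be sketched as a small shell function, so the same three monit groups are always handled together. This is an illustration under assumptions, not the documented procedure: the function name vsd_services and the DRY_RUN variable are hypothetical, and the group names and their order are copied from this comment as written. The bootPercona.py step is deliberately left out because it applies to the primary VSD only.

```shell
#!/bin/sh
# Sketch of the per-node sequence from the comment above.
# vsd_services and DRY_RUN are hypothetical names used for illustration.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only, 0 = actually run them

vsd_services() {
    action="$1"   # expected to be "stop" or "start"
    case "$action" in
        stop|start) ;;
        *) echo "usage: vsd_services stop|start" >&2; return 2 ;;
    esac
    # Group names and order are taken from the comment as written.
    for group in vsd-stats vsd-core vsd-common; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "monit -g $group $action"
        else
            monit -g "$group" "$action" || return 1
        fi
    done
}

vsd_services stop
```

With DRY_RUN left at its default, this only prints the three stop commands; calling vsd_services start prints the matching start commands.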

puppetninja commented 5 years ago

Hi @mpiecuch-nuage, yes. I was wondering whether a Metro playbook could automate the steps above, followed by shutting down the VSDs. In my case I forgot those steps on one of the VSD nodes in a cluster, and the cluster didn't come up after we powered up the hypervisors.

puppetninja commented 5 years ago

I also see this playbook: https://github.com/nuagenetworks/nuage-metro/blob/master/src/playbooks/with_build/vsd_services_stop.yml

It only shuts down the services, but it is already quite useful. Thanks, I can close this one.