Add rolling_deploy_on_docker_failure option

imakewebthings commented 7 years ago

Currently, when a rolling deploy errors during the process of stopping an existing container and spinning up the new one, the deploy stops and exits by raising that error. The longer the list of servers you're deploying to, the more of a pain it is to determine which servers did and did not successfully deploy, and you're left in an inconsistent state.

This PR adds an option, rolling_deploy_on_docker_failure, that defaults to :exit, which preserves the existing behavior. When set to :continue, Centurion will try to deploy to every host on its list and keep a running collection of the errors it encounters along the way. When all the servers are done, it will raise a single error that concatenates all the error messages it collected. This should:

Ensure hosts that are healthy at the time of deploy get deployed to, regardless of the health of other hosts in the list.
Make it easier to see, at the end of a failed deploy, which hosts are unhealthy and still running the old container.
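
For readers skimming the thread, the :exit versus :continue behavior described above can be sketched roughly as follows. This is a minimal standalone illustration, not the code in this PR: rolling_deploy, DeployFailures, and the on_failure keyword are hypothetical names, and the block stands in for whatever Centurion does to stop the old container and start the new one on a host. In a real Centurion config the option would presumably be set alongside the other deploy settings, but that is an assumption here.

```ruby
# Hypothetical error type raised once at the end in :continue mode.
class DeployFailures < StandardError; end

# Hypothetical sketch of a rolling deploy loop. The block performs the
# per-host deploy and may raise if the host cannot be reached or updated.
def rolling_deploy(hosts, on_failure: :exit)
  errors = []
  hosts.each do |host|
    begin
      yield host
    rescue StandardError => e
      # :exit keeps the existing behavior: re-raise and stop at the first error.
      raise if on_failure == :exit
      # :continue records the failure and moves on to the next host.
      errors << "#{host}: #{e.message}"
    end
  end
  # After every host has been attempted, report all failures in one error.
  raise DeployFailures, errors.join("\n") unless errors.empty?
end

# Example run: host2 fails, host1 and host3 still get deployed.
deployed = []
begin
  rolling_deploy(%w[host1 host2 host3], on_failure: :continue) do |host|
    raise "connection refused" if host == "host2"
    deployed << host
  end
rescue DeployFailures => e
  puts "Deployed to: #{deployed.join(', ')}"
  puts "Failures:\n#{e.message}"
end
```

With on_failure: :exit the same run would stop at host2 and never attempt host3, which is the situation the description above is trying to avoid.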

relistan commented 7 years ago

Seems like a good addition to me. Thoughts, @intjonathan?

intjonathan commented 7 years ago

Does the existing deploy:repair task just need documentation and fixing? Seems like that'd be a one-shot cleanup task you could run instead of using the output of this to build a host-specific deploy.

imakewebthings commented 7 years ago

@intjonathan I think repair would have to do a lot more work than check status endpoints to determine the hosts that need the deploy. It would have to check all running containers for version mismatches, and that's assuming the deployer is using distinct versioned image tags and not latest. This could be a neat feature of repair.
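
To make the version-mismatch check concrete, here is a rough sketch of the comparison a smarter repair would need. It is purely illustrative: hosts_needing_deploy is a hypothetical helper, not part of Centurion, and the host-to-tag hash is assumed to have been gathered already by querying Docker on each host.

```ruby
# Hypothetical helper. `running` maps each host name to the image tag its
# running container reports; `expected_tag` is the tag being deployed.
def hosts_needing_deploy(running, expected_tag)
  # A mutable tag such as latest looks identical on stale and fresh hosts,
  # so a tag comparison cannot tell them apart.
  if expected_tag == "latest"
    raise ArgumentError, "cannot detect stale hosts when deploying a mutable tag"
  end
  # Any host whose running tag differs from the expected one needs a deploy.
  running.reject { |_host, tag| tag == expected_tag }.keys
end

running = { "host1" => "v42", "host2" => "v41", "host3" => "v42" }
stale = hosts_needing_deploy(running, "v42")
puts "Needs deploy: #{stale.join(', ')}"
```

The early ArgumentError reflects the caveat above: this comparison only works when the deployer uses distinct versioned tags.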

I think the main benefit of the option in this PR is to reduce the blast radius in terms of undeployed-to hosts in the event that an error in communicating with a given host occurs in the middle of a deploy. That plus a more robust repair would be a great combo.

CLAassistant commented 4 years ago

All committers have signed the CLA.

newrelic / centurion

Add rolling_deploy_on_docker_failure option #180