palantir / bouncer

An application to cycle (bounce) all nodes in a coordinated fashion in an AWS ASG or set of related ASGs
Apache License 2.0

Classification around Canary #51

Open gsdevme opened 5 years ago

gsdevme commented 5 years ago

Hi,

I was testing this tool with an ALB, an ASG, and a canary deployment. One thing I noticed was that bouncer asserts instance health outside of the ASG.

Bouncer seemed to assert only that the instance had booted, not that it was fully operational within the ASG. At that point it started draining connections and deregistering the existing instance, which caused a blip of downtime before the ASG had asserted that the canary instance was healthy.

Is it expected that the "Canary" health is not asserted this way?

holtwilkins commented 5 years ago

Hey there, thanks for your interest!

Bouncer assumes that you’re using pending hooks for recording when an instance is healthy and ready to be considered “done”. I wrote up some more details recently in https://github.com/palantir/bouncer/issues/49 as well.

Is this the type of thing you were looking for?
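
To illustrate the pending-hook approach described above, here is a minimal sketch, not bouncer's actual code. It assumes a launch lifecycle hook is attached to the ASG, so a new instance pauses in `Pending:Wait` until something (typically the instance itself) calls the Auto Scaling `CompleteLifecycleAction` API, and only then moves on towards `InService`. The helper name `instanceReady` is hypothetical; the state strings are the lifecycle states the Auto Scaling API reports for a launching instance.

```go
package main

import "fmt"

// Lifecycle states reported by the Auto Scaling API while an instance
// launches. With a launch lifecycle hook, the instance waits in
// Pending:Wait until the hook is completed or times out.
const (
	statePending        = "Pending"
	statePendingWait    = "Pending:Wait"
	statePendingProceed = "Pending:Proceed"
	stateInService      = "InService"
)

// instanceReady is a hypothetical helper: it treats an instance as "done"
// only once the ASG reports it InService. If the instance completes the
// hook only after its own readiness checks pass, InService implies the
// application is up, not merely that the machine has booted.
func instanceReady(lifecycleState string) bool {
	return lifecycleState == stateInService
}

func main() {
	states := []string{statePending, statePendingWait, statePendingProceed, stateInService}
	for _, s := range states {
		fmt.Printf("%-16s ready=%v\n", s, instanceReady(s))
	}
}
```

Without a hook, an instance can reach `InService` as soon as it has launched, which would match the behaviour reported at the top of this thread.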

gsdevme commented 5 years ago

Interesting. So I take it that, rather than using the ALB to assert the health of the instance, bouncer expects the instance itself to report Healthy/Unhealthy.

Would you take a PR that adds the ability to assert health via an NLB/ALB, if present?

holtwilkins commented 5 years ago

So, as an alternative to using pending hooks, if you choose ALB health in the ASG config instead of EC2 health, it may “just work” (pretty sure it did for ELBs, but it’s been a while). Have you tried this, and is it still an option for ALBs?

If not, where would your PR look for ALB health? As in, would it ignore the InService status in the ASG API and instead query the target group or something?

gsdevme commented 4 years ago

Hi, sorry. In terms of the ASG config for health check type, I will double-check this.

AirbornePorcine commented 4 years ago

We were just looking at this - and no, it doesn't just work by default with ALBs unfortunately. We may look at submitting a PR for this as I think it's reasonable to assume that if we've told our ASG's health checks to look at the load balancer, this tool should also look at the load balancer.

holtwilkins commented 4 years ago

That does sound reasonable: if you use ALB health checks instead of pending/terminate hooks, this tool could look at them too. If you wanted to add an optional flag that checks the status of nodes against an ALB and only declares them "done" once they are healthy in the ALB, that sounds like a reasonable option to add. I don't see why we wouldn't take a PR that added that option.