projectsyn / boatswain

Boatswain is a tool for doing EKS node maintenance/upgrades by replacing nodes which were created from outdated launch templates.
BSD 3-Clause "New" or "Revised" License

Boatswain panics due to 500 error from EKS control plane #40

Closed. simu closed this 2 years ago.

simu commented 4 years ago

While running an upgrade against an EKS cluster, Boatswain crashed with a panic. The EKS control plane returned a 500 error while Boatswain was checking a new node's readiness status. The panic occurred in K8sClient.GetNodes.

Wait for new node ip-10-200-9-221.us-west-2.compute.internal ready
panic: an error on the server ("{\"Code\":{\"Code\":\"\",\"Status\":500},\"Message\":\"etcdserver: leader changed\",\"Cause\":null,\"FieldName\":\"\"}") has prevented the request from succeeding (get nodes) [recovered]
    panic: an error on the server ("{\"Code\":{\"Code\":\"\",\"Status\":500},\"Message\":\"etcdserver: leader changed\",\"Cause\":null,\"FieldName\":\"\"}") has prevented the request from succeeding (get nodes)

[ ... snipped panic stack trace ... ]

Steps to Reproduce the Problem

Not sure how to reproduce this yet.

Actual Behavior

Boatswain crashed, leaving the upgrade in a state where manual cleanup (cf. #39) had to be performed.

Expected Behavior

Boatswain retries calls that fail with a 500 error from the EKS control plane (such as the transient "etcdserver: leader changed" error above) instead of panicking.