While running an upgrade against an EKS cluster, Boatswain crashed with a panic caused by a 500 error that the EKS control plane returned while Boatswain was checking a new node's readiness status. The panic occurred in K8sClient.GetNodes.
Wait for new node ip-10-200-9-221.us-west-2.compute.internal ready
panic: an error on the server ("{\"Code\":{\"Code\":\"\",\"Status\":500},\"Message\":\"etcdserver: leader changed\",\"Cause\":null,\"FieldName\":\"\"}") has prevented the request from succeeding (get nodes) [recovered]
panic: an error on the server ("{\"Code\":{\"Code\":\"\",\"Status\":500},\"Message\":\"etcdserver: leader changed\",\"Cause\":null,\"FieldName\":\"\"}") has prevented the request from succeeding (get nodes)
[ ... snipped panic stack trace ... ]
Steps to Reproduce the Problem
Not sure how to reproduce this yet
Actual Behavior
Boatswain crashed, leaving the upgrade in a state that required manual cleanup (cf. #39).
Expected Behavior
Boatswain retries calls that receive a 500 from the EKS control plane instead of panicking.