If an EC2 failure occurs and the node is then terminated by the ASG or a person, lifecycle-manager receives the hook and starts the drain/deregister flow.
In this case the drain keeps failing for as long as --drain-timeout. This keeps the instance alive in the meantime, and applications can see errors because the instance is still registered in target groups.
We should evaluate whether to attempt deregister-only or skip the flow altogether when the node state is unknown.
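
To make the two options concrete, here is a rough sketch of what the decision could look like. This is not lifecycle-manager's actual code: `NodeState`, `Action`, `chooseAction`, and the `skipOnUnknown` switch are all hypothetical names used only for illustration, and it assumes the node state has already been derived from the node's Ready condition.

```go
package main

import "fmt"

// NodeState is a hypothetical summary of the node's Ready condition.
type NodeState int

const (
	NodeReady NodeState = iota
	NodeNotReady
	NodeUnknown // kubelet stopped reporting, e.g. after an EC2 failure
)

// Action is a hypothetical description of what to do for a termination hook.
type Action int

const (
	ActionDrainAndDeregister Action = iota // the current flow
	ActionDeregisterOnly                   // skip drain, still leave target groups
	ActionSkip                             // do nothing and let the hook complete
)

// String makes the action readable in logs.
func (a Action) String() string {
	switch a {
	case ActionDrainAndDeregister:
		return "drain-and-deregister"
	case ActionDeregisterOnly:
		return "deregister-only"
	case ActionSkip:
		return "skip"
	default:
		return "invalid"
	}
}

// chooseAction picks the flow for a terminating node.
// Only the unknown state changes behavior; every other state keeps the
// existing drain/deregister flow. skipOnUnknown is an assumed switch that
// selects between the two options discussed in this issue.
func chooseAction(state NodeState, skipOnUnknown bool) Action {
	if state != NodeUnknown {
		return ActionDrainAndDeregister
	}
	if skipOnUnknown {
		return ActionSkip
	}
	return ActionDeregisterOnly
}

func main() {
	fmt.Println("ready:", chooseAction(NodeReady, false))
	fmt.Println("unknown:", chooseAction(NodeUnknown, false))
	fmt.Println("unknown, skip enabled:", chooseAction(NodeUnknown, true))
}
```

Either branch avoids waiting out --drain-timeout on a node that cannot be drained. Deregister-only still removes the instance from target groups, while skipping relies on something else cleaning those up.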