projectsyn / boatswain

Boatswain is a tool for doing EKS node maintenance/upgrades by replacing nodes which were created from outdated launch templates.
BSD 3-Clause "New" or "Revised" License

Setting node-role.kubernetes.io label can fail #18

Closed (simu closed this 2 years ago)

simu commented 4 years ago

When running boatswain on a cluster, I've found that SetNodeRoles can fail with the following panic:

[ ... snip regular operation ...]
Set node-role.kubernetes.io labels on new node
Identified role worker for node ip-10-200-7-89.us-west-2.compute.internal
node-role.kubernetes.io/worker
panic: Operation cannot be fulfilled on nodes "ip-10-200-7-89.us-west-2.compute.internal": the object has been modified; please apply your changes to the latest version and try again [recovered]
    panic: Operation cannot be fulfilled on nodes "ip-10-200-7-89.us-west-2.compute.internal": the object has been modified; please apply your changes to the latest version and try again

[... snip panic ...]

Steps to Reproduce the Problem

I'm not sure how to reproduce the problem yet.

Actual Behavior

Boatswain exited completely, leaving the cluster in a half-maintained state. I had to manually drain and terminate the instance that was being replaced when the failure occurred.

Expected Behavior

Boatswain should retry or ignore this non-critical failure.

simu commented 4 years ago

PR to ignore the failure: #17