thoughtbot / flightdeck

Terraform modules for rapidly building production-grade Kubernetes clusters following SRE practices.
MIT License
91 stars 10 forks source link

Automatically terminate EC2 instances when the Kubelet stops reporting #67

Open jferris opened 2 years ago

jferris commented 2 years ago

We have seen several instances where:

We would like to deploy a mechanism to automatically detect and terminate these instances rather than doing it manually.

https://mission-control.thoughtbot.com/branch/main/aws/book/debug/cluster-errors.html#kubelet-stopped-posting-node-status

clarissalimab commented 3 days ago

This has been happening with a particular infra. The nodes are marked with an "Unknown" state and they don't get replaced automatically.