keikoproj / lifecycle-manager

Graceful AWS scaling event on Kubernetes using lifecycle hooks
Apache License 2.0

node deletion #127

Closed shreyas-badiger closed 1 year ago

shreyas-badiger commented 1 year ago

Cloud providers reuse node IPs. When a new node joins the cluster with a previously allocated IP, it fetches the existing node object, which still carries the previous role and labels. As a result, the new node is never able to join the cluster and stays stuck in the NotReady state. Once the node has been drained, it is therefore safe to delete the node object.

codecov[bot] commented 1 year ago

Codecov Report

Merging #127 (ad406fe) into master (279252c) will decrease coverage by 0.34%. The diff coverage is 61.53%.

:exclamation: Current head ad406fe differs from pull request most recent head 41b8e16. Consider uploading reports for the commit 41b8e16 to get more accurate results

@@            Coverage Diff             @@
##           master     #127      +/-   ##
==========================================
- Coverage   70.05%   69.71%   -0.34%     
==========================================
  Files          12       12              
  Lines        1259     1311      +52     
==========================================
+ Hits          882      914      +32     
- Misses        310      325      +15     
- Partials       67       72       +5     
| Impacted Files | Coverage Δ |
|---|---|
| pkg/service/events.go | 81.25% <ø> (ø) |
| pkg/service/nodes.go | 70.70% <50.00%> (-3.03%) :arrow_down: |
| pkg/service/server.go | 60.55% <65.51%> (+0.30%) :arrow_up: |
| pkg/service/lifecycle.go | 100.00% <100.00%> (ø) |
| pkg/service/metrics.go | 82.35% <100.00%> (+0.53%) :arrow_up: |


ZihanJiang96 commented 1 year ago

Why do we need to delete the node explicitly in lifecycle-manager?

shreyas-badiger commented 1 year ago

> Why do we need to delete the node explicitly in lifecycle-manager?

Because cloud providers reuse node IPs. When a new node joins the cluster with a previously allocated IP, it fetches the existing node object, which still has the previous role and labels. Because of this, the node is never able to join the cluster and stays stuck in the NotReady state.

shreyas-badiger commented 1 year ago

time="2023-07-18T17:52:58Z" level=info msg="i-09112ac4d1548f190> deleting node/ip-10-197-112-216.us-west-2.compute.internal"
time="2023-07-18T17:52:58Z" level=info msg="node successfully deleted"
time="2023-07-18T17:52:58Z" level=info msg="i-09112ac4d1548f190> completed node deletion/ip-10-197-112-216.us-west-2.compute.internal"