Closed mariuskimmina closed 2 weeks ago
The committers listed above are authorized under a signed CLA.
I think this does count as a corporate contribution. It's the first time our company has made one, though, so bear with me while I figure out the CLA stuff.
@mariuskimmina FYI, in case you haven't seen it, @engedaam opened an RFC that seems to tackle the same set of issues :) https://github.com/kubernetes-sigs/karpenter/pull/1768
@njtran thanks for the heads-up. His approach does seem better thought out. I am not sure how I should proceed from here.
Hey @mariuskimmina, I'm currently planning on handling the implementation. This is a problem space we are trying to move quickly on to help solve for users. We can close this PR out. If you have the time, I would appreciate any and all feedback you can provide on both the RFC and the implementation.
Closing in favor of https://github.com/kubernetes-sigs/karpenter/pull/1793
Fixes #1659
Description
We would like Karpenter to be able to terminate nodes that have been in an unreachable state for too long. This has happened to us in the past, and as far as I can tell, spotio, for example, already handles this case. In our case, the node became unreachable when its kubelet died.
This PR introduces a new NodePool field, unreachableTimeout, which can be set to e.g. 10 minutes so that Karpenter actively terminates a node once it has been unreachable for more than 10 minutes. We called the new controller the notready controller, as that's the state the nodes are in when they become unreachable, but there might be a better alternative.
How was this change tested?
We added a test suite for this case and we also tested it on one of our EKS test clusters where we simulated a node becoming unreachable and had Karpenter mark the nodeclaim for deletion.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.