Open shandr opened 3 weeks ago
I wish Kubernetes provided a solution for this. A couple of solutions have been proposed, but none of them have been accepted. I wish that every project did not need to implement its own fix for this challenge.
In the short term, I am in favor of creating a solution that works for both Karpenter users and Cluster Autoscaler users. In the long term, we should consider a KEP to solve this problem in Kubernetes itself.
With Cluster Autoscaler, it's also possible to add a custom taint to Managed Nodegroups that spegel can remove after initialization, so the approach should work for both.
Describe the problem to be solved
On EKS, with Karpenter provisioning the nodes, the workload pod starts before the spegel daemonset has configured the node in 99% of cases. This makes spegel useless in dynamic EKS + Karpenter environments. nidhogg, the solution suggested in the FAQ, is very unstable: it either does not work as expected or does not work at all. Additionally, adding a taint to the node that Karpenter is unaware of can lead to a situation where Karpenter keeps creating new nodes, because the pending pod does not have the required toleration.
Proposed solution to the problem
Use the "cilium" approach. With Karpenter (or some other method), add startupTaints with a predefined name to new nodes. Once spegel has initialized on a node, it should remove that taint from the node.
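To make the proposal concrete, here is a minimal sketch of the taint-removal step only. It is not spegel's implementation: the taint key `spegel.example/not-ready` is a hypothetical placeholder for the "predefined name", the function names are made up for illustration, and the actual call to the Kubernetes API (reading the Node and sending the patch) is deliberately left out. It only shows how the remaining taints and a patch body could be computed once spegel is ready.

```python
import json

# Hypothetical taint key; the issue only asks for "some predefined name".
STARTUP_TAINT_KEY = "spegel.example/not-ready"


def remove_startup_taint(taints, key=STARTUP_TAINT_KEY):
    """Return a new list of taints without the startup taint."""
    return [t for t in taints if t.get("key") != key]


def build_taint_patch(taints, key=STARTUP_TAINT_KEY):
    """Build a patch body that replaces spec.taints on the Node.

    Returns None when the startup taint is not present, so the caller
    can skip the API call entirely.
    """
    remaining = remove_startup_taint(taints, key)
    if len(remaining) == len(taints):
        return None
    return {"spec": {"taints": remaining}}


if __name__ == "__main__":
    # Example: taints as they might appear on a freshly provisioned node.
    node_taints = [
        {"key": STARTUP_TAINT_KEY, "effect": "NoSchedule"},
        {"key": "example.com/dedicated", "value": "batch", "effect": "NoSchedule"},
    ]
    print(json.dumps(build_taint_patch(node_taints), indent=2))
```

Returning `None` when the taint is absent keeps the step idempotent, so it is safe to run again after a spegel restart. All other taints on the node are preserved, which matters because the patch replaces the whole taint list.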