jonathan-innis opened 8 months ago
We should also check to make sure instance-type labels work here as well, e.g. if someone specifies an affinity for their DaemonSets like:

```yaml
matchExpressions:
- key: node.kubernetes.io/instance-type
  operator: Exists
```
Do we handle this correctly, or do we not know this label exists because we are just using the NodePool representation to track DaemonSet overhead?
/assign
@jonathan-innis Should we really handle it? Since those labels are meant for internal dev purposes and DaemonSets are meant to be scheduled on every node, there might not be a case where DaemonSets should be scheduled on Karpenter-managed nodes but not on other nodes, right?
> since those labels are meant for internal dev purpose and daemonsets are to be scheduled on every node
You can create node selector requirements or node affinities to run DaemonSets only on certain nodes. The use-case that we saw was someone running an EKS cluster that had mostly Karpenter-managed capacity but was running a small amount of capacity (including Karpenter itself) on Fargate nodes. They didn't want to run the DaemonSet pods on Fargate nodes, so they created selection logic to only schedule the DaemonSet pods against the Karpenter-provisioned nodes.
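As a sketch of that kind of selection logic (the label key is from this issue; the DaemonSet name, app label, and image are illustrative placeholders, not from the reported cluster), a DaemonSet can require a Karpenter-applied label so it never lands on Fargate capacity:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-agent        # placeholder name
spec:
  selector:
    matchLabels:
      app: node-agent
  template:
    metadata:
      labels:
        app: node-agent
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              # Only nodes that Karpenter has registered carry this label,
              # so Fargate (non-Karpenter) nodes are excluded.
              - key: karpenter.sh/registered
                operator: Exists
      containers:
      - name: agent
        image: example.com/agent:latest   # placeholder image
```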
I think we should try to be correct here to the best of our ability. Since we know that these labels are going to eventually appear on the node, we should schedule DaemonSet pods assuming that this will be the case.
Then, in that case, should we use some other label on DaemonSets (or any other pods) to target Karpenter-managed nodes, instead of special-casing labels that were meant for dev purposes? With this approach any user can use any label, and we might need to handle every case. Instead, we could take a preventive measure and document how to achieve exactly the goal the user was intending? Correct me if I'm wrong, thanks 😊
> instead we could take a preventive measure to mention how to achieve the goal that user was intending exactly
Agreed, I think it makes sense not to recommend that users go this route to select against Karpenter-managed nodes.
> conditioning the labels that was for the dev purpose
I'd argue that they aren't for a dev purpose, and that we should be correct here if there isn't a large technical burden to doing so. In this case, we just need to expect that these labels will be added to any nodes launched by NodePools when we are calculating the DaemonSet overhead.
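A minimal sketch of that expectation (the function names and the simplified `Exists`-only matching are illustrative, not Karpenter's actual code): inject the well-known labels into the NodePool's label template before testing whether a DaemonSet's node requirements match.

```go
package main

import "fmt"

// wellKnownLabels are labels Karpenter adds to every node it launches once
// the node registers/initializes (names taken from this issue).
var wellKnownLabels = []string{
	"karpenter.sh/registered",
	"karpenter.sh/initialized",
}

// daemonSetSchedulable reports whether a DaemonSet whose node affinity
// requires the given label keys to exist would schedule on a node with the
// given label set. Simplified stand-in for real requirement matching.
func daemonSetSchedulable(requiredKeys []string, nodeLabels map[string]string) bool {
	for _, k := range requiredKeys {
		if _, ok := nodeLabels[k]; !ok {
			return false
		}
	}
	return true
}

// withEventualLabels returns a copy of a NodePool's label template with the
// well-known Karpenter labels injected, modeling the fact that they will
// eventually appear on any node the NodePool launches.
func withEventualLabels(templateLabels map[string]string) map[string]string {
	out := map[string]string{}
	for k, v := range templateLabels {
		out[k] = v
	}
	for _, k := range wellKnownLabels {
		out[k] = "true"
	}
	return out
}

func main() {
	template := map[string]string{"node.kubernetes.io/instance-type": "m5.large"}
	required := []string{"karpenter.sh/registered"}

	// Checking against the raw template misses the DaemonSet, so its
	// overhead is undercounted.
	fmt.Println(daemonSetSchedulable(required, template)) // false
	// Injecting the eventual labels counts it correctly.
	fmt.Println(daemonSetSchedulable(required, withEventualLabels(template))) // true
}
```

The same idea covers the instance-type question from the top of the thread: `node.kubernetes.io/instance-type` is already in the template, so an `Exists` requirement on it matches either way.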
/unassign @sadath-12
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After a period of inactivity, lifecycle/stale is applied
- After a further period of inactivity once lifecycle/stale was applied, lifecycle/rotten is applied
- After a further period of inactivity once lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
Description
Observed Behavior:
This issue was first reported in https://github.com/aws/karpenter-provider-aws/issues/5406 (see that issue for more context). Karpenter was not calculating DaemonSet overhead correctly when a user was specifying node selection/node affinity logic for DaemonSets that referenced `karpenter.sh/initialized` and `karpenter.sh/registered`. The `karpenter.sh/registered` label is added to the node once the Node has properly joined the cluster, and the `karpenter.sh/initialized` label is added to the Node once Karpenter deems the node to be fully initialized.

Currently, there is no logic in the DaemonSet calculations that recognizes that these labels will eventually be added to the node and should therefore be considered when there is node selection logic against them. The lack of this logic can lead to us undershooting the resources that DaemonSets and applications need to schedule.
Expected Behavior:
Karpenter should assume that DaemonSets that select on `karpenter.sh/registered` and `karpenter.sh/initialized` will schedule against Karpenter nodes.