Open p53 opened 1 month ago
> operator runs feature discovery and applies appropriate nvidia labels

What kind of feature discovery are you talking about here? Is it related to the properties of the instance type that we are launching?

GFD adds labels after the nodes have already been created:
```
$ kubectl get nodes -o yaml
apiVersion: v1
items:
- apiVersion: v1
  kind: Node
  metadata:
    ...
    labels:
      nvidia.com/cuda.driver.major: "455"
      nvidia.com/cuda.driver.minor: "06"
      nvidia.com/cuda.driver.rev: ""
      nvidia.com/cuda.runtime.major: "11"
      nvidia.com/cuda.runtime.minor: "1"
      nvidia.com/gpu.compute.major: "8"
      nvidia.com/gpu.compute.minor: "0"
      nvidia.com/gfd.timestamp: "1594644571"
      nvidia.com/gpu.count: "1"
      nvidia.com/gpu.family: ampere
      nvidia.com/gpu.machine: NVIDIA DGX-2H
      nvidia.com/gpu.memory: "39538"
      nvidia.com/gpu.product: A100-SXM4-40GB
  ...
...
```
Basically, you are requesting a workload that requires a node with those labels, so we create a node with those labels, but the NodePool is not aware of these labels and we won't be aware of them. They aren't added until GFD goes and adds them. So they are added after the GPU nodes are provisioned?

How can Karpenter know these traits? This seems relevant to per-instance-type overrides: if we know particular instance types will have particular traits, we can override a configmap to say these instance types have these values.

Do these values differ from node to node? It seems the CUDA runtime depends on the GPU drivers installed on the node, so we can't just cache them directly.
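To make the per-instance-type override idea concrete, here is what such a configmap could look like: a map from instance type to the GFD labels that instance type is known to carry. This is purely a sketch; the ConfigMap name, key layout, and any Karpenter support for reading it are hypothetical, not an existing feature (the label values shown follow the GFD label scheme above).

```yaml
# Hypothetical sketch only: Karpenter does not currently read such a ConfigMap.
apiVersion: v1
kind: ConfigMap
metadata:
  name: karpenter-instance-type-label-overrides   # hypothetical name
  namespace: karpenter
data:
  # Per instance type, the labels GFD is expected to apply after provisioning.
  g5.xlarge: |
    nvidia.com/gpu.family: ampere
    nvidia.com/gpu.product: A10G
    nvidia.com/gpu.count: "1"
  p4d.24xlarge: |
    nvidia.com/gpu.family: ampere
    nvidia.com/gpu.product: A100-SXM4-40GB
    nvidia.com/gpu.count: "8"
```

The drawback, as discussed below, is that this list has to be maintained statically and kept in sync with whatever GFD actually reports.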
> Basically, you are requesting a workload that requires a node with those labels, so we create a node with those labels, but the NodePool is not aware of these labels and we won't be aware of them. They aren't added until GFD goes and adds them. So they are added after the GPU nodes are provisioned?

Yup, that's right.

> How can Karpenter know these traits? This seems relevant to per-instance-type overrides: if we know particular instance types will have particular traits, we can override a configmap to say these instance types have these values.

I don't know precisely how Karpenter works internally. It is probably possible to know these labels, or at least part of them, ahead of time and configure them statically, but best would be if we did not need to define them statically in config.

> Do these values differ from node to node? It seems the CUDA runtime depends on the GPU drivers installed on the node, so we can't just cache them directly.

We have, for example, all AWS g5 instances in one NodePool, so the values will certainly differ per instance type, depending on each instance type's GPU; having each instance type in a separate NodePool would be quite impractical.
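For comparison, Karpenter's AWS provider already derives some GPU labels from the instance type itself, so those are usable in requirements at provisioning time (unlike GFD labels). A sketch of a NodePool using them; the requirement values are illustrative, and the exact API version and label set depend on the Karpenter release:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu
spec:
  template:
    spec:
      requirements:
      # Well-known labels the AWS provider computes from the instance type,
      # available before the node exists (unlike nvidia.com/* GFD labels).
      - key: karpenter.k8s.aws/instance-gpu-manufacturer
        operator: In
        values: ["nvidia"]
      - key: karpenter.k8s.aws/instance-gpu-name
        operator: In
        values: ["a10g"]   # e.g. the GPU in AWS g5 instances
```

These cover GPU name, count, and memory, but not driver- or runtime-dependent values such as `nvidia.com/cuda.driver.major`, which only GFD can report.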
DRA (https://github.com/kubernetes-sigs/karpenter/issues/1231) would probably solve the "knowing before" part, since third-party drivers would publish NodeResourceSlices when running on the cluster, although I'm not sure about its flexibility: we are still assuming something is there beforehand, and it is constrained to resources only.

Also, node feature discovery adds labels to nodes in the same way, e.g. CPU capabilities.
> best would be if we would not need to define them in config statically

I think the ideal state here is defining what the different configurations can be for the GPU feature discovery operator and then seeing if we can surface first-class support for these in Karpenter directly.

Like you mentioned, having to statically configure all of these values is going to be a huge pain; ideally Karpenter can auto-discover them by matching its logic up with what Nvidia tells us should be on these instance types.
I'm wondering if it makes sense to retitle this issue to be more specific to the use-case. Something like: "Support Nvidia GPU Feature Discovery". @p53 What do you think?
/triage accepted
@jonathan-innis renamed
Description
Original Title: Ignore node selector labels for provisioning
What problem are you trying to solve?
We have the Nvidia operator, which installs the Nvidia runtime etc. on Karpenter nodes after they are provisioned. The operator runs feature discovery and applies the appropriate nvidia labels, and we need to place pods on these Karpenter nodes depending on those labels. The problem is that when I place nvidia labels in a pod's nodeSelector that are not in the NodePool (because they are applied to the nodes at runtime by the Nvidia operator), Karpenter will fail to provision nodes. A solution might be placing an annotation on the pod, e.g.
karpenter.sh/ignore-label=somelabel
so that Karpenter ignores this label during provisioning.

How important is this feature to you?
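A sketch of how the proposal would look on a pod; the `karpenter.sh/ignore-label` annotation is the one proposed in this issue, not something Karpenter implements:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload
  annotations:
    # Proposed (not implemented): tell Karpenter to skip this label
    # when deciding whether a NodePool can satisfy the nodeSelector.
    karpenter.sh/ignore-label: nvidia.com/gpu.family
spec:
  nodeSelector:
    # Applied by GPU feature discovery only after the node exists,
    # so no NodePool advertises it at provisioning time.
    nvidia.com/gpu.family: ampere
  containers:
  - name: cuda
    image: nvidia/cuda:12.4.1-base-ubuntu22.04
    resources:
      limits:
        nvidia.com/gpu: "1"
```

The kubelet would still enforce the nodeSelector at scheduling time; the annotation would only exempt the label from Karpenter's provisioning feasibility check.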