Closed DizzieNight closed 1 week ago
Hi @DizzieNight, did you also deploy the Intel GPU device plugin to the cluster? NFD alone doesn't suffice.
I just checked, and it doesn't seem so, actually. I get this error when trying to install using the Helm chart:
Helm install failed for release node-feature-discovery/intel-gpu-plugin with chart intel-device-plugins-gpu@0.31.1: unable to build kubernetes objects from release manifest: resource mapping not found for name: "gpudeviceplugin" namespace: "" from "": no matches for kind "GpuDevicePlugin" in version "deviceplugin.intel.com/v1" ensure CRDs are installed first
I couldn't find where to install the CRDs though. Any thoughts?
Helm install builds on the operator. Please see the steps here: https://github.com/intel/intel-device-plugins-for-kubernetes/blob/main/INSTALL.md#install-with-helm-charts
You can also install gpu-plugin via kubectl: https://github.com/intel/intel-device-plugins-for-kubernetes/blob/main/cmd/gpu_plugin/README.md#install-with-nfd
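For reference, the Helm path looks roughly like this; repo and chart names are taken from the INSTALL.md linked above, so verify them against the current docs before copying:

```shell
# Sketch only -- repo/chart names per Intel's INSTALL.md; they may change
# between releases, so double-check the linked docs.
helm repo add intel https://intel.github.io/helm-charts
helm repo update

# The operator chart ships the CRDs (including GpuDevicePlugin), so it must
# be installed before the GPU plugin chart -- installing the GPU chart alone
# produces the "no matches for kind GpuDevicePlugin" error above.
helm install device-plugin-operator intel/intel-device-plugins-operator
helm install gpu-plugin intel/intel-device-plugins-gpu
```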
I have installed everything and my node is getting the labels, but NFD comes up with a warning when installing:
W1115 12:36:57.853014 1148310 warnings.go:70] would violate PodSecurity "restricted:latest": restricted volume types (volumes "host-boot", "host-os-release", "host-sys", "host-usr-lib", "host-lib", "host-proc-swaps", "source-d", "features-d" use restricted volume type "hostPath")
And I am not sure how to fix it
Never mind, I set the namespace to privileged using the following commands:
kubectl label namespace node-feature-discovery pod-security.kubernetes.io/enforce=privileged
kubectl label namespace node-feature-discovery pod-security.kubernetes.io/audit=privileged
kubectl label namespace node-feature-discovery pod-security.kubernetes.io/warn=privileged
Jellyfin still won't get scheduled onto a node, though. The node I want Jellyfin to run on has these labels, but I don't see i915 anywhere. Does Arc use i915 as well, or do I have to request something else?
NFD seems to be working fine.
Can you check a few things: 1) Is the GPU device plugin running on the node? Check the pods scheduled to that specific node. 2) Describe the target node: does it list a "gpu.intel.com/i915" resource?
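A couple of illustrative commands for those checks (the node name is a placeholder, substitute your target node):

```shell
# 1) Is a GPU plugin pod scheduled on the target node?
#    ("worker-4" is a placeholder node name.)
kubectl get pods --all-namespaces --field-selector spec.nodeName=worker-4 | grep -i gpu

# 2) Does the node advertise the extended resource the pod requests?
kubectl describe node worker-4 | grep -i 'gpu.intel.com/i915'
```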
I did just notice the plugin isn't being created and this is what I get:
Not really sure how to fix it though
This looks to be the same pod security admission error you saw with NFD. The fix should be to label the operator/plugin namespace with the same privileged PSA settings.
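Concretely, that is the same three labels from earlier, pointed at whatever namespace the operator/plugin pods run in. The namespace name below is an assumption; substitute yours:

```shell
# NAMESPACE is a placeholder -- use the namespace your operator/plugin
# pods actually run in (check with `kubectl get pods -A`).
NAMESPACE=inteldeviceplugins-system
kubectl label namespace "$NAMESPACE" pod-security.kubernetes.io/enforce=privileged
kubectl label namespace "$NAMESPACE" pod-security.kubernetes.io/audit=privileged
kubectl label namespace "$NAMESPACE" pod-security.kubernetes.io/warn=privileged
```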
Yep, that worked, it's attaching now. Thank you for your help.
I created #1909 to make the experience a bit smoother.
Describe the bug I am trying to add my Arc GPU to my Jellyfin pod. I have NFD installed and it is correctly labelling my node with the Intel Arc A310 via the following labels:
nfd.node.kubernetes.io/feature-labels=gpu.intel.com/device-id.0300-56a6.count gpu.intel.com/device-id.0300-56a6.present gpu.intel.com/device-id.0380-1912.count gpu.intel.com/device-id.0380-1912.present gpu.intel.com/family,intel.feature.node.kubernetes.io/gpu
I put the following into my Jellyfin deployment:
resources:
  requests:
    gpu.intel.com/i915: "1"
  limits:
    gpu.intel.com/i915: "1"
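For context, the resources block sits under the container spec in the Deployment; a minimal sketch, where the container name and image are placeholders:

```yaml
# Minimal sketch -- container name and image are placeholders.
spec:
  containers:
    - name: jellyfin
      image: jellyfin/jellyfin
      resources:
        requests:
          gpu.intel.com/i915: "1"
        limits:
          gpu.intel.com/i915: "1"
```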
but it still won't find a node with a GPU. It keeps coming up with the following error:
0/8 nodes are available: 1 node(s) were unschedulable, 3 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 4 Insufficient gpu.intel.com/i915. preemption: 0/8 nodes are available: 4 No preemption victims found for incoming pod, 4 Preemption is not helpful for scheduling.
To Reproduce Install the Arc GPU, install NFD, and request gpu.intel.com/i915 in the pod spec.
Expected behavior Jellyfin pod should attach to worker 4 which has the Arc A310 GPU