intel / intel-device-plugins-for-kubernetes

Collection of Intel device plugins for Kubernetes
Apache License 2.0

GPU device plugin deployment issue (non default namespace) #1840

Closed pawel-gacek closed 1 month ago

pawel-gacek commented 2 months ago

Describe the bug: The GPU device plugin does not work properly when it is installed outside the default namespace. When the plugin is deployed with kustomize, the ServiceAccount namespace in the ClusterRoleBinding resource is set to "default" regardless of the namespace configured for the GPU device plugin deployment: https://github.com/intel/intel-device-plugins-for-kubernetes/blob/main/deployments/gpu_plugin/overlays/fractional_resources/gpu-manager-rolebinding.yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: gpu-manager-rolebinding
subjects:

To Reproduce: Install the GPU device plugin in a non-default namespace using kustomize.

Expected behavior: In the ClusterRoleBinding resource (gpu-manager-rolebinding), the ServiceAccount namespace is set to the namespace used for the deployment.


Thank you, Pawel

tkatila commented 1 month ago

Hi @pawel-gacek, yep, you are correct. This is a limitation of the kustomize-based deployment: we can't change the namespace name within the YAML file. The namespace is handled properly in our operator-based deployment, though.
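
For anyone who has to stay on the kustomize-based deployment, one possible workaround is to post-process the rendered manifests before applying them. The following is only a minimal sketch, not something the project ships: the helper name retarget_binding_subjects, the ServiceAccount name example-sa, and the namespace gpu-plugins are all hypothetical, and it assumes the manifests have already been parsed into Python dictionaries (for example with a YAML library of your choice).

```python
import copy


def retarget_binding_subjects(manifests, namespace):
    """Return a copy of `manifests` in which every ServiceAccount subject
    of every ClusterRoleBinding points at `namespace`.

    Hypothetical helper: `manifests` is a list of already-parsed
    Kubernetes objects (plain dicts). The input list is not modified.
    """
    fixed = copy.deepcopy(manifests)
    for doc in fixed:
        if doc.get("kind") != "ClusterRoleBinding":
            continue
        for subject in doc.get("subjects") or []:
            if subject.get("kind") == "ServiceAccount":
                subject["namespace"] = namespace
    return fixed


# Illustrative data only: the ServiceAccount name and the target
# namespace are assumptions, not values taken from the repository.
docs = [
    {"kind": "ServiceAccount",
     "metadata": {"name": "example-sa", "namespace": "gpu-plugins"}},
    {"kind": "ClusterRoleBinding",
     "metadata": {"name": "gpu-manager-rolebinding"},
     "subjects": [{"kind": "ServiceAccount",
                   "name": "example-sa",
                   "namespace": "default"}]},
]
fixed = retarget_binding_subjects(docs, "gpu-plugins")
print(fixed[1]["subjects"][0]["namespace"])  # prints: gpu-plugins
```

Note that this sketch retargets every ServiceAccount subject in every ClusterRoleBinding it sees, so it is only appropriate when all of them are meant to live in the same namespace.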

pawel-gacek commented 1 month ago

Hi @tkatila, got it, thanks. I'm raising it because it can cause issues in plugin operation even though the deployment itself succeeds. It would be good to document this limitation somewhere, as I believe kustomize-based deployments are still in use. In our case, we simply did not notice that the GPU plugin was not working properly until we saw a GPU resource allocation failure for one of our workloads.
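
Since the deployment itself succeeds, a mismatch like this is easy to miss. As a rough sketch of how it could be caught earlier, the check below takes the JSON form of a ClusterRoleBinding (for example the output of kubectl get clusterrolebinding gpu-manager-rolebinding -o json) and lists the ServiceAccount subjects that do not point at the expected namespace. The function name mismatched_subjects, the ServiceAccount name example-sa, and the namespace gpu-plugins are hypothetical and used for illustration only.

```python
import json


def mismatched_subjects(binding_json: str, expected_namespace: str) -> list:
    """Return the ServiceAccount subjects of a ClusterRoleBinding whose
    namespace differs from `expected_namespace`.

    `binding_json` is the JSON text of a single ClusterRoleBinding object.
    """
    binding = json.loads(binding_json)
    return [
        subject
        for subject in binding.get("subjects") or []
        if subject.get("kind") == "ServiceAccount"
        and subject.get("namespace") != expected_namespace
    ]


# Illustrative input only: the subject name and namespaces are assumptions.
sample = json.dumps({
    "kind": "ClusterRoleBinding",
    "metadata": {"name": "gpu-manager-rolebinding"},
    "subjects": [
        {"kind": "ServiceAccount", "name": "example-sa", "namespace": "default"},
    ],
})

for subject in mismatched_subjects(sample, "gpu-plugins"):
    print(f"{subject['name']} is bound in namespace {subject['namespace']}")
```

An empty result means every ServiceAccount subject already points at the expected namespace.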

tkatila commented 1 month ago

Sure. I'll add a note about it to the advanced deployments docs.

Off-topic: fractional resources is a fairly niche use case. How are you using it?

pawel-gacek commented 1 month ago

We use the GPU Aware Scheduler extender, which requires fractional resources to be enabled in the GPU device plugin.