vbedida79 closed this issue 1 year ago
What happens if you run only the xpumanager container directly on your physical machine, as described here: https://hub.docker.com/r/intel/xpumanager ?
Unfortunately, since the setup is on OpenShift with a Red Hat CoreOS node, it's restrictive to deploy anything on the host node. So we deploy it via a container and schedule it on the Flex node with the GPU plugin. It also runs as a privileged container, if that helps. Is there anything else we could check via the container?
Please change SPDLOG_LEVEL to trace and provide more detailed logs. https://github.com/intel/xpumanager/blob/master/deployment/kubernetes/daemonset-intel-xpum.yaml#L31
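For reference, a minimal sketch of that change, assuming the DaemonSet sets the log level through a container environment variable as the linked line suggests. The container name below is a placeholder, not taken from the manifest:

```yaml
# Fragment of the DaemonSet pod template (sketch only).
spec:
  template:
    spec:
      containers:
        - name: xpumd            # hypothetical name; use the one in your manifest
          env:
            - name: SPDLOG_LEVEL
              value: trace       # most verbose spdlog level
```

After re-applying the manifest, the pod logs should contain the additional detail.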
Found the issue: it seems the permissions were not granted correctly to the pod on OpenShift. I gave it the highest permissions with privileged, and it works now. Thanks!
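For anyone hitting the same problem, a rough sketch of what running the container as privileged looks like in the pod template. This is an illustration under assumptions, not the exact manifest used here; the container name is a placeholder:

```yaml
# Fragment of the DaemonSet pod template (sketch only).
spec:
  template:
    spec:
      containers:
        - name: xpumd            # hypothetical name; use the one in your manifest
          securityContext:
            privileged: true     # grants the container full access to host devices
```

On OpenShift, the pod's service account typically also has to be allowed to use the privileged SCC (for example via `oc adm policy add-scc-to-user privileged -z <service-account> -n <namespace>`), otherwise the setting in the manifest is rejected or not applied.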
Hi, for Intel Data Center GPU Flex 140 on OCP- with the Intel device plugins operator GPU plugin, deploying the xpumanager daemonset and xpumanager_sidecar fails with the error below. I used the kustomization yaml with the xpumanager master branch, the latest release v1.2.18, and v1.2.13 for the docker image tag (intel/xpumanager:v1.2.13). Is it recommended to build a specific release image from scratch to deploy? Or are there any specific requirements that I missed in the deployment? Thank you!