yx-lamini opened this issue 1 month ago
@yx-lamini we don't have support for AMD GPUs, but I would be happy to accept a contribution to add it.
Could you briefly explain or point to docs/code-files for Minikube's high-level logical architecture for supporting NVIDIA GPUs?
Or provide suggestions on how we could support AMD GPUs in Minikube?
I'd love to contribute. We are actively assessing the technical investment needed to make minikube support AMD GPUs. I can already attach AMD GPUs to Docker with:
docker run -it --device=/dev/kfd --device=/dev/dri --group-add video rocm/rocm-terminal
I think that's through the container device interface (CDI).
Assuming we build on top of Docker's CDI support to add AMD GPUs to Minikube, what approach would you suggest?
Better yet, if minikube's NVIDIA support (--gpus all) is also built on top of CDI, an explanation of its overall logical architecture would help us mimic it for AMD GPUs.
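If the AMD path were built on CDI, Docker would need a CDI spec that describes the AMD device nodes. Note that the docker run command above passes host device nodes directly with --device=/dev/...; that is plain device passthrough, whereas CDI devices are requested by a qualified name such as vendor.com/class=name. The Python sketch below generates such a spec, assuming the kind amd.com/gpu, CDI spec version 0.6.0, and a host that exposes /dev/kfd and /dev/dri/renderD* nodes. The function name build_amd_cdi_spec is hypothetical.

```python
import glob
import json


def build_amd_cdi_spec(render_nodes):
    """Build a CDI spec exposing AMD GPUs as amd.com/gpu=<index> and amd.com/gpu=all.

    Hypothetical helper: the kind "amd.com/gpu" is an assumption of this sketch.
    """
    # One CDI device per render node, named by its position in the list.
    devices = [
        {"name": str(i), "containerEdits": {"deviceNodes": [{"path": node}]}}
        for i, node in enumerate(render_nodes)
    ]
    # An explicit "all" entry so every GPU can be requested at once.
    devices.append({
        "name": "all",
        "containerEdits": {"deviceNodes": [{"path": n} for n in render_nodes]},
    })
    return {
        "cdiVersion": "0.6.0",
        "kind": "amd.com/gpu",
        "devices": devices,
        # Top-level edits apply whenever any device from this spec is requested;
        # /dev/kfd is the ROCm compute interface shared by all AMD GPUs.
        "containerEdits": {"deviceNodes": [{"path": "/dev/kfd"}]},
    }


if __name__ == "__main__":
    # Render node order is not guaranteed to match ROCm's GPU indices;
    # a real implementation would need to reconcile the two.
    nodes = sorted(glob.glob("/dev/dri/renderD*"))
    print(json.dumps(build_amd_cdi_spec(nodes), indent=2))
```

Saved under /etc/cdi (one of the directories Docker scans for CDI specs), this spec would let a Docker daemon with CDI enabled resolve a request like docker run --device amd.com/gpu=all rocm/rocm-terminal. The --group-add video part of the original command is not covered by this sketch.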
minikube uses Docker's --gpus all flag to attach the GPU to the container, and we also install nvidia-smi in the base image, which is required for it. So I am wondering: do we need to install the equivalent driver for AMD?
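To make the difference between the two vendors concrete, the sketch below maps a GPU vendor to the extra docker run arguments: --gpus all for NVIDIA, as described here, and the device-node flags from the AMD command earlier in the thread. The helper names docker_gpu_args and host_tool_available are hypothetical, and rocm-smi is assumed to be the AMD counterpart of nvidia-smi for a host-side sanity check.

```python
import shutil

# Extra `docker run` arguments per vendor, taken from this thread.
GPU_DOCKER_ARGS = {
    "nvidia": ["--gpus", "all"],
    "amd": ["--device=/dev/kfd", "--device=/dev/dri", "--group-add", "video"],
}

# Vendor management CLIs (assumption: rocm-smi as the AMD counterpart).
GPU_HOST_TOOL = {"nvidia": "nvidia-smi", "amd": "rocm-smi"}


def docker_gpu_args(vendor):
    """Return the extra `docker run` arguments that attach GPUs of `vendor`."""
    try:
        return list(GPU_DOCKER_ARGS[vendor])
    except KeyError:
        raise ValueError(f"unsupported GPU vendor: {vendor!r}") from None


def host_tool_available(vendor):
    """Report whether the vendor's management CLI is on PATH."""
    return shutil.which(GPU_HOST_TOOL[vendor]) is not None


if __name__ == "__main__":
    cmd = ["docker", "run", "-it", *docker_gpu_args("amd"), "rocm/rocm-terminal"]
    print(" ".join(cmd))
```

Run as a script, it prints the same command quoted earlier for AMD. Keeping the per-vendor flags in one table means adding AMD would not require touching the NVIDIA path.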
Do you have an example of running a GPU workload in a nested container (i.e., inside the Docker container)?
It would be great to have AMD support as well. I assume you are talking about dedicated AMD GPUs, right?
BTW, here is an example of an NVIDIA workload: https://github.com/kubernetes/minikube/issues/19486
Great, I'll take a look https://github.com/kubernetes/minikube/pull/19345#issuecomment-2257323608
do you have an example of running gpu workload in a nested container ? (inside the docker container)
rocm/pytorch is the image we use. I haven't tested a nested container yet; I'll get back to you next week.
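For comparison with the NVIDIA workload linked above, an AMD workload based on rocm/pytorch might look like the Pod below. This is a sketch assuming that AMD's Kubernetes device plugin is deployed and advertises the amd.com/gpu resource, and that the GPU devices are visible inside the minikube node. The pod name, command, and the helper rocm_pytorch_pod are illustrative.

```python
import json


def rocm_pytorch_pod(name="rocm-smoke-test", gpus=1):
    """Return a Pod manifest (as a dict) that requests AMD GPUs (hypothetical helper)."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [{
                "name": "pytorch",
                "image": "rocm/pytorch",
                # ROCm builds of PyTorch report AMD GPUs through the torch.cuda API.
                "command": ["python3", "-c",
                            "import torch; print(torch.cuda.is_available())"],
                # Assumes AMD's device plugin advertises "amd.com/gpu";
                # extended resources are requested in whole units.
                "resources": {"limits": {"amd.com/gpu": gpus}},
            }],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests, e.g.: python3 pod.py | kubectl apply -f -
    print(json.dumps(rocm_pytorch_pod(), indent=2))
```

If the GPU is reachable, the pod's log should show True; a False would suggest the device nodes never made it into the container.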
dedicated AMD gpus, right?
We use AMD GPUs in a data center cluster setting, where GPUs are shared among Kubernetes pods. There is no MIG (Multi-Instance GPU) for AMD GPUs.
Does this align with what you mentioned as "dedicated"?
What Happened?
It does not seem to work with AMD GPUs. Docker complains: docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]]
Attach the log file: N/A
Operating System: Ubuntu
Driver: Docker