kubernetes / minikube

Run Kubernetes locally
https://minikube.sigs.k8s.io/

Does minikube support AMD GPUs #19463

Open yx-lamini opened 1 month ago

yx-lamini commented 1 month ago

What Happened?

minikube start --driver docker --container-runtime docker --gpus all

This does not seem to work with AMD GPUs. It fails with: docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]]
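For what it's worth, my understanding is that this is Docker's generic error when --gpus is requested but no GPU device driver is registered with the daemon (e.g. the NVIDIA Container Toolkit is not installed), which is the normal situation on an AMD-only host. It can be reproduced with Docker alone, without minikube:

docker run --rm --gpus all ubuntu true
# docker: Error response from daemon: could not select device driver ""
# with capabilities: [[gpu]]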

Attach the log file

N/A

Operating System

Ubuntu

Driver

Docker

medyagh commented 1 month ago

@yx-lamini we don't have support for AMD GPUs, but I would be happy to accept a contribution to add it

yx-lamini commented 1 month ago

Could you briefly explain or point to docs/code-files for Minikube's high-level logical architecture for supporting NVIDIA GPUs?

Or provide suggestions on how we could support AMD GPUs in Minikube?

I'd love to contribute. We are actually actively assessing the technical investment needed to make minikube support AMD GPUs. I can already attach AMD GPUs to Docker with:

docker run -it --device=/dev/kfd --device=/dev/dri --group-add video rocm/rocm-terminal

I think that's through the container device interface (CDI).

Assuming we build on top of Docker's CDI to support AMD GPUs in minikube, what approach would you suggest we take?

Better yet, if minikube's NVIDIA support (--gpus all) is also built on top of CDI, an explanation of its overall logical architecture would be helpful so we can mimic it for AMD GPUs.
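For concreteness, a rough sketch of what a CDI-based invocation could look like, assuming a CDI spec for the AMD GPUs exists under /etc/cdi and CDI support is enabled in the Docker daemon (the device name amd.com/gpu=all is illustrative, not something I have verified):

# hypothetical CDI device name; the real name comes from the generated spec
docker run -it --device amd.com/gpu=all rocm/rocm-terminal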

medyagh commented 1 month ago

minikube uses Docker's --gpus all to attach the GPU to the container, and we also install nvidia-smi in the base image, which is required for it... so I am wondering if we need to install a similar driver for AMD?
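As an unverified sketch, the AMD analogue of what we do today might be to start the kicbase node container with the ROCm device nodes passed through instead of --gpus all, and to ship rocm-smi (the AMD counterpart of nvidia-smi) in the base image; the tag below is illustrative:

# today (NVIDIA): docker run ... --gpus all gcr.io/k8s-minikube/kicbase:<tag>
# possible AMD analogue: pass the ROCm device nodes through instead
docker run -d --device=/dev/kfd --device=/dev/dri --group-add video \
  gcr.io/k8s-minikube/kicbase:<tag>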

Do you have an example of running a GPU workload in a nested container (inside the docker container)?

It would be cool if we could have support for AMD as well. And I am assuming you are talking about dedicated AMD GPUs, right?

medyagh commented 1 month ago

btw here is an example of an NVIDIA workload: https://github.com/kubernetes/minikube/issues/19486

yx-lamini commented 1 month ago

btw here is an example of an NVIDIA workload: https://github.com/kubernetes/minikube/issues/19486

Great, I'll take a look https://github.com/kubernetes/minikube/pull/19345#issuecomment-2257323608

Do you have an example of running a GPU workload in a nested container (inside the docker container)?

rocm/pytorch is the one we use. I haven't tested a nested container yet; I will get back to you next week.
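In case it is useful, the nested test I have in mind would look roughly like this (an unverified sketch; it assumes a docker:dind outer container still sees the passed-through ROCm device nodes):

# outer container: privileged, with the AMD device nodes and its own dockerd
docker run -d --privileged --device=/dev/kfd --device=/dev/dri \
  --group-add video --name outer docker:dind
# inner container: started from inside the outer one; rocm-smi should list
# the GPUs if the passthrough survives the nesting
docker exec outer docker run --rm --device=/dev/kfd --device=/dev/dri \
  --group-add video rocm/rocm-terminal rocm-smi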

dedicated AMD GPUs, right?

We use AMD GPUs in a data center cluster setting. The GPUs are shared among Kubernetes pods; there is no MIG equivalent for AMD GPUs.

Does this align with what you mentioned as "dedicated"?
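For context, the sharing on our side goes through the AMD GPU device plugin for Kubernetes, which exposes the GPUs as an amd.com/gpu resource (analogous to nvidia.com/gpu). A minimal sketch of a pod requesting one; the image and command are just illustrative:

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: rocm-smoke-test
spec:
  containers:
  - name: rocm
    image: rocm/rocm-terminal
    command: ["rocm-smi"]
    resources:
      limits:
        amd.com/gpu: 1
EOF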