kubernetes / minikube

Run Kubernetes locally
https://minikube.sigs.k8s.io/
Apache License 2.0

Minikube not working on GPU NVIDIA RTX 3090 with --driver=none #16827

Closed joaquinfdez closed 7 months ago

joaquinfdez commented 1 year ago

What Happened?

Description of the problem: I have successfully set up and launched Minikube; however, it cannot detect the GPU on my machine.

The official Minikube tutorials page suggests using either the KVM2 driver or the 'none' driver. After several trials, I've noticed that the KVM2 driver does not seem to support my NVIDIA GPU, an RTX 3090.

To work around this, I switched to the 'none' driver, which otherwise runs fine on this machine. I followed the process below, but Minikube still does not detect the GPU.

Steps taken:

  1. Disabled file system protections:

     sudo sysctl fs.protected_regular=0
     fs.protected_regular = 0

  2. Reloaded the systemd manager configuration:

     sudo systemctl daemon-reload

  3. Enabled and started cri-docker.service and cri-docker.socket:

     sudo systemctl enable cri-docker.service
     sudo systemctl enable --now cri-docker.socket

  4. Started Minikube using the none driver:

     minikube start --driver=none --apiserver-ips 127.0.0.1 --apiserver-name localhost

  5. Checked Minikube status, which was running correctly:

     minikube status
     minikube
     type: Control Plane
     host: Running
     kubelet: Running
     apiserver: Running
     kubeconfig: Configured

  6. Created a daemonset using the NVIDIA k8s device plugin:

     kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/master/nvidia-device-plugin.yml
     daemonset.apps/nvidia-device-plugin-daemonset created

  7. Checked node status:

     kubectl get nodes -ojson | jq .items[].status.capacity
     {
       "cpu": "64",
       "ephemeral-storage": "1921221768Ki",
       "hugepages-1Gi": "0",
       "hugepages-2Mi": "0",
       "memory": "131704932Ki",
       "pods": "110"
     }
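A healthy GPU setup would add an `nvidia.com/gpu` entry to the capacity shown above, so its absence means the device plugin never registered the GPU with the kubelet. Two quick checks (a sketch; the `name=nvidia-device-plugin-ds` label is assumed from the upstream daemonset manifest and may differ):

```shell
# Prints the GPU count (e.g. "1") when the plugin is registered;
# "null" means the kubelet never received the resource.
kubectl get nodes -o json \
  | jq '.items[].status.capacity["nvidia.com/gpu"]'

# The plugin pod's own logs usually say why registration failed
# (e.g. a missing NVIDIA container runtime on the host).
kubectl -n kube-system logs -l name=nvidia-device-plugin-ds
```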

    Despite these steps, the capacity output shows no nvidia.com/gpu resource, so Minikube does not detect the RTX 3090. Can anyone provide some guidance on what the issue could be?
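With the `none` driver there is no VM in between, so GPU visibility depends entirely on the host's Docker/NVIDIA stack. A checklist worth running before blaming Minikube (a sketch; the CUDA image tag is illustrative, and editing `/etc/docker/daemon.json` affects all containers, so back it up first):

```shell
# 1. Does the host driver see the RTX 3090 at all?
nvidia-smi

# 2. Can Docker itself reach the GPU via the NVIDIA container toolkit?
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu20.04 nvidia-smi

# 3. The k8s-device-plugin additionally requires nvidia as Docker's
#    *default* runtime; /etc/docker/daemon.json should contain:
#    {
#      "default-runtime": "nvidia",
#      "runtimes": {
#        "nvidia": { "path": "nvidia-container-runtime", "runtimeArgs": [] }
#      }
#    }
sudo systemctl restart docker
```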

Attach the log file

$ minikube start --driver=none --apiserver-ips 127.0.0.1 --apiserver-name localhost
😄  minikube v1.30.1 on Ubuntu 20.04
✨  Using the none driver based on user configuration
👍  Starting control plane node minikube in cluster minikube
🤹  Running on localhost (CPUs=64, Memory=128618MB, Disk=1876193MB) ...
ℹ️  OS release is Ubuntu 20.04.6 LTS
🐳  Preparing Kubernetes v1.26.3 on Docker 24.0.2...
    ▪ kubelet.resolv-conf=/run/systemd/resolve/resolv.conf
    ▪ Generating certificates and keys
💢  initialization failed, will try again: wait: /bin/bash -c "sudo env PATH="/var/lib/minikube/binaries/v1.26.3:$PATH" kubeadm init --config /var/tmp/minikube/kubeadm.yaml  --ignore-preflight-errors=DirAvailable--etc-kubernetes-manifests,DirAvailable--var-lib-minikube,DirAvailable--var-lib-minikube-etcd,FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml,FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml,FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml,FileAvailable--etc-kubernetes-manifests-etcd.yaml,Port-10250,Swap,NumCPU,Mem": exit status 1
stdout:
[init] Using Kubernetes version: v1.26.3
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/var/lib/minikube/certs"
[certs] Using existing ca certificate authority
[certs] Using existing apiserver certificate and key on disk
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost PC_RTX3090] and IPs [10.10.68.61 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost PC_RTX3090] and IPs [10.10.68.61 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"

stderr:
W0706 10:23:45.430412   58489 initconfiguration.go:119] Usage of CRI endpoints without URL scheme is deprecated and can cause kubelet errors in the future. Automatically prepending scheme "unix" to the "criSocket" with value "/var/run/cri-dockerd.sock". Please update your configuration!
        [WARNING Swap]: swap is enabled; production deployments should disable swap unless testing the NodeSwap feature gate of the kubelet
error execution phase kubeconfig/admin: a kubeconfig file "/etc/kubernetes/admin.conf" exists already but has got the wrong CA cert
To see the stack trace of this error execute with --v=5 or higher

    ▪ Generating certificates and keys
    ▪ Booting up control plane
    ▪ Configuring RBAC rules ...
🔗  Configuring bridge CNI ...
🤹  Configuring local host environment ...

❗  The 'none' driver is designed for experts who need to integrate with an existing VM
💡  Most users should use the newer 'docker' driver instead, which does not require root!
📘  For more information, see: https://minikube.sigs.k8s.io/docs/reference/drivers/none/

❗  kubectl and minikube configuration will be stored in /home/PC_RTX3090
❗  To use kubectl or minikube commands as your own user, you may need to relocate them. For example, to overwrite your own configuration, run:

    ▪ sudo mv /home/PC_RTX3090/.kube /home/PC_RTX3090/.minikube $HOME
    ▪ sudo chown -R $USER $HOME/.kube $HOME/.minikube

💡  This can also be done automatically by setting the environment variable CHANGE_MINIKUBE_NONE_USER=true
    ▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
🔎  Verifying Kubernetes components...
🌟  Enabled addons: default-storageclass, storage-provisioner
🏄  Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default
$ minikube status
minikube
type: Control Plane
host: Running
kubelet: Running
apiserver: Running
kubeconfig: Configured
$ kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/master/nvidia-device-plugin.yml
daemonset.apps/nvidia-device-plugin-daemonset created
$ kubectl get nodes -ojson | jq .items[].status.capacity
{
  "cpu": "64",
  "ephemeral-storage": "1921221768Ki",
  "hugepages-1Gi": "0",
  "hugepages-2Mi": "0",
  "memory": "131704932Ki",
  "pods": "110"
}
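Note the transient failure in the log above: `a kubeconfig file "/etc/kubernetes/admin.conf" exists already but has got the wrong CA cert`. That points to leftovers from an earlier cluster on the same host. A possible clean-slate sequence before retrying (a destructive sketch; the paths are the standard kubeadm/minikube locations):

```shell
# Tear down the none-driver cluster, then remove stale certificates
# and kubeconfigs left behind by the previous kubeadm run
minikube delete
sudo rm -rf /etc/kubernetes /var/lib/minikube

# Start fresh
minikube start --driver=none --apiserver-ips 127.0.0.1 --apiserver-name localhost
```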

Operating System

Ubuntu

Driver

None

malfonsoarquimea commented 1 year ago

I am having the same problem with the 3090s, so it would be great if someone could help here :S

k8s-triage-robot commented 9 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 8 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 7 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 7 months ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes/minikube/issues/16827#issuecomment-2016646872):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
>
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
>
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close not-planned
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.