kubernetes / minikube

Run Kubernetes locally
https://minikube.sigs.k8s.io/
Apache License 2.0

Minikube start brings coredns pod online before CNI initializes, breaking DNS for 2+ nodes #11608

Open cwilkers opened 3 years ago

cwilkers commented 3 years ago

Steps to reproduce the issue:

  1. minikube start --cni=flannel --nodes=2
  2. kubectl get po -A -o wide
  3. Observe IP of coredns-x-y pod
  4. Bring up pod on node minikube-m02 and attempt to query any DNS address.

In my case, I am trying to install the kubevirt addon, which creates a pod kubevirt-install-manager in the kube-system namespace, which usually schedules on the second node. This pod attempts to download deployment YAML from the kubevirt project, which fails when DNS is unreachable.

Full output of minikube logs command: logs.txt

Full output of failed command:

$ kubectl get po -A -o wide
NAMESPACE     NAME                               READY   STATUS    RESTARTS   AGE     IP               NODE           NOMINATED NODE   READINESS GATES
kube-system   coredns-74ff55c5b-gct25            1/1     Running   0          3m19s   10.88.0.2        minikube       <none>           <none>
kube-system   etcd-minikube                      1/1     Running   0          3m27s   192.168.39.118   minikube       <none>           <none>
kube-system   kube-apiserver-minikube            1/1     Running   0          3m27s   192.168.39.118   minikube       <none>           <none>
kube-system   kube-controller-manager-minikube   1/1     Running   0          3m27s   192.168.39.118   minikube       <none>           <none>
kube-system   kube-flannel-ds-amd64-96tgf        1/1     Running   0          2m50s   192.168.39.182   minikube-m02   <none>           <none>
kube-system   kube-flannel-ds-amd64-sqtbg        1/1     Running   0          3m18s   192.168.39.118   minikube       <none>           <none>
kube-system   kube-proxy-fcfnx                   1/1     Running   0          3m19s   192.168.39.118   minikube       <none>           <none>
kube-system   kube-proxy-mzwjj                   1/1     Running   0          2m50s   192.168.39.182   minikube-m02   <none>           <none>
kube-system   kube-scheduler-minikube            1/1     Running   0          3m27s   192.168.39.118   minikube       <none>           <none>
kube-system   kubevirt-install-manager           1/1     Running   0          26s     10.244.1.2       minikube-m02   <none>           <none>
kube-system   storage-provisioner                1/1     Running   1          3m32s   192.168.39.118   minikube       <none>           <none>

sharifelgamal commented 3 years ago

@cwilkers we've done some work to fix networking for multinode clusters. Can you try this again with the newest version of minikube and see if the problem persists?

cwilkers commented 3 years ago

Unfortunately, I see the same behavior with v1.22.0.

$ minikube version
minikube version: v1.22.0
commit: a03fbcf166e6f74ef224d4a63be4277d017bb62e
$ minikube start --cni=flannel --nodes=2
😄  minikube v1.22.0 on Fedora 34
✨  Automatically selected the kvm2 driver. Other choices: podman, none, ssh
💾  Downloading driver docker-machine-driver-kvm2:
    > docker-machine-driver-kvm2....: 65 B / 65 B [----------] 100.00% ? p/s 0s
    > docker-machine-driver-kvm2: 11.47 MiB / 11.47 MiB  100.00% 343.09 MiB p/s
💿  Downloading VM boot image ...
    > minikube-v1.22.0.iso.sha256: 65 B / 65 B [-------------] 100.00% ? p/s 0s
    > minikube-v1.22.0.iso: 242.95 MiB / 242.95 MiB  100.00% 98.80 MiB p/s 2.7s
👍  Starting control plane node minikube in cluster minikube
💾  Downloading Kubernetes v1.21.2 preload ...
    > preloaded-images-k8s-v11-v1...: 502.14 MiB / 502.14 MiB  100.00% 84.82 Mi
🔥  Creating kvm2 VM (CPUs=2, Memory=2200MB, Disk=20000MB) ...
🐳  Preparing Kubernetes v1.21.2 on Docker 20.10.6 ...
    ▪ Generating certificates and keys ...
    ▪ Booting up control plane ...
    ▪ Configuring RBAC rules ...
🔗  Configuring Flannel (Container Networking Interface) ...
🔎  Verifying Kubernetes components...
    ▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
🌟  Enabled addons: storage-provisioner, default-storageclass

👍  Starting node minikube-m02 in cluster minikube
🔥  Creating kvm2 VM (CPUs=2, Memory=2200MB, Disk=20000MB) ...
🌐  Found network options:
    ▪ NO_PROXY=192.168.39.50
🐳  Preparing Kubernetes v1.21.2 on Docker 20.10.6 ...
    ▪ env NO_PROXY=192.168.39.50
🔎  Verifying Kubernetes components...
🏄  Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default
$ kubectl get po -A -o wide
NAMESPACE     NAME                               READY   STATUS    RESTARTS   AGE   IP               NODE           NOMINATED NODE   READINESS GATES
kube-system   coredns-558bd4d5db-hvnqx           1/1     Running   0          60s   10.88.0.2        minikube       <none>           <none>
kube-system   etcd-minikube                      1/1     Running   0          67s   192.168.39.50    minikube       <none>           <none>
kube-system   kube-apiserver-minikube            1/1     Running   0          67s   192.168.39.50    minikube       <none>           <none>
kube-system   kube-controller-manager-minikube   1/1     Running   0          67s   192.168.39.50    minikube       <none>           <none>
kube-system   kube-flannel-ds-amd64-4sbx9        1/1     Running   0          61s   192.168.39.50    minikube       <none>           <none>
kube-system   kube-flannel-ds-amd64-h7h58        1/1     Running   0          33s   192.168.39.236   minikube-m02   <none>           <none>
kube-system   kube-proxy-698zf                   1/1     Running   0          33s   192.168.39.236   minikube-m02   <none>           <none>
kube-system   kube-proxy-z2r66                   1/1     Running   0          61s   192.168.39.50    minikube       <none>           <none>
kube-system   kube-scheduler-minikube            1/1     Running   0          67s   192.168.39.50    minikube       <none>           <none>
kube-system   storage-provisioner                1/1     Running   0          73s   192.168.39.50    minikube       <none>           <none>
$ minikube addons enable kubevirt
    ▪ Using image bitnami/kubectl:1.17
🌟  The 'kubevirt' addon is enabled
$ kubectl -n kube-system get po kubevirt-install-manager -o wide
NAME                       READY   STATUS    RESTARTS   AGE     IP           NODE           NOMINATED NODE   READINESS GATES
kubevirt-install-manager   1/1     Running   0          5m10s   10.244.1.2   minikube-m02   <none>           <none>
$ kubectl -n kube-system logs kubevirt-install-manager

error: the path "/manifests/kubevirt-base.yaml" does not exist
error: the path "/manifests/kubevirt.yaml" does not exist
$ kubectl -n kube-system exec -ti kubevirt-install-manager bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
I have no name!@kubevirt-install-manager:/$ curl http://github.com/
curl: (6) Could not resolve host: github.com

sharifelgamal commented 2 years ago

Well, this seems to continue being an error we're facing so I'll add it as a bug and investigate when I have time. Help is of course welcome as well.

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

sharifelgamal commented 2 years ago

I'm not sure if this remains an issue, but I want to keep it open to investigate.

cwilkers commented 2 years ago

I haven't verified it lately, but it is likely still an issue. I would like to attempt a fix, but haven't had time to start from scratch on the minikube build system.

ShiroDN commented 2 years ago

Yes, looks like it's still an issue.

$ minikube version
minikube version: v1.25.2
commit: 362d5fdc0a3dbee389b3d3f1034e8023e72bd3a7-dirty
$ minikube start --kubernetes-version=v1.23.4 --nodes=3 --cni=flannel
$ kubectl get po -A -o wide
NAMESPACE     NAME                               READY   STATUS    RESTARTS      AGE   IP               NODE           NOMINATED NODE   READINESS GATES
kube-system   coredns-64897985d-mkch2            1/1     Running   0             74s   10.88.0.2        minikube       <none>           <none>
kube-system   etcd-minikube                      1/1     Running   0             82s   192.168.39.41    minikube       <none>           <none>
kube-system   kube-apiserver-minikube            1/1     Running   0             88s   192.168.39.41    minikube       <none>           <none>
kube-system   kube-controller-manager-minikube   1/1     Running   0             82s   192.168.39.41    minikube       <none>           <none>
kube-system   kube-flannel-ds-amd64-5gmcp        1/1     Running   0             74s   192.168.39.41    minikube       <none>           <none>
kube-system   kube-flannel-ds-amd64-cp6ql        1/1     Running   0             13s   192.168.39.66    minikube-m03   <none>           <none>
kube-system   kube-flannel-ds-amd64-nn5dl        1/1     Running   0             49s   192.168.39.177   minikube-m02   <none>           <none>
kube-system   kube-proxy-7wlfv                   1/1     Running   0             13s   192.168.39.66    minikube-m03   <none>           <none>
kube-system   kube-proxy-8ncpp                   1/1     Running   0             49s   192.168.39.177   minikube-m02   <none>           <none>
kube-system   kube-proxy-g45dm                   1/1     Running   0             74s   192.168.39.41    minikube       <none>           <none>
kube-system   kube-scheduler-minikube            1/1     Running   0             81s   192.168.39.41    minikube       <none>           <none>
kube-system   storage-provisioner                1/1     Running   1 (72s ago)   85s   192.168.39.41    minikube       <none>           <none>
$ kubectl run nginx --image=nginx
pod/nginx created
$ kubectl exec -ti nginx -- bash
root@nginx:/# curl google.com
curl: (6) Could not resolve host: google.com
$ kubectl delete pod -n kube-system coredns-64897985d-mkch2
pod "coredns-64897985d-mkch2" deleted
[tomas@yggdrasil:~]$ kubectl get po -A -o wide
NAMESPACE     NAME                               READY   STATUS    RESTARTS      AGE     IP               NODE           NOMINATED NODE   READINESS GATES
default       nginx                              1/1     Running   0             8m47s   10.244.1.2       minikube-m02   <none>           <none>
kube-system   coredns-64897985d-p9h7j            1/1     Running   0             89s     10.244.1.3       minikube-m02   <none>           <none>
kube-system   etcd-minikube                      1/1     Running   0             11m     192.168.39.41    minikube       <none>           <none>
kube-system   kube-apiserver-minikube            1/1     Running   0             11m     192.168.39.41    minikube       <none>           <none>
.
.
.
$ kubectl exec -ti nginx -- bash
root@nginx:/# curl google.com
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>

cwilkers commented 2 years ago

I just verified the behavior on 1.26.1, and am looking into the cause again to see if this is something I could help fix.

afbjorklund commented 2 years ago

We need another workaround, since moving the CNI config got complicated with Kubernetes 1.24

That is, if it is related to /etc/cni/net.d not being empty. The suggestion is to move podman and cri-o out...
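
To illustrate why leftover files in /etc/cni/net.d could matter, here is a minimal Go sketch. It assumes the runtime loads whichever CNI config file sorts first by name; that is a common way the shared directory is handled, but it is not confirmed anywhere in this thread. firstCNIConfig is a hypothetical helper, not minikube code, and the file names mentioned afterwards are only illustrative.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
)

// firstCNIConfig returns the file name that sorts first among entries that
// look like CNI configs (.conf, .conflist, .json), or "" if there are none.
// Assumption: the runtime picks the lexicographically first config file.
func firstCNIConfig(names []string) string {
	var configs []string
	for _, n := range names {
		switch filepath.Ext(n) {
		case ".conf", ".conflist", ".json":
			configs = append(configs, n)
		}
	}
	if len(configs) == 0 {
		return ""
	}
	sort.Strings(configs)
	return configs[0]
}

func main() {
	dir := "/etc/cni/net.d"
	entries, err := os.ReadDir(dir)
	if err != nil {
		fmt.Println("cannot read", dir+":", err)
		return
	}
	var names []string
	for _, e := range entries {
		if !e.IsDir() {
			names = append(names, e.Name())
		}
	}
	fmt.Printf("first CNI config in %s: %q\n", dir, firstCNIConfig(names))
}
```

Under that assumption, a directory holding only files named like 87-podman-bridge.conflist and 100-crio-bridge.conf would resolve to 100-crio-bridge.conf, and a later 10-flannel.conflist would sort ahead of both. A pod created before the flannel file exists would then get its address from whatever config was already there.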

afbjorklund commented 2 years ago

Basically Kubernetes doesn't work, unless you remove all other software from the (shared) configuration.

That is especially true for flannel, which normally isn't installed on the host but bootstraps from containers.

cwilkers commented 2 years ago

If we cannot reorder it, would it be acceptable to conditionally delete the coredns container as the last part of applying the fabric?

cwilkers commented 2 years ago

@afbjorklund I might need a little help with this; I'm able to come up with Go code to check for the status.podIP in the coredns pod, but I haven't figured out the right place in the start code to insert this. Each place I try, the coredns pod either does not yet exist, or does not have an IP yet.
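
For reference, the check itself can be sketched with only the Go standard library. This is not the code I have locally and not minikube code: podIPInCIDR is a hypothetical helper, 10.244.0.0/16 is assumed as the flannel pod network (it matches the 10.244.x.x addresses seen above), and fetching status.podIP from the API server is left out.

```go
package main

import (
	"fmt"
	"net/netip"
)

// podIPInCIDR reports whether podIP falls inside podCIDR.
// Both arguments are plain strings, e.g. a status.podIP value and a pod network.
func podIPInCIDR(podIP, podCIDR string) (bool, error) {
	ip, err := netip.ParseAddr(podIP)
	if err != nil {
		return false, fmt.Errorf("parse pod IP %q: %w", podIP, err)
	}
	prefix, err := netip.ParsePrefix(podCIDR)
	if err != nil {
		return false, fmt.Errorf("parse pod CIDR %q: %w", podCIDR, err)
	}
	return prefix.Contains(ip), nil
}

func main() {
	// Assumed pod network; the real value depends on the CNI configuration.
	const podCIDR = "10.244.0.0/16"

	// The two coredns addresses reported earlier in this issue.
	for _, ip := range []string{"10.88.0.2", "10.244.1.3"} {
		ok, err := podIPInCIDR(ip, podCIDR)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s in %s: %v\n", ip, podCIDR, ok)
	}
}
```

The open question stays the same: where in the start code a check like this can run once the coredns pod exists and has an IP.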

cwilkers commented 2 years ago

Here's an alternate idea, but it would require changes to coredns:

We could propose to add code in coredns's startup that identifies whether it gets an IP from the pod CIDR, and exit after a timeout if its IP doesn't match. To avoid breaking Kubernetes writ large, we could put this functionality behind a feature gate or environment variable that minikube could make use of.
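
A rough Go sketch of that startup check, under stated assumptions: waitForPodCIDRIP and the currentIP callback are hypothetical names, 10.244.0.0/16 is assumed as the pod CIDR, and the feature gate or environment variable is omitted. None of this is existing coredns code.

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
	"time"
)

// errWrongNetwork is returned when no address inside the pod CIDR was seen in time.
var errWrongNetwork = errors.New("pod IP never matched the pod CIDR before the timeout")

// waitForPodCIDRIP polls currentIP until it returns an address inside podCIDR,
// or returns errWrongNetwork once timeout has elapsed.
// currentIP is a hypothetical callback standing in for however the process
// would discover its own address.
func waitForPodCIDRIP(currentIP func() string, podCIDR string, timeout, interval time.Duration) error {
	prefix, err := netip.ParsePrefix(podCIDR)
	if err != nil {
		return fmt.Errorf("parse pod CIDR %q: %w", podCIDR, err)
	}
	deadline := time.Now().Add(timeout)
	for {
		// An unparsable or empty address is treated the same as a non-matching one.
		if ip, err := netip.ParseAddr(currentIP()); err == nil && prefix.Contains(ip) {
			return nil
		}
		if time.Now().After(deadline) {
			return errWrongNetwork
		}
		time.Sleep(interval)
	}
}

func main() {
	// Simulates the situation in this issue: the address comes from 10.88.0.0/16.
	stuck := func() string { return "10.88.0.2" }
	err := waitForPodCIDRIP(stuck, "10.244.0.0/16", 200*time.Millisecond, 50*time.Millisecond)
	if err != nil {
		fmt.Println("startup check failed, the process would exit here:", err)
	}
}
```

Whether exiting actually results in a new IP depends on how the pod gets recreated, so this only covers the detection half of the idea.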

pavgup commented 1 year ago

Just confirming that this still appears to be broken. For folks who run into this: the workaround suggested in the kubevirt docs (delete coredns, then disable and re-enable the kubevirt addon) doesn't seem to work.