kubernetes-sigs / kind

Kubernetes IN Docker - local clusters for testing Kubernetes
https://kind.sigs.k8s.io/
Apache License 2.0

coredns CrashLoopBackOff on Ubuntu 20.04 #1975

Closed paolomainardi closed 3 years ago

paolomainardi commented 3 years ago

What happened:

❯ k get pods
NAME                                                       READY   STATUS             RESTARTS   AGE
coredns-66bff467f8-j5cf4                                   0/1     CrashLoopBackOff   1          54s
coredns-66bff467f8-l6gtz                                   0/1     CrashLoopBackOff   1          54s
etcd-retrogames-k8s-dev-control-plane                      1/1     Running            0          66s
kindnet-wgxw8                                              1/1     Running            0          54s
kube-apiserver-retrogames-k8s-dev-control-plane            1/1     Running            0          66s
kube-controller-manager-retrogames-k8s-dev-control-plane   1/1     Running            0          66s
kube-proxy-nnkwz                                           1/1     Running            0          54s
kube-scheduler-retrogames-k8s-dev-control-plane            1/1     Running            0          66s

What you expected to happen:

CoreDNS should run without crashing.

How to reproduce it (as minimally and precisely as possible):

❯ cat /etc/issue
Ubuntu 20.04.1 LTS \n \l

kind cluster config:
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  image: kindest/node:v1.19.1@sha256:98cf5288864662e37115e362b23e4369c8c4a408f99cbc06e58ac30ddc721600
  kubeadmConfigPatches:
  - |
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-labels: "ingress-ready=true"
  extraPortMappings:
  - containerPort: 80
    hostPort: 80
    protocol: TCP
  - containerPort: 443
    hostPort: 443
    protocol: TCP
❯ docker exec -it retrogames-k8s-dev-control-plane cat /etc/resolv.conf
search homenet.telecomitalia.it
nameserver 127.0.0.1
options ndots:0

Environment:

Server:
 Containers: 1
  Running: 1
  Paused: 0
  Stopped: 0
 Images: 2
 Server Version: 19.03.8
 Storage Driver: overlay2
  Backing Filesystem:
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version:
 runc version:
 init version:
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.4.0-56-generic
 Operating System: Ubuntu 20.04.1 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 15.33GiB
 Name: spark-carbon-cto
 ID: 2VUG:W4M7:ONOJ:GABB:KWWA:KILS:KLAA:RJLE:MOCY:YGB2:L6H6:VYP3
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support


- OS: 

❯ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.1 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.1 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

BenTheElder commented 3 years ago

I run kind on Ubuntu regularly. Can you share the CoreDNS logs and any details about your networking environment?

paolomainardi commented 3 years ago

Hi @BenTheElder, sure. I guess it's something related to the network I'm attached to, because on another network it's working fine now. I have to double-check it and I'll come back here.

aojea commented 3 years ago

nameserver 127.0.0.1

CoreDNS cannot have a loopback address as its resolver or it will fail to run: https://github.com/coredns/coredns/issues/2391
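
As a quick check for this failure mode (the pod name is copied from the output above; the stock Corefile forwards to the node's /etc/resolv.conf, so a loopback nameserver there trips the loop plugin):

❯ kubectl -n kube-system logs coredns-66bff467f8-j5cf4
❯ kubectl -n kube-system get configmap coredns -o jsonpath='{.data.Corefile}'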

BenTheElder commented 3 years ago

@aojea good point, though that resolv.conf is inside the node container: docker already does not pass through a loopback resolver from the host, and the kind image is responsible for changing it to a non-loopback address inside the container.

That shouldn't happen, but it's not clear why it does.

BenTheElder commented 3 years ago

This definitely smells like a bug in kind, but it's going to be hard to reproduce. The image appears to be the default image for 0.9, so it should have all the correct logic built in to handle replacing 127.0.0.1 correctly. Something funny is up.

paolomainardi commented 3 years ago

It happened again today, on the same network as the other time, which has nothing strange in it.

CoreDNS logs:

❯ k logs -f coredns-f9fd979d6-lv8gv
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d
[FATAL] plugin/loop: Loop (127.0.0.1:34649 -> :53) detected for zone ".", see https://coredns.io/plugins/loop#troubleshooting. Query: "HINFO 7716277109628538183.1999325155750113290."

Network dump:

❯ ip a 
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp0s31f6: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN group default qlen 1000
    link/ether 98:fa:9b:74:90:98 brd ff:ff:ff:ff:ff:ff
3: wlp0s20f3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 90:78:41:39:f2:c9 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.208/24 brd 192.168.1.255 scope global dynamic noprefixroute wlp0s20f3
       valid_lft 14333sec preferred_lft 14333sec
    inet6 fe80::b37b:d182:1601:d506/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
4: br-4362873eaa40: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:d9:c7:87:63 brd ff:ff:ff:ff:ff:ff
    inet 172.18.0.1/16 brd 172.18.255.255 scope global br-4362873eaa40
       valid_lft forever preferred_lft forever
    inet6 fc00:f853:ccd:e793::1/64 scope global 
       valid_lft forever preferred_lft forever
    inet6 fe80::42:d9ff:fec7:8763/64 scope link 
       valid_lft forever preferred_lft forever
    inet6 fe80::1/64 scope link 
       valid_lft forever preferred_lft forever
5: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether 02:42:48:6e:b0:81 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:48ff:fe6e:b081/64 scope link 
       valid_lft forever preferred_lft forever
173: vethd8960cc@if172: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-4362873eaa40 state UP group default 
    link/ether ee:3d:03:b8:2d:c3 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::ec3d:3ff:feb8:2dc3/64 scope link 
       valid_lft forever preferred_lft forever

CoreDNS configmap:

❯ k get configmap coredns -o yaml
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf {
           max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  creationTimestamp: "2020-12-19T11:31:45Z"
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:data:
        .: {}
        f:Corefile: {}
    manager: kubeadm
    operation: Update
    time: "2020-12-19T11:31:45Z"
  name: coredns
  namespace: kube-system
  resourceVersion: "179"
  selfLink: /api/v1/namespaces/kube-system/configmaps/coredns
  uid: e106f5ff-442d-4771-a15a-90b4229b52bc

resolv.conf

❯ cat /etc/resolv.conf 
# This file is managed by man:systemd-resolved(8). Do not edit.
#
# This is a dynamic resolv.conf file for connecting local clients to the
# internal DNS stub resolver of systemd-resolved. This file lists all
# configured search domains.
#
# Run "resolvectl status" to see details about the uplink DNS servers
# currently in use.
#
# Third party programs must not access this file directly, but only through the
# symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a different way,
# replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.

nameserver 127.0.0.53
options edns0 trust-ad
search homenet.telecomitalia.it

BenTheElder commented 3 years ago

That appears to be your host resolv.conf, which is fine. What is the one inside the nodes? Can you inspect the docker network as well?

Can you describe more about what occurs when this happens? Do you switch networks in some way on the host and then create a cluster, which fails, or something else?

BenTheElder commented 3 years ago

We should also check the node container logs; there's some code in the entrypoint that tweaks /etc/resolv.conf. It seems like perhaps that's failing somehow.
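
A couple of commands that should surface that (the container name kind-control-plane is the default; substitute the actual node name):

❯ docker logs kind-control-plane            # entrypoint output ends up in the container log
❯ docker exec kind-control-plane cat /etc/resolv.conf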

BenTheElder commented 3 years ago

What should be happening:

Because we use a user-defined docker network, docker points the node's /etc/resolv.conf at its embedded DNS on 127.0.0.11.

Now, because CoreDNS would rightfully error (it runs in a nested container with another, distinct loopback), the entrypoint rewrites that to a non-loopback address.

That last bit seems to not be behaving somehow. But the entrypoint should fail if it fails 🤔

BenTheElder commented 3 years ago

I wonder if maybe, during a network switch, docker is clobbering the resolv.conf and it's propagating to CoreDNS?

If that's the case, we could maybe solve this by periodically fixing the resolv.conf the way we do in the entrypoint, using a background task (or, more complex, we could try an inotify watch for changes).
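
A very rough sketch of that background-task idea, not kind's actual entrypoint code; the replacement address 172.18.0.1 is just the docker network gateway from the dump above:

# hypothetical watcher running inside the node container: if a loopback
# nameserver reappears in /etc/resolv.conf, rewrite it to a non-loopback resolver
while true; do
  if grep -qE '^nameserver 127\.' /etc/resolv.conf; then
    sed -i 's/^nameserver 127\..*/nameserver 172.18.0.1/' /etc/resolv.conf
  fi
  sleep 10
done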

juusujanar commented 3 years ago

This doesn't seem to be an Ubuntu-specific issue, as I have the same problem on Fedora 33 (with cgroups v2). I wanted to run kind for the first time, using the same image as the OP: kindest/node:v1.19.1@sha256:98cf5288864662e37115e362b23e4369c8c4a408f99cbc06e58ac30ddc721600

I collected as much information about the environment, logs, configuration, etc. as I could and put it all in this gist: https://gist.github.com/juusujanar/a9b9918a4d6d0dc5e101b0b191ec0a00

Let me know if you need any more information; I'd love to get this working.

BenTheElder commented 3 years ago

Sorry, there's been a lot going on and I missed/forgot this until a related issue was filed 😞

[FATAL] plugin/loop: Loop (127.0.0.1:53003 -> :53) detected for zone ".", see https://coredns.io/plugins/loop#troubleshooting. Query: "HINFO 1548797846507714410.3858146172311417690."

Huh, so that's weird: we expect to see 127.0.0.11:53 inside the node. It seems like something rewrote the container resolv.conf 🤔

https://gist.github.com/juusujanar/a9b9918a4d6d0dc5e101b0b191ec0a00#file-container-resolv-conf

BenTheElder commented 3 years ago

If someone could capture and upload/zip the output of kind export logs on a cluster where this happened, that would be helpful.
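
Something along these lines should produce a shareable directory of those logs (the cluster name retrogames-k8s-dev is taken from the pod names earlier in this thread; the default name is "kind"):

❯ kind export logs ./kind-logs --name retrogames-k8s-dev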

aojea commented 3 years ago

I find it interesting that both reported resolv.conf files have this option set:

options edns0 trust-ad 

I'm not sure, but I don't think docker is adding that. Are you modifying the images, running apt-get upgrade, or doing something outside the normal workflow?

juusujanar commented 3 years ago

@BenTheElder Hi, thanks for looking into it.

I have created a new cluster from scratch on my system and the issue is still the same. The CoreDNS configuration MD5 is the same.

Kind export can be downloaded from here: https://cloud.juusujanar.eu/index.php/s/CDGadQRcgm73t2a

➜  ~ GO111MODULE="on" go get sigs.k8s.io/kind@v0.10.0 && kind create cluster
go: downloading sigs.k8s.io/kind v0.10.0
go: downloading k8s.io/apimachinery v0.19.2
go: downloading golang.org/x/sys v0.0.0-20200928205150-006507a75852
go: downloading gopkg.in/yaml.v2 v2.2.8
go: downloading github.com/pelletier/go-toml v1.8.1
go: downloading github.com/evanphx/json-patch v4.9.0+incompatible
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.20.2) 🖼 
 ✓ Preparing nodes 📦  
 ✓ Writing configuration 📜 
 ✓ Starting control-plane 🕹️ 
 ✓ Installing CNI 🔌 
 ✓ Installing StorageClass 💾 
Set kubectl context to "kind-kind"
You can now use your cluster with:

kubectl cluster-info --context kind-kind

Not sure what to do next? 😅  Check out https://kind.sigs.k8s.io/docs/user/quick-start/
➜  ~ docker ps
CONTAINER ID   IMAGE                              COMMAND                  CREATED              STATUS                 PORTS                       NAMES
654d3f0cb42a   kindest/node:v1.20.2               "/usr/local/bin/entr…"   About a minute ago   Up About a minute      127.0.0.1:36157->6443/tcp   kind-control-plane
➜  ~ kubectl cluster-info --context kind-kind
Kubernetes master is running at https://127.0.0.1:36157
KubeDNS is running at https://127.0.0.1:36157/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
➜  ~ kubectl --context kind-kind get pods -A
NAMESPACE            NAME                                         READY   STATUS             RESTARTS   AGE
kube-system          coredns-74ff55c5b-t4v5k                      0/1     CrashLoopBackOff   1          39s
kube-system          coredns-74ff55c5b-wdvgf                      0/1     CrashLoopBackOff   1          39s
kube-system          etcd-kind-control-plane                      0/1     Running            0          47s
kube-system          kindnet-d9jvw                                1/1     Running            0          40s
kube-system          kube-apiserver-kind-control-plane            1/1     Running            0          47s
kube-system          kube-controller-manager-kind-control-plane   0/1     Running            0          47s
kube-system          kube-proxy-nxv2r                             1/1     Running            0          40s
kube-system          kube-scheduler-kind-control-plane            0/1     Running            0          47s
local-path-storage   local-path-provisioner-78776bfc44-dp24j      1/1     Running            0          39s

EDIT: I tested the same installation method on a VM running Debian 10 (kernel 4.19.0-14-amd64) with Docker 20.10.3, and CoreDNS started okay there.

I did not modify the images, just a clean install right now.

EDIT2: I disabled systemd-resolved and fell back to NetworkManager-handled DNS (host /etc/resolv.conf below), and then CoreDNS started just fine. I found this resource, which says loops happen when the host runs a local DNS cache: https://github.com/coredns/coredns/blob/master/plugin/loop/README.md

# Generated by NetworkManager
nameserver 1.1.1.1
nameserver 1.0.0.1

CoreDNS has the following resolv.conf file:

nameserver 172.18.0.1
options ndots:0
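
For reference, a rough sketch of that workaround (disable systemd-resolved and let NetworkManager own /etc/resolv.conf); the exact steps are an assumption and vary by distro:

sudo systemctl disable --now systemd-resolved
# have NetworkManager write /etc/resolv.conf directly (dns=default under [main])
sudo sed -i '/^\[main\]/a dns=default' /etc/NetworkManager/NetworkManager.conf
sudo rm /etc/resolv.conf
sudo systemctl restart NetworkManager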

BenTheElder commented 3 years ago

Sorry I've been snowed under with Kubernetes code freeze reviews + performance evals at work. Thank you for providing all this detail!

So to be clear: disabling systemd-resolved on your host solved this for you? It does appear that my usual linux hosts are not running systemd-resolved.

Kind / Docker are supposed to prevent the loops from happening, but I'm wondering if /etc/resolv.conf in the container is getting updated on the fly when running systemd-resolved, clobbering our changes, and we're just not encountering that issue in other environments.

If that's the case, I think we can work around this.

aojea commented 3 years ago

Kind / Docker are supposed to prevent the loops from happening, but I'm wondering if /etc/resolv.conf in the container is getting updated on the fly when running systemd-resolved, clobbering our changes, and we're just not encountering that issue in other environments.

How does this thing work? Who mounts the resolv.conf inside the container?

root@kind-control-plane:/# systemctl  status etc-resolv.conf.mount
● etc-resolv.conf.mount - /etc/resolv.conf
     Loaded: loaded (/proc/self/mountinfo)
     Active: active (mounted) since Wed 2021-03-10 17:37:02 UTC; 21h ago
      Where: /etc/resolv.conf
       What: /dev/sdb1
      Tasks: 0 (limit: 38120)
     Memory: 0B
     CGroup: /docker/82a036fa6f5b7bba7949138311efcac6493f62da97f0feb1049c0894e1df362f/system.slice/etc-resolv.conf.mount
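
Docker itself generates that file per container and bind-mounts it over the container's /etc/resolv.conf; the host-side source can be checked with (container name assumed to be kind-control-plane):

docker inspect -f '{{.ResolvConfPath}}' kind-control-plane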

digitole commented 3 years ago

Hi!

I also got this issue, though on macOS 11.2.3, using the image "kindest/node:v1.17.0".

The logs from the coredns pod:

.:53
[INFO] plugin/reload: Running configuration MD5 = 4e235fcc3696966e76816bcd9034ebc7
CoreDNS-1.6.5
linux/amd64, go1.13.4, c2fd1b2
[FATAL] plugin/loop: Loop (127.0.0.1:45572 -> :53) detected for zone ".", see https://coredns.io/plugins/loop#troubleshooting. Query: "HINFO 6117152247463384669.1586745817635660726."

As you can see it's 127.0.0.1, but the resolv.conf in the docker container says:

$ docker exec -it kind-control-plane sh
# cat /etc/resolv.conf
nameserver 127.0.0.11
options ndots:0

I edited the coredns configmap to forward to 8.8.8.8 instead, which solved the issue, but of course it's not a proper fix:

- forward . /etc/resolv.conf
+ forward . 8.8.8.8
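
For anyone trying the same temporary workaround, a sketch of the steps (8.8.8.8 is just the public resolver used above, not a recommendation):

kubectl -n kube-system edit configmap coredns        # change "forward . /etc/resolv.conf" to "forward . 8.8.8.8"
kubectl -n kube-system rollout restart deployment coredns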

Let me know if I can help with any logs etc.

BenTheElder commented 3 years ago

Using arbitrary images with arbitrary kind releases is NOT supported, and that is the reason for this issue on mac, @digitole: v1.17.0 is an image we pushed with kind v0.7.0, which is too old to support the updated networking. That is not related to this systemd-resolved issue.

More in the release notes: https://github.com/kubernetes-sigs/kind/releases/tag/v0.8.0#breaking-changes

KIND v0.8.0 requires node images built with v0.8.0+.

Images supported for v0.10.0 (current stable) specifically: https://github.com/kubernetes-sigs/kind/releases/tag/v0.10.0#new-features

ste93cry commented 3 years ago

I have the same issue when creating a new cluster on Ubuntu 20.10:

$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.10 (Groovy Gorilla)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.10"
VERSION_ID="20.10"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=groovy
UBUNTU_CODENAME=groovy
$ kind version
kind v0.10.0 go1.15.7 linux/amd64
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.4", GitCommit:"e87da0bd6e03ec3fea7933c4b5263d151aafd07c", GitTreeState:"clean", BuildDate:"2021-02-20T02:22:41Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.2", GitCommit:"faecb196815e248d3ecfb03c680a4507229c2a56", GitTreeState:"clean", BuildDate:"2021-01-21T01:11:42Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
$ docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.5.1-docker)

Server:
 Containers: 59
  Running: 47
  Paused: 0
  Stopped: 12
 Images: 126
 Server Version: 20.10.5
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 05f951a3781f4f2c1911b05e61c160e9c30eaa8e
 runc version: 12644e614e25b05da6fd08a38ffa0cfe1903fdec
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.8.0-48-generic
 Operating System: Ubuntu 20.10
 OSType: linux
 Architecture: x86_64
 CPUs: 12
 Total Memory: 31GiB
 Name: LAPTOP-STEFANO
 ID: Z55C:5DBZ:QVDX:5MJ5:XNWD:HQCN:ALK5:QMD6:RRKM:S75L:UTVK:XTK5
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Username: stefanoarlandini
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
$ docker exec -it kind-control-plane cat /etc/resolv.conf
search homenet.telecomitalia.it
nameserver 127.0.0.1
nameserver 2001:4860:4860::8888
nameserver 2001:4860:4860::8844
options edns0 trust-ad ndots:0
$ docker inspect kind-control-plane
[
    {
        "Id": "f18aaef3b7c1ea39565d2f2efe960ecb7e6cf25ef9f9779d6968cb2302022c3d",
        "Created": "2021-03-26T16:35:05.321794727Z",
        "Path": "/usr/local/bin/entrypoint",
        "Args": [
            "/sbin/init"
        ],
        "State": {
            "Status": "running",
            "Running": true,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 99695,
            "ExitCode": 0,
            "Error": "",
            "StartedAt": "2021-03-26T16:35:07.590680832Z",
            "FinishedAt": "0001-01-01T00:00:00Z"
        },
        "Image": "sha256:094599011731a3d022da48d9d83fdcf7c3113cd3bf3d7261265e2cad222e7263",
        "ResolvConfPath": "/var/lib/docker/containers/f18aaef3b7c1ea39565d2f2efe960ecb7e6cf25ef9f9779d6968cb2302022c3d/resolv.conf",
        "HostnamePath": "/var/lib/docker/containers/f18aaef3b7c1ea39565d2f2efe960ecb7e6cf25ef9f9779d6968cb2302022c3d/hostname",
        "HostsPath": "/var/lib/docker/containers/f18aaef3b7c1ea39565d2f2efe960ecb7e6cf25ef9f9779d6968cb2302022c3d/hosts",
        "LogPath": "/var/lib/docker/containers/f18aaef3b7c1ea39565d2f2efe960ecb7e6cf25ef9f9779d6968cb2302022c3d/f18aaef3b7c1ea39565d2f2efe960ecb7e6cf25ef9f9779d6968cb2302022c3d-json.log",
        "Name": "/kind-control-plane",
        "RestartCount": 0,
        "Driver": "overlay2",
        "Platform": "linux",
        "MountLabel": "",
        "ProcessLabel": "",
        "AppArmorProfile": "unconfined",
        "ExecIDs": null,
        "HostConfig": {
            "Binds": [
                "/lib/modules:/lib/modules:ro"
            ],
            "ContainerIDFile": "",
            "LogConfig": {
                "Type": "json-file",
                "Config": {}
            },
            "NetworkMode": "kind",
            "PortBindings": {
                "6443/tcp": [
                    {
                        "HostIp": "127.0.0.1",
                        "HostPort": "46315"
                    }
                ]
            },
            "RestartPolicy": {
                "Name": "on-failure",
                "MaximumRetryCount": 1
            },
            "AutoRemove": false,
            "VolumeDriver": "",
            "VolumesFrom": null,
            "CapAdd": null,
            "CapDrop": null,
            "CgroupnsMode": "host",
            "Dns": [],
            "DnsOptions": [],
            "DnsSearch": [],
            "ExtraHosts": null,
            "GroupAdd": null,
            "IpcMode": "private",
            "Cgroup": "",
            "Links": null,
            "OomScoreAdj": 0,
            "PidMode": "",
            "Privileged": true,
            "PublishAllPorts": false,
            "ReadonlyRootfs": false,
            "SecurityOpt": [
                "seccomp=unconfined",
                "apparmor=unconfined",
                "label=disable"
            ],
            "Tmpfs": {
                "/run": "",
                "/tmp": ""
            },
            "UTSMode": "",
            "UsernsMode": "",
            "ShmSize": 67108864,
            "Runtime": "runc",
            "ConsoleSize": [
                0,
                0
            ],
            "Isolation": "",
            "CpuShares": 0,
            "Memory": 0,
            "NanoCpus": 0,
            "CgroupParent": "",
            "BlkioWeight": 0,
            "BlkioWeightDevice": [],
            "BlkioDeviceReadBps": null,
            "BlkioDeviceWriteBps": null,
            "BlkioDeviceReadIOps": null,
            "BlkioDeviceWriteIOps": null,
            "CpuPeriod": 0,
            "CpuQuota": 0,
            "CpuRealtimePeriod": 0,
            "CpuRealtimeRuntime": 0,
            "CpusetCpus": "",
            "CpusetMems": "",
            "Devices": [],
            "DeviceCgroupRules": null,
            "DeviceRequests": null,
            "KernelMemory": 0,
            "KernelMemoryTCP": 0,
            "MemoryReservation": 0,
            "MemorySwap": 0,
            "MemorySwappiness": null,
            "OomKillDisable": false,
            "PidsLimit": null,
            "Ulimits": null,
            "CpuCount": 0,
            "CpuPercent": 0,
            "IOMaximumIOps": 0,
            "IOMaximumBandwidth": 0,
            "MaskedPaths": null,
            "ReadonlyPaths": null
        },
        "GraphDriver": {
            "Data": {
                "LowerDir": "/var/lib/docker/overlay2/8aa2d8dc5292a1421adc30493d3c80e5e666f069e7e159f3ba0c551798d85c3e-init/diff:/var/lib/docker/overlay2/15b9ee657a5a03fe812a6f377355d9906034bd551ae6cb1cb8ee7191f31fcacf/diff:/var/lib/docker/overlay2/8c78be55c8cc33b9f7b65302d282dad6e59071192fbc6ccf0672b7645bd47a61/diff:/var/lib/docker/overlay2/d30ca99ca48670794e02333b23c69f446630fd82bf448f64424a58bbd416a0ff/diff:/var/lib/docker/overlay2/ae0c77d99e6c163ff78d7af24d13e6aa8b1e248dda19b5bc1403a7f01a3d064f/diff:/var/lib/docker/overlay2/1ee6fbcb49586758ca8f81672642e73963d87908219295e32bad1e95f6c6948c/diff:/var/lib/docker/overlay2/0d22adf13c16419fe79de35ac76a067cef94e9b31fcc622331d3a86192c44fb1/diff",
                "MergedDir": "/var/lib/docker/overlay2/8aa2d8dc5292a1421adc30493d3c80e5e666f069e7e159f3ba0c551798d85c3e/merged",
                "UpperDir": "/var/lib/docker/overlay2/8aa2d8dc5292a1421adc30493d3c80e5e666f069e7e159f3ba0c551798d85c3e/diff",
                "WorkDir": "/var/lib/docker/overlay2/8aa2d8dc5292a1421adc30493d3c80e5e666f069e7e159f3ba0c551798d85c3e/work"
            },
            "Name": "overlay2"
        },
        "Mounts": [
            {
                "Type": "bind",
                "Source": "/lib/modules",
                "Destination": "/lib/modules",
                "Mode": "ro",
                "RW": false,
                "Propagation": "rprivate"
            },
            {
                "Type": "volume",
                "Name": "aea6b97a38cfb569882c80d32da91e8bc0827c39b39d9818fb1de2e2990adaa1",
                "Source": "/var/lib/docker/volumes/aea6b97a38cfb569882c80d32da91e8bc0827c39b39d9818fb1de2e2990adaa1/_data",
                "Destination": "/var",
                "Driver": "local",
                "Mode": "",
                "RW": true,
                "Propagation": ""
            }
        ],
        "Config": {
            "Hostname": "kind-control-plane",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "ExposedPorts": {
                "6443/tcp": {}
            },
            "Tty": true,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
                "container=docker"
            ],
            "Cmd": null,
            "Image": "kindest/node:v1.20.2@sha256:8f7ea6e7642c0da54f04a7ee10431549c0257315b3a634f6ef2fecaaedb19bab",
            "Volumes": {
                "/var": {}
            },
            "WorkingDir": "",
            "Entrypoint": [
                "/usr/local/bin/entrypoint",
                "/sbin/init"
            ],
            "OnBuild": null,
            "Labels": {
                "io.x-k8s.kind.cluster": "kind",
                "io.x-k8s.kind.role": "control-plane"
            },
            "StopSignal": "SIGRTMIN+3"
        },
        "NetworkSettings": {
            "Bridge": "",
            "SandboxID": "c7bc1ad18095fa077ab2dd5fc8a635ad6a6dde4bc05ba20f94b01eea0886fcd6",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": {
                "6443/tcp": [
                    {
                        "HostIp": "127.0.0.1",
                        "HostPort": "46315"
                    }
                ]
            },
            "SandboxKey": "/var/run/docker/netns/c7bc1ad18095",
            "SecondaryIPAddresses": null,
            "SecondaryIPv6Addresses": null,
            "EndpointID": "",
            "Gateway": "",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "",
            "IPPrefixLen": 0,
            "IPv6Gateway": "",
            "MacAddress": "",
            "Networks": {
                "kind": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": [
                        "f18aaef3b7c1",
                        "kind-control-plane"
                    ],
                    "NetworkID": "c5459c606fdf1a2c071255b83e49767e7da8405df2019005f55b0f0c4bb5d719",
                    "EndpointID": "2bd2e0e56207a951f8b60ed6a42392f71643c4b2a91458d82f7052d4c6e0a801",
                    "Gateway": "172.19.0.1",
                    "IPAddress": "172.19.0.2",
                    "IPPrefixLen": 16,
                    "IPv6Gateway": "fc00:f853:ccd:e793::1",
                    "GlobalIPv6Address": "fc00:f853:ccd:e793::2",
                    "GlobalIPv6PrefixLen": 64,
                    "MacAddress": "02:42:ac:13:00:02",
                    "DriverOpts": null
                }
            }
        }
    }
]

Attached here you can find the logs. While trying to understand why 127.0.0.1 is written inside the resolv.conf file in the container, I noticed that when the /usr/local/bin/entrypoint script runs, the command getent ahostsv4 'host.docker.internal' | head -n1 | cut -d' ' -f1 returns 127.0.0.1, and that's what gets written to the file.
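
A rough, illustrative sketch of the behavior being described (and of a loopback guard like the one the fix discussed below adds); this is not the actual entrypoint code:

# resolve host.docker.internal the way the entrypoint does
docker_host_ip="$(getent ahostsv4 'host.docker.internal' | head -n1 | cut -d' ' -f1)"
# guard: if it is empty or a loopback address, fall back to the docker
# network's default gateway instead of writing 127.0.0.1 into /etc/resolv.conf
case "${docker_host_ip}" in
  ""|127.*) docker_host_ip="$(ip -4 route show default | awk '{print $3; exit}')" ;;
esac
echo "DNS forwarding target would be: ${docker_host_ip}"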

BenTheElder commented 3 years ago

Whoa, interesting. Thanks. That is not expected: on Linux we naively expect host.docker.internal not to resolve, because it's something the Docker Desktop app sets up. We will clearly need to rethink that bit.

aojea commented 3 years ago

Can you build a kind image with this PR and see if it solves the problem? https://github.com/kubernetes-sigs/kind/pull/2165

I can't reproduce it; if you can't build it, I can provide you one.

ste93cry commented 3 years ago

Sorry if I'm arriving late; I was waiting until tomorrow to test it out on the PC that has the issue. However, as I see that the PR has already been merged, I assume there is no more need to test the kind image.

aojea commented 3 years ago

Sorry if I'm arriving late; I was waiting until tomorrow to test it out on the PC that has the issue. However, as I see that the PR has already been merged, I assume there is no more need to test the kind image.

The bot closes it automatically :). I can't reproduce the issue, and I could only test that if the IP is a loopback it uses the default network, but it would be nice if you could confirm that there are no more hidden issues because of this behavior. We can always reopen, so I will reopen and wait for your confirmation.

ste93cry commented 3 years ago

I see you bumped the node image to kindest/base:v20210328-c17ca167@sha256:0311870f4d35b0f68e2fedb5d703a552a8e7eb438acc67a3bd13982c2bda7487, can I use that for the test or do I still need to build it myself?

aojea commented 3 years ago

I see you bumped the node image to kindest/base:v20210328-c17ca167@sha256:0311870f4d35b0f68e2fedb5d703a552a8e7eb438acc67a3bd13982c2bda7487, can I use that for the test or do I still need to build it myself?

That is the base image used to build the node image; you need to build a new node image. See the command help:

$ kind build node-image -h
Build the node image which contains Kubernetes build artifacts and other kind requirements

Usage:
  kind build node-image [flags]

Flags:
      --base-image string   name:tag of the base image to use for the build (default "kindest/base:v20210205-c0cffc8c")

You can specify this base image, or, if you use kind from master, it will be used by default.
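
A sketch of those two steps (this assumes a local Kubernetes source tree as kind build node-image requires, and that the built image gets the default kindest/node:latest tag):

kind build node-image --base-image kindest/base:v20210328-c17ca167
kind create cluster --image kindest/node:latest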

ste93cry commented 3 years ago

Good news: I just tried building the image and creating the cluster with it, and it works. No more CoreDNS crashes :tada:

aojea commented 3 years ago

:heart: Thanks for confirming, great news, we can close this with a higher degree of confidence :)

/close

k8s-ci-robot commented 3 years ago

@aojea: Closing this issue.

In response to [this](https://github.com/kubernetes-sigs/kind/issues/1975#issuecomment-809245415):

> :heart:
> Thanks for confirming, great news, we can close this with a higher degree of confidence :)
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

BenTheElder commented 3 years ago

thank you!