Fix core-dns issue for my local minikube cluster

vadasambar commented 1 year ago

Problem

Task pods fail to access the code URL in my local minikube cluster.
It turns out my core-dns is not working properly. I created a fresh minikube cluster without lifecycle-controller and see the same problem.
Re-installing minikube doesn't fix the core-dns issue.

vadasambar commented 1 year ago

This started happening recently

vadasambar commented 1 year ago

For `--driver=virtualbox`

After restarting the cluster:

Deleting the above cluster and creating a new `test2` cluster with `--driver=virtualbox`

I see the same timeout error as with clusters created using --driver=docker

Looking closer at the error:

[ERROR] plugin/errors: 2 2844461355748594668.4912547231313880590. HINFO: read udp 172.17.0.2:44213->192.168.121.1:53: i/o timeout

172.17.0.2 is the IP of core-dns pod

coredns-565d847f94-w7lcj        1/1     Running   0               5m10s   172.17.0.2      test2   <none>           <none>

I can't find what 192.168.121.1:53 is pointing to. When I try apt install curl in dnsutils pod I get the following error in the pod

Err http://security.debian.org/debian-security/ jessie/updates/main openssl amd64 1.0.1t-1+deb8u12
  Temporary failure resolving 'security.debian.org'

And I see the same error as before in the core-dns pod:

[ERROR] plugin/errors: 2 security.debian.org.domain.name. A: read udp 172.17.0.2:41111->192.168.121.1:53: i/o timeout

Seems like 192.168.121.1:53 is some sort of egress? Pods reach out to core-dns via the kube-dns service -> core-dns pod, which has the IP 172.17.0.2
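To make the error lines above easier to compare across clusters, here is a minimal sketch that splits one into its parts. The function and field names are hypothetical, and the reading of the leading `2` as the DNS response code (2 is SERVFAIL) is my assumption about the log format, not something confirmed in this thread.

```python
import re
from typing import NamedTuple, Optional


class ForwardTimeout(NamedTuple):
    rcode: int      # assumed to be the DNS response code (2 = SERVFAIL)
    qname: str      # name that was being resolved
    qtype: str      # record type, e.g. A or HINFO
    source: str     # ip:port of the coredns pod
    upstream: str   # ip:port coredns was waiting on


# Matches lines such as:
# [ERROR] plugin/errors: 2 example.org. A: read udp 1.2.3.4:1111->5.6.7.8:53: i/o timeout
_PATTERN = re.compile(
    r"\[ERROR\] plugin/errors: (?P<rcode>\d+) (?P<qname>\S+) (?P<qtype>\w+): "
    r"read udp (?P<source>[\d.]+:\d+)->(?P<upstream>[\d.]+:\d+): i/o timeout"
)


def parse_forward_timeout(line: str) -> Optional[ForwardTimeout]:
    """Return the parsed fields, or None if the line is not a UDP read timeout."""
    m = _PATTERN.search(line)
    if m is None:
        return None
    return ForwardTimeout(
        int(m["rcode"]), m["qname"], m["qtype"], m["source"], m["upstream"]
    )
```

Read this way, the source side is the coredns pod and the side after `->` is whatever upstream coredns is waiting on for a reply.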

kube-dns service

suraj@suraj:~/sandbox$ k get svc kube-dns -oyaml
apiVersion: v1
kind: Service
metadata:
...
  labels:
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: CoreDNS
  name: kube-dns
  namespace: kube-system
...
spec:
...
  selector:
    k8s-app: kube-dns

core-dns pod

suraj@suraj:~/sandbox$ k get po -owide 
NAME                            READY   STATUS    RESTARTS      AGE    IP              NODE    NOMINATED NODE   READINESS GATES
coredns-565d847f94-w7lcj        1/1     Running   0             15m    172.17.0.2      test2   <none>           <none>
dnsutils                        1/1     Running   0             7m6s   172.17.0.3      test2   <none>           <none>
...
suraj@suraj:~/sandbox$ k get po -owide coredns-565d847f94-w7lcj -oyaml | grep labels -A 5
  labels:
    k8s-app: kube-dns
    pod-template-hash: 565d847f94

192.168.121.1 is not minikube ip either

suraj@suraj:~/sandbox$ minikube ip  -ptest2
192.168.121.2

Found 192.168.121.1 host.minikube.internal in the coredns config

suraj@suraj:~/sandbox$ k get cm  coredns -oyaml
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        hosts {
           192.168.121.1 host.minikube.internal
           fallthrough
        }

Internet inside `minikube ssh` node works as expected

vadasambar commented 1 year ago

To make it easier to access your host, minikube v1.10 adds a hostname entry host.minikube.internal to /etc/hosts. The IP which host.minikube.internal resolves to is different across drivers, and may be different across clusters.

https://minikube.sigs.k8s.io/docs/handbook/host-access/

Seems like minikube has a way to access the host aka docker/vm inside which my cluster is running.

When I started the minikube cluster today, I saw the host IP changed in core-dns ConfigMap in kube-system namespace

...
        hosts {
           192.168.112.1 host.minikube.internal
           fallthrough
        }
...

This corresponds to the change in /etc/hosts on minikube ssh node

docker@klc4:~$ ip route
default via 192.168.112.1 dev eth0 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 
192.168.112.0/24 dev eth0 proto kernel scope link src 192.168.112.2 
docker@klc4:~$ cat /etc/hosts
127.0.0.1   localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
192.168.112.2   klc4
192.168.112.1   host.minikube.internal <- this
192.168.112.2   control-plane.minikube.internal

Ref: https://hugomartins.io/essays/2019/12/access-host-resources-minikube/

It seems like core-dns tries to reach the host IP when it wants to resolve a website outside the cluster.
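A small sketch for checking the observation above, that the `host.minikube.internal` entry and the node's default gateway are the same address. It only parses text shaped like the `ip route` and `/etc/hosts` output shown in this comment; the function names are hypothetical.

```python
from typing import Dict, Optional


def default_gateway(ip_route_output: str) -> Optional[str]:
    """Return the gateway from the first 'default via <ip> ...' line, if any."""
    for line in ip_route_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "default" and fields[1] == "via":
            return fields[2]
    return None


def parse_hosts(hosts_text: str) -> Dict[str, str]:
    """Map each hostname in /etc/hosts-style text to its first listed IP."""
    mapping: Dict[str, str] = {}
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0]  # drop trailing comments
        fields = line.split()
        if len(fields) < 2:
            continue
        ip, names = fields[0], fields[1:]
        for name in names:
            mapping.setdefault(name, ip)  # keep the first IP seen for a name
    return mapping
```

For the klc4 node above, both `default_gateway(...)` and `parse_hosts(...)["host.minikube.internal"]` should come out as 192.168.112.1.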

vadasambar commented 1 year ago

I can access the minikube host from dnsutils pod

root@dnsutils:/# ping -c 3 host.minikube.internal
PING host.minikube.internal (192.168.112.1) 56(84) bytes of data.
64 bytes from host.minikube.internal (192.168.112.1): icmp_seq=1 ttl=63 time=0.031 ms
64 bytes from host.minikube.internal (192.168.112.1): icmp_seq=2 ttl=63 time=0.028 ms
64 bytes from host.minikube.internal (192.168.112.1): icmp_seq=3 ttl=63 time=0.033 ms

--- host.minikube.internal ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2042ms
rtt min/avg/max/mdev = 0.028/0.030/0.033/0.006 ms

vadasambar commented 1 year ago

Looks like it might be worth deleting minikube and re-installing it all over again.

vadasambar commented 1 year ago

Deleted all the stopped and running clusters and the ~/.minikube folder

Started a new cluster with the default docker driver

suraj@suraj:~$ minikube start
😄  minikube v1.28.0 on Ubuntu 20.04
✨  Automatically selected the docker driver. Other choices: virtualbox, ssh
📌  Using Docker driver with root privileges
👍  Starting control plane node minikube in cluster minikube
🚜  Pulling base image ...
💾  Downloading Kubernetes v1.25.3 preload ...
    > preloaded-images-k8s-v18-v1...:  385.44 MiB / 385.44 MiB  100.00% 5.72 Mi
🔥  Creating docker container (CPUs=2, Memory=8000MB) ...
❗  This container is having trouble accessing https://registry.k8s.io
💡  To pull new external images, you may need to configure a proxy: https://minikube.sigs.k8s.io/docs/reference/networking/proxy/
🐳  Preparing Kubernetes v1.25.3 on Docker 20.10.20 ...
    ▪ Generating certificates and keys ...
    ▪ Booting up control plane ...
    ▪ Configuring RBAC rules ...
🔎  Verifying Kubernetes components...
    ▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
🌟  Enabled addons: storage-provisioner, default-storageclass

❗  /home/suraj/bin/kubectl is version 1.23.1, which may have incompatibilities with Kubernetes 1.25.3.
    ▪ Want kubectl v1.25.3? Try 'minikube kubectl -- get pods -A'
🏄  Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default

Still see the same issue

suraj@suraj:~$ k get po -nkube-system 
NAME                               READY   STATUS    RESTARTS   AGE
coredns-565d847f94-jxfnl           1/1     Running   0          30s
etcd-minikube                      1/1     Running   0          43s
kube-apiserver-minikube            1/1     Running   0          43s
kube-controller-manager-minikube   1/1     Running   0          43s
kube-proxy-mqjpf                   1/1     Running   0          31s
kube-scheduler-minikube            1/1     Running   0          43s
storage-provisioner                1/1     Running   0          40s

suraj@suraj:~$ k logs -f coredns-565d847f94-jxfnl -nkube-system
.:53
[INFO] plugin/reload: Running configuration SHA512 = 74073c0c68a507b50ca81d319bd4852e1242323807443dc549ab9f2fb21c8587977d5d9a7ecbfada54b5ff45c9b40d98fc730bfb6641b1b669d8fa8e6e9cea7f
CoreDNS-1.9.3
linux/amd64, go1.18.2, 45b0a11
[ERROR] plugin/errors: 2 925101847113008680.6913805637692685231. HINFO: read udp 172.17.0.2:43435->192.168.58.1:53: i/o timeout
[ERROR] plugin/errors: 2 925101847113008680.6913805637692685231. HINFO: read udp 172.17.0.2:51485->192.168.58.1:53: i/o timeout
[ERROR] plugin/errors: 2 925101847113008680.6913805637692685231. HINFO: read udp 172.17.0.2:45169->192.168.58.1:53: i/o timeout
[ERROR] plugin/errors: 2 925101847113008680.6913805637692685231. HINFO: read udp 172.17.0.2:46160->192.168.58.1:53: i/o timeout
[ERROR] plugin/errors: 2 925101847113008680.6913805637692685231. HINFO: read udp 172.17.0.2:57224->192.168.58.1:53: i/o timeout
[ERROR] plugin/errors: 2 925101847113008680.6913805637692685231. HINFO: read udp 172.17.0.2:55746->192.168.58.1:53: i/o timeout
[ERROR] plugin/errors: 2 925101847113008680.6913805637692685231. HINFO: read udp 172.17.0.2:54776->192.168.58.1:53: i/o timeout
[ERROR] plugin/errors: 2 925101847113008680.6913805637692685231. HINFO: read udp 172.17.0.2:55256->192.168.58.1:53: i/o timeout
[ERROR] plugin/errors: 2 925101847113008680.6913805637692685231. HINFO: read udp 172.17.0.2:51308->192.168.58.1:53: i/o timeout
[ERROR] plugin/errors: 2 925101847113008680.6913805637692685231. HINFO: read udp 172.17.0.2:60343->192.168.58.1:53: i/o timeout

vadasambar commented 1 year ago

Tried creating a k3d cluster to see if I see the same problem there too

suraj@suraj:~$ k3d cluster create test -p "8082:80@loadbalancer" --k3s-server-arg "--kube-proxy-arg=conntrack-max-per-core=0"  --k3s-agent-arg "--kube-proxy-arg=conntrack-max-per-core=0" --agents 3 --registry-create
INFO[0000] Prep: Network                                
INFO[0000] Created network 'k3d-test' (267ab77fb4b1018e09ead1df6a70bc703916341e069206fa9304a19b5d737c52) 
INFO[0000] Created volume 'k3d-test-images'             
INFO[0000] Creating node 'k3d-test-registry'            
INFO[0001] Successfully created registry 'k3d-test-registry' 
INFO[0002] Creating node 'k3d-test-server-0'            
INFO[0002] Creating node 'k3d-test-agent-0'             
INFO[0003] Creating node 'k3d-test-agent-1'             
INFO[0004] Creating node 'k3d-test-agent-2'             
INFO[0005] Creating LoadBalancer 'k3d-test-serverlb'    
INFO[0006] Starting cluster 'test'                      
INFO[0006] Starting servers...                          
INFO[0007] Starting Node 'k3d-test-server-0'            
INFO[0013] Starting agents...                           
INFO[0014] Starting Node 'k3d-test-agent-0'             
INFO[0024] Starting Node 'k3d-test-agent-1'             
INFO[0033] Starting Node 'k3d-test-agent-2'             
INFO[0041] Starting helpers...                          
INFO[0041] Starting Node 'k3d-test-registry'            
INFO[0042] Starting Node 'k3d-test-serverlb'            
INFO[0044] (Optional) Trying to get IP of the docker host and inject it into the cluster as 'host.k3d.internal' for easy access 
INFO[0054] Successfully added host record to /etc/hosts in 6/6 nodes and to the CoreDNS ConfigMap 
INFO[0055] Cluster 'test' created successfully!         
INFO[0055] --kubeconfig-update-default=false --> sets --kubeconfig-switch-context=false 
INFO[0056] You can now use it like this:                
kubectl config use-context k3d-test
kubectl cluster-info

core-dns doesn't seem to throw the same error for k3d

suraj@suraj:~$ kubens kube-system 
Context "k3d-test" modified.
Active namespace is "kube-system".
suraj@suraj:~$ k get po 
NAME                                      READY   STATUS              RESTARTS   AGE
local-path-provisioner-5ff76fc89d-pqhqb   1/1     Running             0          110s
coredns-854c77959c-6rn8b                  1/1     Running             0          110s
metrics-server-86cbb8457f-pvdnz           1/1     Running             0          110s
traefik-6f9cbd9bd4-q5n7k                  0/1     ContainerCreating   0          7s
svclb-traefik-ddqv2                       0/2     ContainerCreating   0          7s
svclb-traefik-gqt8g                       0/2     ContainerCreating   0          7s
svclb-traefik-s25ck                       0/2     ContainerCreating   0          7s
svclb-traefik-xscl7                       0/2     ContainerCreating   0          7s
helm-install-traefik-6tdwm                0/1     Completed           0          110s
suraj@suraj:~$ k logs -f coredns-854c77959c-6rn8b
.:53
[INFO] plugin/reload: Running configuration MD5 = 442b35f70385f5c97f2491a0ce8a27f6
CoreDNS-1.8.0
linux/amd64, go1.15.3, 054c9ae

vadasambar commented 1 year ago

Tried running lifecycle-toolkit on my local k3d cluster

NAME                                        READY   STATUS             RESTARTS   AGE
podtato-head-entry-64f9c674d9-ljwqr         0/1     Pending            0          3m54s
podtato-head-hat-6f5f6dd8f8-9x785           0/1     Pending            0          3m54s
podtato-head-left-leg-6d4879f7d8-jt6pn      0/1     Pending            0          3m54s
podtato-head-left-arm-74f6856758-lpls6      0/1     Pending            0          3m54s
podtato-head-right-leg-785c9b5965-jgjkz     0/1     Pending            0          3m54s
podtato-head-right-arm-567667c89b-zh44s     0/1     Pending            0          3m53s
klc-pre-check-entry-service-5-85667-hbv6d   0/1     CrashLoopBackOff   4          3m49s
klc-pre-check-entry-service-6-10960-7rh5x   0/1     CrashLoopBackOff   4          3m50s
klc-pre-check-entry-service-3-82816-wwdzt   0/1     CrashLoopBackOff   4          3m49s
klc-pre-check-entry-service-6-48522-wpbb8   0/1     CrashLoopBackOff   4          3m49s
klc-pre-check-entry-service-3-41677-59g6h   0/1     CrashLoopBackOff   4          3m49s

$ k describe po klc-pre-check-entry-service-5-85667-hbv6d
...
  Normal   Pulled     75s (x4 over 2m50s)  kubelet            Container image "ghcr.io/keptn/functions-runtime:v0.4.0" already present on machine
  Normal   Created    72s (x5 over 2m52s)  kubelet            Created container keptn-function-runner
  Normal   Started    72s (x5 over 2m52s)  kubelet            Started container keptn-function-runner
  Warning  BackOff    56s (x9 over 2m45s)  kubelet            Back-off restarting failed container

suraj@suraj:~/sandbox$ k logs -f klc-pre-check-entry-service-5-85667-hbv6d
Download https://raw.githubusercontent.com/keptn/lifecycle-toolkit/main/functions-runtime/samples/ts/http.ts
Could not fetch url

I am able to install curl in dnsutils pod

$ k exec -it dnsutils -- bash
root@dnsutils:/# curl google.com
bash: curl: command not found
root@dnsutils:/# apt install curl
...
Setting up curl (7.38.0-4+deb8u16) ...
...
root@dnsutils:/# curl google.com
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>

Networking seems to be working for the dnsutils pod (which sits in the same namespace as the CrashLoopBackOff pods above).

Started a thread in #keptn-app-lifecycle-wg channel on CNCF slack: https://cloud-native.slack.com/archives/C0470F49FB2/p1668403607546299

vadasambar commented 1 year ago

suraj@suraj:~/sandbox$ k get cm coredns -nkube-system -oyaml
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        debug
        health {
           lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        hosts {
           192.168.58.1 host.minikube.internal
           fallthrough
        }
        forward . /etc/resolv.conf {
           max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }

The hosts plugin is useful for serving zones from a /etc/hosts file. It serves from a preloaded file that exists on disk. It checks the file for changes and updates the zones accordingly. This plugin only supports A, AAAA, and PTR records. The hosts plugin can be used with readily available hosts files that block access to advertising servers.

The plugin reloads the content of the hosts file every 5 seconds. Upon reload, CoreDNS will use the new definitions. Should the file be deleted, any inlined content will continue to be served. When the file is restored, it will then again be used.

If you want to pass the request to the rest of the plugin chain if there is no match in the hosts plugin, you must specify the fallthrough option.

https://coredns.io/plugins/hosts/

Since the host is inlined above,

INLINE the hosts file contents inlined in Corefile. If there are any lines before fallthrough then all of them will be treated as the additional content for hosts file. The specified hosts file path will still be read but entries will be overridden.

https://coredns.io/plugins/hosts/#syntax
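To check my reading of the quoted docs, here is a toy model of the `hosts` block with `fallthrough`. It is not CoreDNS code: the names are hypothetical, and returning `None` for "no answer" is a simplification of what the real plugin does.

```python
from typing import Callable, Dict, List, Optional

Resolver = Callable[[str], Optional[str]]


def make_hosts_plugin(
    entries: Dict[str, str], fallthrough: bool, next_plugin: Resolver
) -> Resolver:
    """Answer from the inlined entries; otherwise pass on only if fallthrough is set."""
    table = {name.rstrip(".").lower(): ip for name, ip in entries.items()}

    def handle(qname: str) -> Optional[str]:
        key = qname.rstrip(".").lower()
        if key in table:
            return table[key]          # exact match: answered locally
        if fallthrough:
            return next_plugin(qname)  # no match: rest of the chain (forward)
        return None                    # no match and no fallthrough: no answer

    return handle


forwarded: List[str] = []


def fake_forward(qname: str) -> Optional[str]:
    """Stand-in for 'forward . /etc/resolv.conf' with an upstream that never replies."""
    forwarded.append(qname)
    return None


hosts = make_hosts_plugin(
    {"host.minikube.internal": "192.168.58.1"}, True, fake_forward
)
```

In this model `host.minikube.internal.` is answered from the inlined entry, while a name with a search suffix appended, such as `host.minikube.internal.domain.name.`, does not match and goes on to the forwarder.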

vadasambar commented 1 year ago

suraj@suraj:~/sandbox$ k exec -it dnsutils -- nslookup host.minikube.internal
Server:     10.96.0.10
Address:    10.96.0.10#53

** server can't find host.minikube.internal.domain.name: SERVFAIL

vadasambar commented 1 year ago

suraj@suraj:~/sandbox$ k exec -it dnsutils -- ping -c 3 host.minikube.internal 
PING host.minikube.internal (192.168.58.1) 56(84) bytes of data.
64 bytes from host.minikube.internal (192.168.58.1): icmp_seq=1 ttl=63 time=0.164 ms
64 bytes from host.minikube.internal (192.168.58.1): icmp_seq=2 ttl=63 time=0.030 ms
64 bytes from host.minikube.internal (192.168.58.1): icmp_seq=3 ttl=63 time=0.029 ms

--- host.minikube.internal ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2025ms
rtt min/avg/max/mdev = 0.029/0.074/0.164/0.063 ms

ping works but coredns prints errors

[ERROR] plugin/errors: 2 host.minikube.internal.domain.name. A: read udp 172.17.0.2:44665->192.168.58.1:53: i/o timeout
[ERROR] plugin/errors: 2 host.minikube.internal.domain.name. A: read udp 172.17.0.2:51268->192.168.58.1:53: i/o timeout
[ERROR] plugin/errors: 2 host.minikube.internal.domain.name. A: read udp 172.17.0.2:59899->192.168.58.1:53: i/o timeout
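Ping only shows that ICMP gets through, while the errors are about UDP port 53. A minimal sketch of a direct probe, using only the Python standard library: it builds a plain A-record query by hand and waits for any reply. The function names are hypothetical, and it assumes Python is available wherever the probe runs.

```python
import socket
import struct


def build_query(name: str, qid: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (recursion desired)."""
    # Header: id, flags (RD bit set), 1 question, no other records.
    header = struct.pack(">HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # Question name: length-prefixed labels ending with a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack(">HH", 1, 1)


def udp_dns_answers(server: str, name: str, timeout: float = 2.0, port: int = 53) -> bool:
    """Return True if the server sends back a reply with a matching id over UDP."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, port))
            reply, _ = sock.recvfrom(512)
        except OSError:  # includes socket.timeout when nothing comes back
            return False
    return reply[:2] == query[:2]
```

Calling `udp_dns_answers("192.168.58.1", "google.com")` from inside a pod and from the minikube node would show whether UDP/53 replies are the part that goes missing.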

vadasambar commented 1 year ago

Created an issue in minikube repo: https://github.com/kubernetes/minikube/issues/15354

vadasambar commented 1 year ago

Downgrading my minikube to 1.24 to see if it has any effect. Same error :cry: for 1.24

suraj@suraj:~$ kubens kube-system
Context "124" modified.
Active namespace is "kube-system".
suraj@suraj:~$ k get po 
NAME                          READY   STATUS    RESTARTS        AGE
coredns-78fcd69978-6zt8j      1/1     Running   0               3m36s
etcd-124                      1/1     Running   0               3m49s
kube-apiserver-124            1/1     Running   0               3m49s
kube-controller-manager-124   1/1     Running   0               3m49s
kube-proxy-s7244              1/1     Running   0               3m37s
kube-scheduler-124            1/1     Running   0               3m49s
storage-provisioner           1/1     Running   1 (2m59s ago)   3m45s
suraj@suraj:~$ k logs -f coredns-78fcd69978-6zt8j
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[WARNING] plugin/kubernetes: starting server with unsynced Kubernetes API
.:53
[INFO] plugin/reload: Running configuration MD5 = e58989e162233e5f3ebed8edb075da2b
CoreDNS-1.8.4
linux/amd64, go1.16.4, 053c4d5
[ERROR] plugin/errors: 2 5445501714987993477.7586765173088131607. HINFO: read udp 172.17.0.2:33357->192.168.76.1:53: i/o timeout
[INFO] plugin/ready: Still waiting on: "kubernetes"
[ERROR] plugin/errors: 2 5445501714987993477.7586765173088131607. HINFO: read udp 172.17.0.2:46775->192.168.76.1:53: i/o timeout
[ERROR] plugin/errors: 2 5445501714987993477.7586765173088131607. HINFO: read udp 172.17.0.2:37238->192.168.76.1:53: i/o timeout
[ERROR] plugin/errors: 2 5445501714987993477.7586765173088131607. HINFO: read udp 172.17.0.2:43634->192.168.76.1:53: i/o timeout
[ERROR] plugin/errors: 2 5445501714987993477.7586765173088131607. HINFO: read udp 172.17.0.2:41055->192.168.76.1:53: i/o timeout
[ERROR] plugin/errors: 2 5445501714987993477.7586765173088131607. HINFO: read udp 172.17.0.2:37521->192.168.76.1:53: i/o timeout
[INFO] plugin/ready: Still waiting on: "kubernetes"
[ERROR] plugin/errors: 2 5445501714987993477.7586765173088131607. HINFO: read udp 172.17.0.2:55072->192.168.76.1:53: i/o timeout
[ERROR] plugin/errors: 2 5445501714987993477.7586765173088131607. HINFO: read udp 172.17.0.2:47769->192.168.76.1:53: i/o timeout
[ERROR] plugin/errors: 2 5445501714987993477.7586765173088131607. HINFO: read udp 172.17.0.2:48127->192.168.76.1:53: i/o timeout
[ERROR] plugin/errors: 2 5445501714987993477.7586765173088131607. HINFO: read udp 172.17.0.2:37020->192.168.76.1:53: i/o timeout

Reverted back to minikube 1.28

DNS doesn't resolve for external domains

suraj@suraj:~/sandbox$ k exec -it dnsutils -- nslookup google.com
Server:     10.96.0.10
Address:    10.96.0.10#53

** server can't find google.com.domain.name: SERVFAIL

command terminated with exit code 1

Resolves for internal domains

suraj@suraj:~/sandbox$ k exec -it dnsutils -- nslookup kube-dns
Server:     10.96.0.10
Address:    10.96.0.10#53

Name:   kube-dns.kube-system.svc.cluster.local
Address: 10.96.0.10

https://kubernetes.io/docs/tasks/debug/debug-application/debug-running-pod/

Container filesystems are visible to other containers in the pod through the /proc/$pid/root link. This makes debugging easier, but it also means that filesystem secrets are protected only by filesystem permissions.

https://kubernetes.io/docs/tasks/configure-pod-container/share-process-namespace/#understanding-process-namespace-sharing

Looks like coredns depends on the minikube nameserver to resolve the DNS for itself

suraj@suraj:~/sandbox$ kubectl debug -it coredns-565d847f94-mgmsd --image=busybox:1.28 --target=coredns
Targeting container "coredns". If you don't see processes from this container it may be because the container runtime doesn't support this feature.
Defaulting debug container name to debugger-6wsf9.
If you don't see a command prompt, try pressing enter.
/ # ps aux
PID   USER     TIME  COMMAND
    1 root      0:00 /coredns -conf /etc/coredns/Corefile
   23 root      0:00 sh
   31 root      0:00 ps aux

/ # cat /proc/1/root/etc/resolv.conf 
nameserver 192.168.85.1
search domain.name
options edns0 trust-ad ndots:0

minikube also uses the same nameserver to resolve DNS

suraj@suraj:~/sandbox$ minikube ssh -p klc4 -- cat /etc/resolv.conf
search domain.name
nameserver 192.168.85.1
options edns0 trust-ad ndots:0

and it works for external domains

suraj@suraj:~/sandbox$ minikube ssh -p klc4 -- nslookup google.com
Server:     192.168.85.1
Address:    192.168.85.1#53

Non-authoritative answer:
Name:   google.com
Address: 142.251.42.78
Name:   google.com
Address: 2404:6800:4009:831::200e

host.minikube.internal is accessible from the dnsutils pod but, for some reason, not from the coredns pod.

coredns:

suraj@suraj:~/sandbox$ kubectl debug -it coredns-565d847f94-mgmsd --image=busybox:1.28 --target=coredns
Targeting container "coredns". If you don't see processes from this container it may be because the container runtime doesn't support this feature.
Defaulting debug container name to debugger-v9nx8.
If you don't see a command prompt, try pressing enter.
/ # ping host.minikube.internal

ping: bad address 'host.minikube.internal'

/ # cat /proc/1/root/etc/resolv.conf 
nameserver 192.168.85.1
search domain.name
options edns0 trust-ad ndots:0
/ # cat /proc/1/root/etc/host.conf
cat: can't open '/proc/1/root/etc/host.conf': No such file or directory
/ #

dnsutils:

suraj@suraj:~/sandbox$ k exec -it dnsutils -- ping host.minikube.internal
PING host.minikube.internal (192.168.85.1) 56(84) bytes of data.
64 bytes from host.minikube.internal (192.168.85.1): icmp_seq=1 ttl=63 time=0.031 ms
64 bytes from host.minikube.internal (192.168.85.1): icmp_seq=2 ttl=63 time=0.028 ms
64 bytes from host.minikube.internal (192.168.85.1): icmp_seq=3 ttl=63 time=0.060 ms
64 bytes from host.minikube.internal (192.168.85.1): icmp_seq=4 ttl=63 time=0.046 ms
64 bytes from host.minikube.internal (192.168.85.1): icmp_seq=5 ttl=63 time=0.036 ms
^C
--- host.minikube.internal ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4086ms
rtt min/avg/max/mdev = 0.028/0.040/0.060/0.012 ms

suraj@suraj:~/sandbox$ k exec -it dnsutils -- cat /etc/host.conf
multi on
suraj@suraj:~/sandbox$ k exec -it dnsutils -- cat /etc/resolv.conf
nameserver 10.96.0.10
search kube-system.svc.cluster.local svc.cluster.local cluster.local domain.name
options ndots:5
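The `search` and `ndots` lines above may explain why the failing lookups show up as `google.com.domain.name`. A rough sketch of the usual resolver search-list rule (search domains first when the name has fewer dots than `ndots`, the bare name first otherwise). The function names are hypothetical, and nslookup may not follow this rule exactly.

```python
from typing import List, Tuple


def parse_resolv_conf(text: str) -> Tuple[List[str], List[str], int]:
    """Return (nameservers, search domains, ndots) from resolv.conf-style text."""
    nameservers: List[str] = []
    search: List[str] = []
    ndots = 1  # resolver default when no ndots option is given
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "nameserver" and len(fields) > 1:
            nameservers.append(fields[1])
        elif fields[0] == "search":
            search = fields[1:]
        elif fields[0] == "options":
            for opt in fields[1:]:
                if opt.startswith("ndots:"):
                    ndots = int(opt.split(":", 1)[1])
    return nameservers, search, ndots


def candidate_names(name: str, search: List[str], ndots: int) -> List[str]:
    """List the fully qualified names a resolver would try, in order."""
    if name.endswith("."):
        return [name]  # already fully qualified, no search list applied
    as_is = name + "."
    expanded = [f"{name}.{domain}." for domain in search]
    if name.count(".") >= ndots:
        return [as_is] + expanded
    return expanded + [as_is]
```

With the dnsutils resolv.conf above, `google.com` has one dot (fewer than 5), so the three cluster.local names and `google.com.domain.name.` are tried before `google.com.` itself. With the coredns pod's `ndots:0`, the bare name goes first.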

https://github.com/kubernetes/minikube/issues/8439#issuecomment-662244769

Tried manually adding the host entry through hostAliases in the coredns Deployment

suraj@suraj:~/sandbox$ k get deploy coredns -oyaml | grep hostAlias -A 5
      hostAliases:
      - hostnames:
        - host.minikube.internal
        ip: 192.168.85.1

Still see the same error in coredns :cry:

suraj@suraj:~/sandbox$ k logs -f coredns-65657fcbfd-9f6cv
.:53
[INFO] plugin/reload: Running configuration SHA512 = f3fde9de6486f59fe260f641c8b45d450960379ea9d73a7fef0c1feac6c746730bd77c72d2092518703e00d94c78d1eec0c6cb3efcd4dc489238241cea4bf436
CoreDNS-1.9.3
linux/amd64, go1.18.2, 45b0a11
[ERROR] plugin/errors: 2 3239380858545621521.561753342624622945. HINFO: read udp 172.17.0.4:51683->192.168.85.1:53: i/o timeout
[ERROR] plugin/errors: 2 3239380858545621521.561753342624622945. HINFO: read udp 172.17.0.4:40319->192.168.85.1:53: i/o timeout
[ERROR] plugin/errors: 2 3239380858545621521.561753342624622945. HINFO: read udp 172.17.0.4:45436->192.168.85.1:53: i/o timeout
[ERROR] plugin/errors: 2 3239380858545621521.561753342624622945. HINFO: read udp 172.17.0.4:55613->192.168.85.1:53: i/o timeout
[ERROR] plugin/errors: 2 3239380858545621521.561753342624622945. HINFO: read udp 172.17.0.4:35793->192.168.85.1:53: i/o timeout
[ERROR] plugin/errors: 2 3239380858545621521.561753342624622945. HINFO: read udp 172.17.0.4:54662->192.168.85.1:53: i/o timeout
[ERROR] plugin/errors: 2 3239380858545621521.561753342624622945. HINFO: read udp 172.17.0.4:49458->192.168.85.1:53: i/o timeout
[ERROR] plugin/errors: 2 3239380858545621521.561753342624622945. HINFO: read udp 172.17.0.4:42864->192.168.85.1:53: i/o timeout
[ERROR] plugin/errors: 2 3239380858545621521.561753342624622945. HINFO: read udp 172.17.0.4:43270->192.168.85.1:53: i/o timeout
[ERROR] plugin/errors: 2 3239380858545621521.561753342624622945. HINFO: read udp 172.17.0.4:45972->192.168.85.1:53: i/o timeout

TODO:

[x] Check /etc/host.conf for coredns pod
[x] Go through rest of the comments in https://github.com/kubernetes/minikube/issues/8439 to see if I find something useful

ping to host.minikube.internal works fine after adding hostAliases for host.minikube.internal

suraj@suraj:~/sandbox$ kubectl debug -it coredns-65657fcbfd-5fp97 --image=busybox:1.28 --target=coredns
Targeting container "coredns". If you don't see processes from this container it may be because the container runtime doesn't support this feature.
Defaulting debug container name to debugger-npqg6.
If you don't see a command prompt, try pressing enter.
/ # ps aux
PID   USER     TIME  COMMAND
    1 root      0:00 /coredns -conf /etc/coredns/Corefile
   29 root      0:00 sh
   35 root      0:00 ps aux
/ # cat proc/1/root/etc/host.conf
cat: can't open 'proc/1/root/etc/host.conf': No such file or directory
/ # cat proc/1/root/etc/resolv.conf
nameserver 192.168.85.1
search domain.name
options edns0 trust-ad ndots:0
/ # ping host.minikube.internal
PING host.minikube.internal (192.168.85.1): 56 data bytes
64 bytes from 192.168.85.1: seq=0 ttl=63 time=0.182 ms
64 bytes from 192.168.85.1: seq=1 ttl=63 time=0.118 ms
64 bytes from 192.168.85.1: seq=2 ttl=63 time=0.127 ms
64 bytes from 192.168.85.1: seq=3 ttl=63 time=0.128 ms
64 bytes from 192.168.85.1: seq=4 ttl=63 time=0.040 ms
64 bytes from 192.168.85.1: seq=5 ttl=63 time=0.126 ms
64 bytes from 192.168.85.1: seq=6 ttl=63 time=0.126 ms
64 bytes from 192.168.85.1: seq=7 ttl=63 time=0.122 ms
^C
--- host.minikube.internal ping statistics ---
8 packets transmitted, 8 packets received, 0% packet loss
round-trip min/avg/max = 0.040/0.121/0.182 ms

nslookup works as well (takes a couple of seconds)

/ # nslookup google.com
Server:    192.168.85.1
Address 1: 192.168.85.1 host.minikube.internal

Name:      google.com
Address 1: 2404:6800:4009:82c::200e bom07s35-in-x0e.1e100.net
Address 2: 142.251.42.78 bom12s21-in-f14.1e100.net

Tried apt update in dnsutils pod

suraj@suraj:~/sandbox$ k exec -it dnsutils -- apt update
Err http://deb.debian.org jessie InRelease                           

Err http://security.debian.org jessie/updates InRelease              

Err http://deb.debian.org jessie-updates InRelease                   

Err http://security.debian.org jessie/updates Release.gpg            
  Temporary failure resolving 'security.debian.org'
Err http://deb.debian.org jessie Release.gpg
  Temporary failure resolving 'deb.debian.org'
Err http://deb.debian.org jessie-updates Release.gpg
  Temporary failure resolving 'deb.debian.org'
Reading package lists... Done    
Building dependency tree       
Reading state information... Done
All packages are up to date.
W: Failed to fetch http://deb.debian.org/debian/dists/jessie/InRelease  

W: Failed to fetch http://security.debian.org/debian-security/dists/jessie/updates/InRelease  

W: Failed to fetch http://deb.debian.org/debian/dists/jessie-updates/InRelease

coredns still spits out the same error:

[ERROR] plugin/errors: 2 deb.debian.org. A: read udp 172.17.0.4:53591->192.168.85.1:53: i/o timeout
[ERROR] plugin/errors: 2 deb.debian.org. A: read udp 172.17.0.4:52014->192.168.85.1:53: i/o timeout
[ERROR] plugin/errors: 2 deb.debian.org.domain.name. A: read udp 172.17.0.4:53670->192.168.85.1:53: i/o timeout
[ERROR] plugin/errors: 2 deb.debian.org.domain.name. A: read udp 172.17.0.4:56138->192.168.85.1:53: i/o timeout
[ERROR] plugin/errors: 2 deb.debian.org. A: read udp 172.17.0.4:55707->192.168.85.1:53: i/o timeout
[ERROR] plugin/errors: 2 deb.debian.org. A: read udp 172.17.0.4:36412->192.168.85.1:53: i/o timeout

Updated the forward upstream in the coredns Corefile to point to 8.8.8.8 (Google's public nameserver) instead of /etc/resolv.conf (which points to the minikube nameserver)

...
        forward . /etc/resolv.conf {
           max_concurrent 1000
        }
...

to

...
        forward . 8.8.8.8 {
           max_concurrent 1000
        }
...
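The change above was made by hand. A small sketch of the same text edit, assuming the Corefile has already been pulled out of the ConfigMap as a string; the function name is hypothetical and it only swaps the first upstream token on each `forward .` line.

```python
import re

# Captures the 'forward . ' prefix and whatever follows the first upstream token.
_FORWARD = re.compile(r"^([ \t]*forward[ \t]+\.[ \t]+)\S+(.*)$", re.MULTILINE)


def set_forward_upstream(corefile: str, upstream: str) -> str:
    """Replace the first upstream of each 'forward .' directive with `upstream`."""
    new_text, count = _FORWARD.subn(
        lambda m: f"{m.group(1)}{upstream}{m.group(2)}", corefile
    )
    if count == 0:
        raise ValueError("no 'forward .' directive found in Corefile")
    return new_text
```

The result would still have to be written back to the coredns ConfigMap for it to take effect.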

coredns stopped throwing i/o timeout errors:

suraj@suraj:~/sandbox$ k logs -f coredns-65657fcbfd-9sxns
.:53
[INFO] plugin/reload: Running configuration SHA512 = 1f08872738c1bb792798228c3132d00ce1519fc2f64598f7e92fd6cd85cb4d68ec93c82f0ab05eba4b0e9b38e2ccafd658f167746fbc3bcf6baf3bee4c00c46f
CoreDNS-1.9.3
linux/amd64, go1.18.2, 45b0a11

But when I try apt update from dnsutils pod:

suraj@suraj:~/sandbox$ k exec -it dnsutils -- apt update
Get:1 http://security.debian.org jessie/updates InRelease                     
100% [1 InRelease gpgv 1524 B]Splitting up /var/lib/apt/lists/partial/security.debian.org_debian-security_dists_jessie_updates_InRelease into data and signatuErr http://security.debian.org jessie/updates InRelease

Fetched 1524 B in 2s (755 B/s)
Reading package lists... Done
Building dependency tree       
Reading state information... Done
All packages are up to date.
W: An error occurred during the signature verification. The repository is not updated and the previous index files will be used. GPG error: http://security.debian.org jessie/updates InRelease: Clearsigned file isn't valid, got 'NODATA' (does the network require authentication?)

W: Failed to fetch http://security.debian.org/debian-security/dists/jessie/updates/InRelease  

W: Some index files failed to download. They have been ignored, or old ones used instead.

vadasambar commented 1 year ago

It's not limited to minikube. kind cluster's coredns has the same timeout issue

suraj@suraj:~/sandbox$ k get po 
NAME                                         READY   STATUS    RESTARTS   AGE
coredns-558bd4d5db-5zkss                     1/1     Running   0          50s
coredns-558bd4d5db-g5pf9                     1/1     Running   0          50s
etcd-kind-control-plane                      1/1     Running   0          53s
kindnet-zbgjb                                1/1     Running   0          51s
kube-apiserver-kind-control-plane            1/1     Running   0          53s
kube-controller-manager-kind-control-plane   1/1     Running   0          53s
kube-proxy-ccvj6                             1/1     Running   0          51s
kube-scheduler-kind-control-plane            1/1     Running   0          53s

suraj@suraj:~/sandbox$ k logs -f coredns-558bd4d5db-5zkss
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.8.0
linux/amd64, go1.15.3, 054c9ae
[ERROR] plugin/errors: 2 529578249049278684.133802908642326027. HINFO: read udp 10.244.0.2:47117->172.21.0.1:53: i/o timeout
[ERROR] plugin/errors: 2 529578249049278684.133802908642326027. HINFO: read udp 10.244.0.2:43731->172.21.0.1:53: i/o timeout
[ERROR] plugin/errors: 2 529578249049278684.133802908642326027. HINFO: read udp 10.244.0.2:43152->172.21.0.1:53: i/o timeout
[ERROR] plugin/errors: 2 529578249049278684.133802908642326027. HINFO: read udp 10.244.0.2:36089->172.21.0.1:53: i/o timeout

vadasambar commented 1 year ago

Re-installing docker didn't help either. Trying to re-install my Xubuntu.

vadasambar commented 1 year ago

Re-installed Xubuntu

vadasambar commented 1 year ago

95.216.36.80 points to a non-google website

vadasambar commented 1 year ago

When I run dnsutils docker image from my host machine,

I get 142.250.183.14 which also seems to point to a non google website

Looks like the reverse IP search is not reliable.

In-pod nslookup shows google.com.domain.name in the output, while running the dnsutils docker image on the host machine shows google.com

Makes me think something is wrong with in-pod DNS lookup

vadasambar commented 1 year ago

If I set the nameserver in the dnsutils pod's /etc/resolv.conf to Google's DNS server, it returns results similar to the host.

apt install works

Looks like there's some problem with the DNS lookup after all.

vadasambar commented 1 year ago

Read https://coredns.io/2017/06/08/how-queries-are-processed-in-coredns/ again

vadasambar commented 1 year ago

DNS resolution from inside minikube ssh works fine but doesn't from the pod. Adding

forward . 8.8.8.8 8.8.4.4 {
  max_concurrent 3000
}

in CoreDNS config doesn't help.

vadasambar commented 1 year ago

Started a kind cluster to see if I can reproduce the issue there.

I can. It points to 78.47.226.171

root@dnsutils:/# cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local domain.name
nameserver 10.96.0.10
options ndots:5
root@dnsutils:/#

I can't do apt update or apt install from the dnsutils pod but I can ping -c 3 google.com

I can't curl google.com though

suraj@suraj:~$ k run -it curl --image=curlimages/curl -- sh 
If you don't see a command prompt, try pressing enter.
/ $ curl google.com
curl: (6) Could not resolve host: google.com
/ $ curl www.google.com -v
* Could not resolve host: www.google.com
* Closing connection 0
curl: (6) Could not resolve host: www.google.com
/ $

DNS lookup seems to work in the curl pod

/ $ nslookup google.com
Server:     10.96.0.10
Address:    10.96.0.10:53

Non-authoritative answer:
Name:   google.com
Address: 172.217.167.174

Non-authoritative answer:
Name:   google.com
Address: 2404:6800:4009:820::200e

/ $

vadasambar commented 1 year ago

`nslookup` works fine in minikube's curl pod too

suraj@suraj:~$ k exec -it curl -- sh  
/ $ nslookup google.com
Server:     10.96.0.10
Address:    10.96.0.10:53

Non-authoritative answer:
Name:   google.com
Address: 142.250.183.14

Non-authoritative answer:
Name:   google.com
Address: 2404:6800:4009:820::200e

/ $

It just errors when I do curl google.com

/ $ curl google.com
curl: (6) Could not resolve host: google.com
/ $

vadasambar commented 1 year ago

Looks like it's not just curl google.com which fails. Resolving K8s Services inside Kubernetes fails as well. Interesting

/ $ curl google.com
curl: (6) Could not resolve host: google.com
/ $ curl kube-dns
curl: (28) Failed to connect to kube-dns port 80 after 129819 ms: Couldn't connect to server
/ $ curl kubernetes.default
curl: (28) Failed to connect to kubernetes.default port 80 after 129202 ms: Couldn't connect to server
/ $

kube-dns Service's IP hasn't changed. curl pod's /etc/resolv.conf points to the right kube-dns Service IP

Looks like nslookup doesn't resolve short names. I have to specify the full FQDN

/ $ nslookup kubernetes.default
Server:     10.96.0.10
Address:    10.96.0.10:53

** server can't find kubernetes.default: NXDOMAIN

** server can't find kubernetes.default: NXDOMAIN

/ $ nslookup kube-dns
Server:     10.96.0.10
Address:    10.96.0.10:53

Name:   kube-dns.kube-system.svc.cluster.local
Address: 10.96.0.10

** server can't find kube-dns.svc.cluster.local: NXDOMAIN

** server can't find kube-dns.svc.cluster.local: NXDOMAIN

** server can't find kube-dns.cluster.local: NXDOMAIN

** server can't find kube-dns.cluster.local: NXDOMAIN

** server can't find kube-dns.domain.name: NXDOMAIN

** server can't find kube-dns.domain.name: NXDOMAIN

/ $ nslookup kube-dns.kube-system.svc.cluster.local
Server:     10.96.0.10
Address:    10.96.0.10:53

Name:   kube-dns.kube-system.svc.cluster.local
Address: 10.96.0.10
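
The lookups above follow from the `search` list and `ndots:5` in the pod's /etc/resolv.conf. Here is a minimal sketch of the order in which a glibc-style stub resolver would try names. The `candidates` function is a hypothetical helper written for illustration, and other resolvers (musl, busybox nslookup) may behave differently:

```shell
#!/bin/sh
# Sketch: list the names a glibc-style stub resolver would try, in order.
# candidates NAME NDOTS SEARCH_DOMAIN...   (hypothetical helper, not a real tool)
candidates() {
    name=$1; ndots=$2; shift 2
    # A trailing dot marks the name as fully qualified: the search list is skipped.
    case $name in
        *.) printf '%s\n' "$name"; return ;;
    esac
    # Count the dots in the name by deleting every other character.
    only_dots=$(printf '%s' "$name" | tr -cd '.')
    dots=${#only_dots}
    # Enough dots: the name as given is tried first.
    if [ "$dots" -ge "$ndots" ]; then printf '%s.\n' "$name"; fi
    # Each search domain is appended in turn.
    for domain in "$@"; do printf '%s.%s.\n' "$name" "$domain"; done
    # Too few dots: the name as given is tried last.
    if [ "$dots" -lt "$ndots" ]; then printf '%s.\n' "$name"; fi
}

# Search list and ndots taken from the dnsutils pod's /etc/resolv.conf.
candidates google.com 5 default.svc.cluster.local svc.cluster.local cluster.local domain.name
```

With this search list, google.com.domain.name. is tried before plain google.com., which matches the `security.debian.org.domain.name.` queries seen in the coredns logs.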

vadasambar commented 1 year ago

I can't curl kube-dns service

/ $ curl kube-dns.kube-system.svc.cluster.local:53 --verbose
* Could not resolve host: kube-dns.kube-system.svc.cluster.local
* Closing connection 0
curl: (6) Could not resolve host: kube-dns.kube-system.svc.cluster.local
/ $ nslookup kube-dns.kube-system.svc.cluster.local
Server:     10.96.0.10
Address:    10.96.0.10:53

Name:   kube-dns.kube-system.svc.cluster.local
Address: 10.96.0.10

vadasambar commented 1 year ago

I had domain.name in the /etc/resolv.conf on my host machine. It seems the OS passes that on to the docker containers it creates, and minikube (i.e., the container it runs) in turn passes it on to the docker-in-docker containers it creates (i.e., pods)

Removing domain.name from the /etc/resolv.conf, deleting minikube cluster and re-creating it fixed the issue.
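
A quick way to spot this kind of leftover entry is to list the search domains that are not cluster-internal. This is only a sketch: `extra_search_domains` is a hypothetical helper, and it assumes the cluster domain is the default `cluster.local`:

```shell
#!/bin/sh
# Sketch: print search domains from a resolv.conf-style file that do not
# end in "cluster.local" (assumed here to be the cluster domain).
# extra_search_domains FILE   (hypothetical helper)
extra_search_domains() {
    awk '$1 == "search" || $1 == "domain" {
             for (i = 2; i <= NF; i++)
                 if ($i !~ /(^|\.)cluster\.local\.?$/) print $i
         }' "$1"
}
```

Run inside a pod as `extra_search_domains /etc/resolv.conf`, any domain it prints was inherited from outside the cluster. For the dnsutils pod's file shown earlier it would print `domain.name`.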

eldarj commented 1 year ago

@vadasambar I'm having the same exact issue that you've been describing here. I'm running Ubuntu 22.04.1 LTS (5.19.0-35-generic)

I don't have anything in /etc/resolv.conf except for the following

me@me:~$ cat /etc/resolv.conf
nameserver 8.8.8.8 
nameserver 8.8.4.4

Within minikube I got

docker@minikube:~$ cat /etc/resolv.conf
nameserver 192.168.49.1
options ndots:0

DNS doesn't work: I have no internet connectivity from within pods, and internal DNS doesn't work either, so I can't reach any k8s internals from within pods.

Do you have any suggestions please?

Minikube version

me@me:~$ minikube version
minikube version: v1.29.0

vadasambar commented 1 year ago

@eldarj does this happen in a fresh new minikube cluster as well? I would check the /etc/resolv.conf of any random pod in the cluster and see if the nameserver is pointing to IP of the kube-dns Kubernetes Service. If it is not, the kube-dns Service might have been re-created. If it is pointing correctly, it means we have another problem.
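
The check described above could be scripted roughly like this. It is a sketch only: `has_nameserver` is a hypothetical helper, and the pod name `dnsutils` in the usage comment is a placeholder:

```shell
#!/bin/sh
# Sketch: succeed if any "nameserver" line read from stdin matches the expected IP.
# has_nameserver EXPECTED_IP < resolv.conf   (hypothetical helper)
has_nameserver() {
    awk -v ip="$1" '$1 == "nameserver" && $2 == ip { found = 1 }
                    END { exit !found }'
}

# Usage against a live cluster (pod name "dnsutils" is a placeholder):
#   svc_ip=$(kubectl get svc kube-dns -n kube-system -o jsonpath='{.spec.clusterIP}')
#   kubectl exec dnsutils -- cat /etc/resolv.conf | has_nameserver "$svc_ip" \
#       && echo "pod points at kube-dns" || echo "nameserver mismatch"
```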

eldarj commented 1 year ago

@vadasambar thanks a lot for the reply.

Regarding the /etc/resolv.conf within pods - I already had nameserver configured with the correct IP pointing to kube-dns

root@sample-db8b79bf4-t5jv9:/app# cat /etc/resolv.conf 
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

me@me:~$ k get svc -n kube-system
NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
kube-dns   ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   25h

But, I fixed this by enabling IPv4 forwarding on my host itself with:

sysctl -w net.ipv4.ip_forward=1

either that, or uncommenting net.ipv4.ip_forward in /etc/sysctl.conf (or setting it to 1)

Everything worked out after this.

However, I am not entirely sure why ipv4 forwarding on the host is affecting minikube's internal communication
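
Note that `sysctl -w` only changes the running kernel, so the setting is lost on reboot unless it is also in /etc/sysctl.conf. A small sketch for checking the persistent setting follows. `forwarding_persisted` is a hypothetical helper, and it only inspects the one file it is given, not drop-ins under /etc/sysctl.d:

```shell
#!/bin/sh
# Sketch: succeed if a sysctl.conf-style file enables IPv4 forwarding on an
# uncommented line; if the key appears more than once, the last line wins.
# forwarding_persisted FILE   (hypothetical helper)
forwarding_persisted() {
    awk -F '=' '/^[[:space:]]*[#;]/ { next }   # skip commented-out lines
                { key = $1; val = $2
                  gsub(/[[:space:]]/, "", key); gsub(/[[:space:]]/, "", val)
                  if (key == "net.ipv4.ip_forward") enabled = (val == "1") }
                END { exit !enabled }' "$1"
}

# Runtime value on a Linux host (1 means forwarding is on):
#   cat /proc/sys/net/ipv4/ip_forward
# Persistent setting:
#   forwarding_persisted /etc/sysctl.conf && echo "persisted" || echo "not persisted"
```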

vadasambar commented 1 year ago

@eldarj glad to know the problem is fixed.

"But, I fixed this by enabling IPv4 forwarding on my host itself with:"

Interesting. I think you might be using docker's networking (instead of a Kubernetes CNI plugin) which needs IP forward enabled. refs:

docker runtime does not need a cni and comes with cni by default

https://github.com/kubernetes/minikube/issues/8445#issuecomment-642322262

The easy fix would be for Docker daemon to run a sysctl -w net.ipv4.ip_forward=1 at startup

https://github.com/moby/moby/issues/490#issuecomment-17301124

A quick check shows I have it enabled too:

eldarj commented 1 year ago

@vadasambar yes indeed, I missed that small detail, and I am running minikube with the docker driver. Thanks!
