coredns failure in ipv6 kind cluster #1736

Closed sayboras closed 4 years ago

sayboras commented 4 years ago

What happened: Unable to perform DNS lookup in ipv6 cluster.

PS: This might be a limitation of the container runtime for IPv6, so it's more of a question than a bug, but I like kind's bug report template because it lets me provide all the related information.

What you expected to happen: DNS lookup should be working

How to reproduce it (as minimally and precisely as possible):

Kind configuration

```yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
networking:
  ipFamily: ipv6
  podSubnet: "fd00:10:244::/64"
  serviceSubnet: "fd00:10:96::/112"
```

DNS util pod

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dnsutils
  namespace: default
spec:
  containers:
  - name: dnsutils
    image: gcr.io/kubernetes-e2e-test-images/dnsutils:1.3
    command:
    - sleep
    - "3600"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
```

Anything else we need to know?:

Please find below coredns logs

$ ksyslo coredns-66bff467f8-nnch5 --timestamps      
2020-07-19T06:05:57.061277847Z .:53
2020-07-19T06:05:57.061385231Z [INFO] plugin/reload: Running configuration MD5 = 4e235fcc3696966e76816bcd9034ebc7
2020-07-19T06:05:57.061393166Z CoreDNS-1.6.7
2020-07-19T06:05:57.061398375Z linux/amd64, go1.13.6, da7f65b
2020-07-19T06:05:58.061904725Z [ERROR] plugin/errors: 2 2244698976610727589.491498813804393448. HINFO: dial udp connect: network is unreachable --> this log entry is appearing right after cluster creation
2020-07-19T06:07:04.701610849Z [ERROR] plugin/errors: 2 www.google.com.lan. A: dial udp connect: network is unreachable
2020-07-19T06:07:37.996168762Z [ERROR] plugin/errors: 2 www.google.com.lan. A: dial udp connect: network is unreachable
2020-07-19T06:14:51.500618905Z [ERROR] plugin/errors: 2 www.google.com.lan. A: dial udp connect: network is unreachable


Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.5", GitCommit:"e6503f8d8f769ace2f338794c914a96fc335df0f", GitTreeState:"clean", BuildDate:"2020-06-26T03:47:41Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.4", GitCommit:"c96aede7b5205121079932896c4ad89bb93260af", GitTreeState:"clean", BuildDate:"2020-06-20T01:49:49Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
$ docker info       
 Debug Mode: false

 Containers: 5
  Running: 2
  Paused: 0
  Stopped: 3
 Images: 38
 Server Version: 19.03.12
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version: fec3683
 Security Options:
   Profile: default
 Kernel Version: 5.4.0-40-generic
 Operating System: Linux Mint 20
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 62.82GiB
 Name: linuxmint
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Username: sayboras
 Registry: https://index.docker.io/v1/
 Experimental: false
 Insecure Registries:
 Live Restore Enabled: false

WARNING: No swap limit support

```
$ cat /etc/os-release
NAME="Linux Mint"
VERSION="20 (Ulyana)"
ID=linuxmint
ID_LIKE=ubuntu
PRETTY_NAME="Linux Mint 20"
VERSION_ID="20"
HOME_URL="https://www.linuxmint.com/"
SUPPORT_URL="https://forums.ubuntu.com/"
BUG_REPORT_URL="http://linuxmint-troubleshooting-guide.readthedocs.io/en/latest/"
PRIVACY_POLICY_URL="https://www.linuxmint.com/"
VERSION_CODENAME=ulyana
UBUNTU_CODENAME=focal
```

BenTheElder commented 4 years ago

Is that the full kind config? Because the linked issue suggests that this involves using a nonstandard CNI in kind (calico).

sayboras commented 4 years ago

> Is that the full kind config? Because the linked issue suggests that this involves using a nonstandard CNI in kind (calico).

@BenTheElder It's the full configuration. I tried my best to provide a minimal configuration and avoid any additional dependencies. Let me know if you cannot replicate the issue.

The linked issue is mainly for my reference.

aojea commented 4 years ago

It seems CoreDNS has an IPv4 upstream DNS server. Since CoreDNS is an IPv6-only pod, it can't reach that server and the lookups fail.

If we dump the CoreDNS config we can see that it uses resolv.conf to obtain the upstream DNS servers

$ kubectl -n kube-system get configmap coredns -o yaml
apiVersion: v1
data:
  Corefile: |
    .:53 {
        health {
           lameduck 5s
        }
        kubernetes cluster.local lan in-addr.arpa ip6.arpa {
           pods insecure
           ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf {
           max_concurrent 1000
        }
        cache 30
    }
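
Since `forward` takes its upstreams from resolv.conf, one way to confirm the diagnosis is to look at the address family of each `nameserver` entry. This is only a rough sketch: `classify_nameservers` is a hypothetical helper, and it simply treats any address containing a colon as IPv6.

```shell
# classify_nameservers FILE
# Print every nameserver found in a resolv.conf-style FILE, followed by
# "ipv6" if the address contains a colon and "ipv4" otherwise.
# (Hypothetical helper for illustration; not part of kind or CoreDNS.)
classify_nameservers() {
  awk '$1 == "nameserver" {
    fam = (index($2, ":") > 0) ? "ipv6" : "ipv4"
    print $2, fam
  }' "$1"
}

# Possible usage, assuming the default kind node container name
# "kind-control-plane" and that CoreDNS inherits the node's resolv.conf:
#   docker exec kind-control-plane cat /etc/resolv.conf > /tmp/node-resolv.conf
#   classify_nameservers /tmp/node-resolv.conf
```

If every entry comes back as `ipv4`, an IPv6-only CoreDNS pod has no upstream it can reach, which would match the `network is unreachable` errors in the logs.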

You should edit the ConfigMap and replace the forward line so that it points to an IPv6 DNS server that CoreDNS can reach (2003::1 is just an example):

 forward .  [2003::1]:53
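
A rough sketch of that edit as a filter, in case it helps to script it. `rewrite_forward` is a hypothetical helper name, and `[2003::1]:53` is only the placeholder address from above; substitute a resolver that is actually reachable over IPv6 from your cluster.

```shell
# rewrite_forward UPSTREAM
# Read a Corefile (or the ConfigMap YAML that embeds it) on stdin and replace
# the first argument after "forward ." with UPSTREAM. Any trailing "{" that
# opens an options block is left untouched.
# (Hypothetical helper for illustration; not part of kind or CoreDNS.)
rewrite_forward() {
  sed -E "s|^([[:space:]]*forward[[:space:]]+\.[[:space:]]+)[^[:space:]{]+|\1$1|"
}

# Possible usage against a live cluster (untested here; review the output
# before applying it):
#   kubectl -n kube-system get configmap coredns -o yaml \
#     | rewrite_forward '[2003::1]:53' \
#     | kubectl apply -f -
```

The startup log above shows `plugin/reload` running, so CoreDNS should pick up the changed Corefile on its own after a short delay. If it does not, restarting the CoreDNS pods is the fallback.
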
sayboras commented 4 years ago

@aojea I appreciate the time you spent discussing this on Slack :tada:

I continued looking into this issue following your suggestion. I performed the steps below and got it working.

Now DNS lookup is working

$ kex dnsutils -- nslookup www.google.com
Server:     fd00:10:96::a
Address:    fd00:10:96::a#53

Non-authoritative answer:
Name:   www.google.com
Name:   www.google.com
Address: 2404:6800:4006:806::2004

I'm not sure whether you are planning to make any changes in kind for this. If not, feel free to close this issue. Thanks again for your kind help @aojea @BenTheElder :tada:. Let me know if anything else is required.

sayboras commented 4 years ago

Just a quick note: outgoing traffic from the pod is also failing now (e.g. curl www.google.com). I'm not sure whether it's an ISP issue (no IPv6 support) or something else that I have missed.

aojea commented 4 years ago

> Just a quick note: outgoing traffic from the pod is also failing now (e.g. curl www.google.com). I'm not sure whether it's an ISP issue (no IPv6 support) or something else that I have missed.

No ISP with IPv6 support, no fun :) You can get a free IPv6 tunnel from Hurricane Electric if you want to use the IPv6 Internet. There are plenty of tutorials, and I can confirm that it works well. Just a piece of advice: managing dual-stack environments is a nightmare, so start by enabling the tunnel on only a few machines until you are comfortable moving it to the whole network ;)

aojea commented 4 years ago

I think we can close it. Thanks!

/close

k8s-ci-robot commented 4 years ago

@aojea: Closing this issue.

sayboras commented 4 years ago

@aojea thanks for your time and kind help :+1: