yinwenqin closed this issue 6 years ago.
/area dns
/kind bug
/sig network
I guess the selector of the kube-dns svc may not match the coredns pods.
You can delete the kube-dns svc and then use this script to create the coredns.yaml
file (note the changes to the script parameters). Finally, execute kubectl create -f coredns.yaml
to switch to coredns (this was tested on my 1.10.1 Ubuntu 16 cluster).
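As a rough sketch of how to verify that guess (the k8s-app=kube-dns label is only the common default and may differ in your cluster):
$ kubectl -n kube-system get svc kube-dns -o jsonpath='{.spec.selector}'
$ kubectl -n kube-system get pods -l k8s-app=kube-dns --show-labels
If the selector printed by the first command does not appear among the labels shown by the second, the service cannot select the CoreDNS pods.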
I believe the endpoints would not list IPs if the selector did not match the pod labels. Per my understanding, the fact that the endpoints exist and have ready IPs means the selectors are selecting something.
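For reference, a quick way to inspect those endpoints (assuming the default kube-dns service name in kube-system):
$ kubectl -n kube-system get endpoints kube-dns -o wide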
FYI, the official CoreDNS deployment script is here.
@yinwenqin, can you make a DNS query from a pod? For example, spin up a client pod, e.g. kubectl run -it --rm --image=infoblox/dnstools dns-client, and then execute a dig query from inside it, such as dig kubernetes.default.svc.cluster.local.
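Spelled out as copy-pasteable commands (the pod name dns-client is just the example from above):
$ kubectl run -it --rm --image=infoblox/dnstools dns-client
# then, from the shell inside the pod:
$ dig kubernetes.default.svc.cluster.local.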
@johnbelamaric
@yinwenqin, still an issue?
@chrisohaver After I redeployed CoreDNS twice, the problem was solved, which was very strange, but so far it has not recurred. I will close this issue. Thank you so much!
Same issue, I also had to redeploy the coredns deployment:
$ kubectl get deploy -n kube-system
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
coredns 2 2 2 2 29h
$ wget https://raw.githubusercontent.com/zlabjp/kubernetes-scripts/master/force-update-deployment
$ chmod +x force-update-deployment
$ force-update-deployment coredns -n kube-system
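If your kubectl is 1.15 or newer, a plain rollout restart is a possible alternative to the script (just a suggestion, not what the script itself does):
$ kubectl -n kube-system rollout restart deployment coredns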
Thanks! This problem was driving me nuts! I had to redeploy coredns three times before it started working. Also had to upgrade kubeadm on my master node. Insanity!
I am wondering, though, if this needs to be done on all nodes. I did it on the master node with success, but am still having issues on the other nodes.
I rebuilt my cluster, and now this works on two nodes (master and a worker), but I cannot get it to work on the other two nodes. Plus, I have to force a redeploy with alarming frequency, otherwise I lose the network. This is nuts.
And now it died... no matter how many redeploys I issue, coredns is not working with kubernetes. I'll try rebuilding it again later with Calico as the CNI. Am using Flannel now.
Did you take a look at the link above - https://kubernetes.io/docs/tasks/debug-application-cluster/debug-service/ ?
You can probably jump to the relevant section and proceed from there.
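For example, one of the early checks in that guide is resolving the kubernetes service from a test pod (dns-client here is just a placeholder name for a pod that has nslookup available):
$ kubectl exec -it dns-client -- nslookup kubernetes.default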
Thanks @johnbelamaric, before trying this, I tried @mritd's solution. So far, it appears to be working across all nodes. I'll do some heavy testing later to verify this, but so far, so good.
Ok, good. If you are using the add-on manager and have kube-dns instead of coredns enabled, then it will revert the service and deployment resources, which may result in something like this (depending on the labels on your coredns deployment).
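One way to check whether the add-on manager is managing your DNS resources is to look for the addonmanager.kubernetes.io/mode label on them (a sketch, assuming the deployment is named coredns):
$ kubectl -n kube-system get deploy coredns --show-labels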
Same issue here; the force-update-deployment redeploy above worked for me too. But why does redeploying it twice solve the problem? I checked the route entries using ip route and found nothing different.
It is certainly strange. I would want to see:
1) your service, deployment, and configmap resources, along with some event history (kubectl describe or kubectl get events), before and after the issue
2) logs from the CoreDNS containers, both working and not working
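Roughly, something like the following would collect all of that (the names and the k8s-app=kube-dns label assume a default kubeadm-style setup):
$ kubectl -n kube-system get svc kube-dns -o yaml
$ kubectl -n kube-system get deploy coredns -o yaml
$ kubectl -n kube-system get configmap coredns -o yaml
$ kubectl -n kube-system get events --sort-by=.metadata.creationTimestamp
$ kubectl -n kube-system logs -l k8s-app=kube-dns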
Same issue here. First, I could telnet to the svc IP:
[root@k8s-m1 ~]# telnet 10.96.0.10 53
Trying 10.96.0.10...
Connected to 10.96.0.10.
Escape character is '^]'.
^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^CConnection closed by foreign host.
Now, test the DNS pod IP directly:
[root@k8s-m1 ~]# kubectl -n kube-system get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
coredns-5dc6f95498-m6c5n 1/1 Running 0 60s 10.244.1.5 172.16.2.10 <none> <none>
metrics-server-7c7c88f4d4-8l9dd 1/1 Running 0 7h5m 10.244.4.5 172.16.2.4 <none> <none>
[root@k8s-m1 ~]# dig @10.244.1.5 baidu.com +short
; <<>> DiG 9.9.4-RedHat-9.9.4-73.el7_6 <<>> @10.244.1.5 baidu.com +short
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached
[root@k8s-m1 ~]# telnet 10.244.1.5 53
Trying 10.244.1.5...
Connected to 10.244.1.5.
Escape character is '^]'.
^C^C^C^C^C^CConnection closed by foreign host.
[root@k8s-m1 ~]# ^C
[root@k8s-m1 ~]# curl 10.244.1.5:8181/ready
OK
Now, SSH to the node and test from there:
[root@k8s-m1 ~]# ssh 172.16.2.10
[root@k8s-n1 ~]# dig @10.244.1.5 baidu.com
39.156.69.79
220.181.38.148
Telnet and curl use TCP. dig is using UDP. Try +tcp option with dig. That will tell you if it's some UDP transport issue in your network.
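For example, against the same pod IP as above:
$ dig @10.244.1.5 baidu.com +tcp
If the query succeeds over TCP but times out over UDP, that points at UDP packets being dropped somewhere on the path (MTU, firewall, or overlay network issues).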
I rebuilt my work environment; I will try the method you suggested when I encounter the issue again.
@ngocson2vn Worked. This solution saved my day.
I cleared all the firewall rules, and it works just fine now:
iptables -P INPUT ACCEPT      # reset default policies to ACCEPT
iptables -P FORWARD ACCEPT
iptables -P OUTPUT ACCEPT
iptables -t nat -F            # flush the nat table (including kube-proxy's rules)
iptables -t mangle -F         # flush the mangle table
iptables -F                   # flush the filter table
iptables -X                   # delete user-defined chains
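Be aware that flushing the nat table also removes the rules kube-proxy has programmed; kube-proxy re-syncs them periodically, but restarting its pods forces an immediate rebuild (assuming the standard k8s-app=kube-proxy label):
$ kubectl -n kube-system delete pod -l k8s-app=kube-proxy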
Traffic can be routed to the pods via a Kubernetes service, or it can be routed directly to the pods. When traffic is routed to the pods via a Kubernetes service, Kubernetes uses a built-in mechanism called kube-proxy to load balance traffic between the pods.
Having an outdated kube-proxy image version can cause routing issues. In my case it was 1.23 running on a k8s 1.27 cluster. Upgrading the image fixed the issue for me.
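To check for that kind of version skew, one option is to compare the server version with the kube-proxy image tag (a sketch, assuming the kubeadm default DaemonSet name kube-proxy):
$ kubectl version
$ kubectl -n kube-system get ds kube-proxy -o jsonpath='{.spec.template.spec.containers[0].image}'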
Here is the status right now:
Everything just looks fine. Whether I check the svc/pod/endpoint status, the logs, iptables, or the network, everything is normal. But this one svc, kube-dns, does not work, so pods cannot resolve domain names, although pinging IPs works. This has been worrying me for a long time. Can somebody help me?