sven-borkert opened this issue 1 year ago
It looks like the coredns pods are failing to start because they are unable to connect to the Kubernetes API server. This could be due to a network issue, or an issue with the configuration of the coredns pods.
One possible solution is to check the logs of the coredns pods to see if there is more detailed information about the error. You can do this by running the following command:
kubectl logs -n kube-system coredns-<POD-ID>
Replace <POD-ID> with the ID of one of the coredns pods.
Additionally, you can try restarting the coredns pods to see if that fixes the issue. You can do this by running the following command:
kubectl delete pod -n kube-system coredns-<POD-ID>
Again, replace <POD-ID> with the actual pod ID of the coredns pod you want to restart.
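If you're not sure of the pod ID, one way to find it (a sketch, assuming the coredns deployment carries the tutorial's usual k8s-app=kube-dns label):
kubectl get pods -n kube-system -l k8s-app=kube-dns
kubectl describe pod -n kube-system coredns-<POD-ID>
The describe output lists readiness probe failures under Events, which is often more informative than the container log alone.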
Hi,
yes, the coredns pods are starting, but not going "ready" because they cannot reach the cluster IP of the Kubernetes API server. I deleted one of the pods and checked the logs of the newly created pod:
$ kubectl logs coredns-7c9cfc6995-snvgp -n kube-system -f
plugin/kubernetes: Get "https://10.32.0.1:443/version?timeout=32s": dial tcp 10.32.0.1:443: i/o timeout
My networking seems to be broken, and I don't see the 10.32.0.1 in the iptables rules on the worker nodes. I think that kube-proxy should create a rule that catches connections to that virtual IP and forwards them to the controller nodes running the kube-apiservers, right? I checked the logs of kube-proxy on the nodes and I don't see any errors.
In the iptables rules on the workers I can see that it has created rules for kube-dns, but these have no targets yet and just reject traffic, since the coredns pods don't start correctly:
-A KUBE-SERVICES -d 10.32.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns has no endpoints" -m udp --dport 53 -j REJECT --reject-with icmp-port-unreachable
-A KUBE-SERVICES -d 10.32.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp has no endpoints" -m tcp --dport 53 -j REJECT --reject-with icmp-port-unreachable
-A KUBE-SERVICES -d 10.32.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:metrics has no endpoints" -m tcp --dport 9153 -j REJECT --reject-with icmp-port-unreachable
From my understanding, I would have expected a rule here that filters on destination 10.32.0.1:443, right? But for some reason there is no such rule, so the service is not reachable.
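One way to compare notes here (a sketch, assuming kube-proxy in iptables mode and the tutorial's 10.32.0.1 service IP):
# on a worker: the DNAT rules live in the nat table, the "has no endpoints" REJECTs in the filter table
sudo iptables-save | grep '10.32.0.1/32'
# from the admin host: the kubernetes service should expose 10.32.0.1 and list the controller IPs as endpoints
kubectl get svc kubernetes
kubectl get endpoints kubernetes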
Besides that, my container networking seems to be completely broken. I have checked the tutorial multiple times but have not found the error yet. I have installed the cni-plugins-linux archive to /opt/cni/bin/ and created /etc/cni/net.d/10-bridge.conf and 99-loopback.conf:
root@worker-0:/etc/cni/net.d# ls -l
total 8
-rw-r--r-- 1 root root 303 Dec 1 19:31 10-bridge.conf
-rw-r--r-- 1 root root 72 Dec 1 19:32 99-loopback.conf
root@worker-0:/etc/cni/net.d# cat *
{
    "cniVersion": "0.4.0",
    "name": "bridge",
    "type": "bridge",
    "bridge": "cnio0",
    "isGateway": true,
    "ipMasq": true,
    "ipam": {
        "type": "host-local",
        "ranges": [
          [{"subnet": "10.200.0.0/24"}]
        ],
        "routes": [{"dst": "0.0.0.0/0"}]
    }
}
{
    "cniVersion": "0.4.0",
    "name": "lo",
    "type": "loopback"
}
I verified that the workers each have their own subnet: 10.200.0.0/24, 10.200.1.0/24, and 10.200.2.0/24.
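A quick way to cross-check those per-worker subnets (just a sketch; the worker-0/1/2 hostnames and root SSH access are assumptions from the tutorial's setup):
for host in worker-0 worker-1 worker-2; do
  echo -n "${host}: "
  ssh root@${host} grep subnet /etc/cni/net.d/10-bridge.conf
done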
I can see the expected bridge interface cnio0 on the worker nodes:
root@worker-0:~# ifconfig
cnio0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.200.0.1 netmask 255.255.255.0 broadcast 10.200.0.255
inet6 fe80::f0aa:c0ff:fe70:2040 prefixlen 64 scopeid 0x20<link>
ether 4a:55:d3:bc:d7:b6 txqueuelen 1000 (Ethernet)
RX packets 230 bytes 13102 (13.1 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 30 bytes 1588 (1.5 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.0.220 netmask 255.255.255.0 broadcast 192.168.0.255
inet6 fe80::250:56ff:fe3b:abcb prefixlen 64 scopeid 0x20<link>
ether 00:50:56:3b:ab:cb txqueuelen 1000 (Ethernet)
RX packets 8906 bytes 2294242 (2.2 MB)
RX errors 0 dropped 1840 overruns 0 frame 0
TX packets 4248 bytes 616743 (616.7 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
ens34: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.240.0.20 netmask 255.255.255.0 broadcast 10.240.0.255
inet6 fe80::250:56ff:fe38:d826 prefixlen 64 scopeid 0x20<link>
ether 00:50:56:38:d8:26 txqueuelen 1000 (Ethernet)
RX packets 725 bytes 91678 (91.6 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 1005 bytes 84992 (84.9 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 2930 bytes 196926 (196.9 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 2930 bytes 196926 (196.9 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
veth74645b5c: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet6 fe80::7c22:c3ff:fe57:87ee prefixlen 64 scopeid 0x20<link>
ether 6a:70:f9:6f:be:48 txqueuelen 0 (Ethernet)
RX packets 19 bytes 1424 (1.4 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 40 bytes 3076 (3.0 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
veth857e8004: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet6 fe80::7cfb:c3ff:fe59:9c08 prefixlen 64 scopeid 0x20<link>
ether 12:30:8f:a8:f2:3c txqueuelen 0 (Ethernet)
RX packets 120 bytes 8408 (8.4 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 48 bytes 3180 (3.1 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
The routes on the worker look like this:
root@worker-0:~# ip route
default via 192.168.0.1 dev ens33 proto dhcp src 192.168.0.220 metric 100
10.200.0.0/24 via 10.240.0.20 dev ens34 proto static
10.200.0.0/24 dev cnio0 proto kernel scope link src 10.200.0.1
10.200.1.0/24 via 10.240.0.21 dev ens34 proto static
10.200.2.0/24 via 10.240.0.22 dev ens34 proto static
10.240.0.0/24 dev ens34 proto kernel scope link src 10.240.0.20
192.168.0.0/24 dev ens33 proto kernel scope link src 192.168.0.220 metric 100
192.168.0.1 dev ens33 proto dhcp scope link src 192.168.0.220 metric 100
I started a "busybox" pod on worker-0 to check the network connection. It has its interface and an IP from the correct subnet, but it's not able to ping the gateway or anything else:
$ kubectl exec -ti busybox -- /bin/sh
/ # ifconfig
eth0 Link encap:Ethernet HWaddr BA:96:88:8C:34:EE
inet addr:10.200.0.55 Bcast:10.200.0.255 Mask:255.255.255.0
inet6 addr: fe80::b896:88ff:fe8c:34ee/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:41 errors:0 dropped:0 overruns:0 frame:0
TX packets:22 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:3118 (3.0 KiB) TX bytes:1662 (1.6 KiB)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
/ # ip route
default via 10.200.0.1 dev eth0
10.200.0.0/24 dev eth0 scope link src 10.200.0.55
/ # ping 10.200.0.1
PING 10.200.0.1 (10.200.0.1): 56 data bytes
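One way to narrow down where those pings die (a sketch, assuming tcpdump is installed on the worker):
# on worker-0, while the ping runs inside the pod
sudo tcpdump -ni cnio0 icmp
# echo requests visible on cnio0 but unanswered points at host routing/iptables;
# nothing visible at all points at the veth/bridge wiring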
Thank you for any hints that might help me understand this. Regards, Sven
Aaaaaah! Sometimes it's good to write the details down for someone else. My routes were wrong. I fixed the routing and the pods went healthy. Not sure if everything works now, but I'm one step further. :)
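For anyone hitting the same symptom: in the route table above, the first 10.200.0.0/24 entry sends worker-0's own pod subnet via 10.240.0.20, which is worker-0 itself, and it can shadow the cnio0 route; the tutorial only expects static routes for the other workers' pod CIDRs. A sketch of checking and removing it, assuming that stray route was indeed the culprit:
# show which route actually wins for a local pod IP
ip route get 10.200.0.55
# if it goes out ens34 instead of cnio0, drop the static route for the node's own subnet
sudo ip route del 10.200.0.0/24 via 10.240.0.20 dev ens34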
Perfect, you're rocking it.
All the tests from the tutorial are working and the containers can reach each other, nice.
I did this installation on Ubuntu 22.04. It seems to be a good idea (easier) to use the containerd and runc that come with this Ubuntu version; the manually installed versions from this tutorial seem to be unhappy with cgroup v2. (I know this tutorial is meant for an older Ubuntu version.)
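If someone does want to keep the manually installed containerd on a cgroup v2 host, the usual workaround is switching it to the systemd cgroup driver; a rough sketch, assuming a config generated by containerd config default:
containerd config default | sudo tee /etc/containerd/config.toml
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
sudo systemctl restart containerd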
CoreDNS does resolve the name "kubernetes", and after I added a forward to its configuration it also resolves external names. But it does not seem to resolve any pod names. Shouldn't it?
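For what it's worth, Kubernetes DNS only creates records for services and for pod IPs, never for pod names, so that part is expected. A quick check from inside the busybox pod (a sketch, assuming the default cluster.local domain and the pods insecure option in the Corefile):
nslookup kubernetes.default.svc.cluster.local
nslookup 10-200-0-55.default.pod.cluster.local
The second name is derived from the pod IP 10.200.0.55, which is the only form of pod record CoreDNS serves.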
In other tutorials I always read that I would need a CNI provider like "Calico" for the networking between the pods. The cni-plugins-linux package is not Calico as far as I understand, so this tutorial does not install Calico. What do I need it for then?
Regards, Sven
Ran into this in the current version of the tutorial as of today. I was able to resolve it by following the instructions to modify the coredns config to point at 1.8, but I also had to kubectl apply -f deployments/kube-dns.yaml. Not sure if this puts things in an optimal state, but I wasn't able to resolve it with only one of them applied; with both it works fine.
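A sketch of the combination described above (the coredns manifest to use is whatever the linked instructions point at; only deployments/kube-dns.yaml is taken from the comment itself):
kubectl apply -f <coredns manifest updated to 1.8>
kubectl apply -f deployments/kube-dns.yaml
kubectl get pods -n kube-system -l k8s-app=kube-dns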
Hi,
I followed the tutorial until "Deploying the DNS Cluster Add-on", but the coredns pods are not reaching ready status and keep restarting. The logs say that the pod cannot connect to the kubernetes service IP:
plugin/kubernetes: Get "https://10.32.0.1:443/version?timeout=32s": dial tcp 10.32.0.1:443: i/o timeout
The service 'kubernetes' looks like it should provide this cluster IP:
But kube-proxy on the worker nodes does not seem to have created this cluster IP:
It at least knows about the 10.32.0.10 of the not-yet-running kube-dns service, but it does not create any rules for 10.32.0.1.
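A hedged way to confirm that on a worker (assuming kube-proxy runs in iptables mode as a systemd unit, as in the tutorial):
# kube-proxy tags its rules with the service name, so the API server VIP should show up here
sudo iptables-save -t nat | grep 'default/kubernetes'
# and its logs come from journald in this setup
sudo journalctl -u kube-proxy --no-pager | tail -n 50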
I don't see any issues in the logs of kube-proxy:
Any idea what's wrong here?
Regards, Sven