@afbjorklund Any help? I am a bit stuck...
I have the same problem. It does not seem to be a DNS issue but an issue with the network routing: access to IP addresses outside of the pod network is not working. I am guessing that the hostname lookup failures happen because the pod cannot access the DNS server specified in /etc/resolv.conf. For me the pod network is 10.244.0.0/16 and the DNS is specified as 10.96.0.10, which is a different subnet.
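A quick way to separate the two failure modes from inside the cluster (a sketch, assuming kubectl access and that a throwaway busybox pod is acceptable in your environment; netcheck is an arbitrary name):
kubectl run netcheck --rm -it --image=busybox --restart=Never -- sh -c \
  'ping -c 3 8.8.8.8; nslookup kubernetes.default 10.96.0.10'
# if the ping to the external IP fails while the in-cluster lookup works,
# the problem is routing out of the pod network rather than DNS itself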
From the pod
# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: eth0@if28: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue
link/ether 56:6c:82:b2:d1:1a brd ff:ff:ff:ff:ff:ff
inet 10.244.0.32/16 brd 10.244.255.255 scope global eth0
valid_lft forever preferred_lft forever
# cat /etc/resolv.conf
nameserver 10.96.0.10
# kubectl get service kube-dns -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 4h33m
Given you are using Ubuntu with driver=none, kubeadm detects/tries to configure systemd-resolved, which you can see in the setting kubelet.resolv-conf=/run/systemd/resolve/resolv.conf
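To confirm what kubelet is actually using, a quick check (the config path assumes a standard kubeadm layout; adjust if yours differs):
# the resolvConf field shows which file CoreDNS inherits its upstream servers from
grep resolvConf /var/lib/kubelet/config.yaml
# or look for the flag on the running process
ps -ef | grep '[k]ubelet' | tr ' ' '\n' | grep resolv-conf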
Depending on your local setup you may not be using systemd-resolved. On the host machine running minikube, can you run the following and provide the output so I can better understand your host machine's DNS configuration:
systemctl status --no-pager systemd-resolved
ls -l /etc/resolv.conf
ls -l /run/systemd/resolve/
cat /run/systemd/resolve/resolv.conf
@megazone23 - the different subnets for the kube-dns service are misleading due to the way services operate using a virtual IP.
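With kube-proxy in its default iptables mode (an assumption about this setup), the ClusterIP never appears on the wire; it is DNAT'ed to a CoreDNS pod IP, which you can see on the host with:
# shows the nat rules that translate the 10.96.0.10 virtual IP to pod endpoints
sudo iptables -t nat -L KUBE-SERVICES -n | grep 10.96.0.10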
@Abhishekghosh1998 and @megazone23 there are some general debugging tips on Kubernetes DNS using a dnsutils pod at https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/ that might help you in your debugging. I'll try to set up an Ubuntu box to test locally.
@pnasrat Here are the outputs which you asked for:
$ systemctl status --no-pager systemd-resolved -l
● systemd-resolved.service - Network Name Resolution
Loaded: loaded (/lib/systemd/system/systemd-resolved.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2023-10-18 23:16:23 IST; 4 days ago
Docs: man:systemd-resolved.service(8)
man:org.freedesktop.resolve1(5)
https://www.freedesktop.org/wiki/Software/systemd/writing-network-configuration-managers
https://www.freedesktop.org/wiki/Software/systemd/writing-resolver-clients
Main PID: 863 (systemd-resolve)
Status: "Processing requests..."
Tasks: 1 (limit: 76849)
Memory: 9.7M
CPU: 53.929s
CGroup: /system.slice/systemd-resolved.service
└─863 /lib/systemd/systemd-resolved
Oct 18 23:16:23 abhishek systemd[1]: Starting Network Name Resolution...
Oct 18 23:16:23 abhishek systemd-resolved[863]: Positive Trust Anchors:
Oct 18 23:16:23 abhishek systemd-resolved[863]: . IN DS 20326 8 2 e06d44b80b8f1d39a95c0b0d7c65d08458e880409bbc683457104237c7f8ec8d
Oct 18 23:16:23 abhishek systemd-resolved[863]: Negative trust anchors: home.arpa 10.in-addr.arpa 16.172.in-addr.arpa 17.172.in-addr.arpa 18.172.in-addr.arpa 19.172.in-addr.arpa 20.172.in-addr.arpa 21.172.in-addr.arpa 22.172.in-addr.arpa 23.172.in-addr.arpa 24.172.in-addr.arpa 25.172.in-addr.arpa 26.172.in-addr.arpa 27.172.in-addr.arpa 28.172.in-addr.arpa 29.172.in-addr.arpa 30.172.in-addr.arpa 31.172.in-addr.arpa 168.192.in-addr.arpa d.f.ip6.arpa corp home internal intranet lan local private test
Oct 18 23:16:23 abhishek systemd-resolved[863]: Using system hostname 'abhishek'.
Oct 18 23:16:23 abhishek systemd[1]: Started Network Name Resolution.
Oct 18 23:16:27 abhishek systemd-resolved[863]: enp0s31f6: Bus client set default route setting: yes
Oct 18 23:16:27 abhishek systemd-resolved[863]: enp0s31f6: Bus client set DNS server list to: 10.16.25.15, 10.15.25.13
Oct 18 23:16:36 abhishek systemd-resolved[863]: enp0s31f6: Bus client set DNS server list to: 10.16.25.15, 10.15.25.13, fe80::1
Oct 21 17:40:46 abhishek systemd-resolved[863]: Using degraded feature set UDP instead of UDP+EDNS0 for DNS server 10.15.25.13.
$ ls -l /etc/resolv.conf
lrwxrwxrwx 1 root root 39 Dec 21 2022 /etc/resolv.conf -> ../run/systemd/resolve/stub-resolv.conf
$ ls -l /run/systemd/resolve/
total 8
srw-rw-rw- 1 systemd-resolve systemd-resolve 0 Oct 18 23:16 io.systemd.Resolve
drwx------ 2 systemd-resolve systemd-resolve 60 Oct 18 23:16 netif
-rw-r--r-- 1 systemd-resolve systemd-resolve 830 Oct 18 23:16 resolv.conf
-rw-r--r-- 1 systemd-resolve systemd-resolve 920 Oct 18 23:16 stub-resolv.conf
$ cat /run/systemd/resolve/resolv.conf
# This is /run/systemd/resolve/resolv.conf managed by man:systemd-resolved(8).
# Do not edit.
#
# This file might be symlinked as /etc/resolv.conf. If you're looking at
# /etc/resolv.conf and seeing this text, you have followed the symlink.
#
# This is a dynamic resolv.conf file for connecting local clients directly to
# all known uplink DNS servers. This file lists all configured search domains.
#
# Third party programs should typically not access this file directly, but only
# through the symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a
# different way, replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.
nameserver 10.16.25.15
nameserver 10.15.25.13
nameserver fe80::1%2
search .
Depending on your local setups you may not be using systemd-resolved.
I might be wrong, but my system seems to be using systemd-resolved, isn't it? Could you please have a look?
Yes, you are using systemd-resolved, and it looks like the CoreDNS upstream servers are getting set correctly; however, based on your log they get I/O timeouts:
*
* ==> coredns [ee10c2783373] <==
* .:53
[INFO] plugin/reload: Running configuration SHA512 = 7cdff32fc9c56df278621e3df8c1fd38e90c1c6357bf9c78282ddfe67ac8fc01159ee42f7229906198d471a617bf80a893de29f65c21937e1e5596cf6a48e762
CoreDNS-1.10.1
linux/amd64, go1.20, 055b2c3
[INFO] 127.0.0.1:47445 - 19077 "HINFO IN 3717226698632469066.6951563525464814206. udp 57 false 512" - - 0 6.003221699s
[ERROR] plugin/errors: 2 3717226698632469066.6951563525464814206. HINFO: read udp 10.244.0.157:37438->10.16.25.15:53: i/o timeout
So it is something networking-related - let's check this name server 10.16.25.15 using the dnsutils pod described in the debugging link earlier. The commands below check whether a lookup against one of Google's Public DNS nameservers (8.8.8.8) works, then run the same lookup against the local name server, first on the host (the first two commands) and then inside the cluster. If you can provide that output, we can see whether it is a networking issue that affects external DNS servers, or something related to the networking between container and host:
dig archive.ubuntu.com @8.8.8.8
dig archive.ubuntu.com @10.16.25.15
sudo kubectl apply -f https://k8s.io/examples/admin/dns/dnsutils.yaml
sudo kubectl exec -ti dnsutils -- cat /etc/resolv.conf
sudo kubectl exec -ti dnsutils -- dig archive.ubuntu.com @8.8.8.8
sudo kubectl exec -ti dnsutils -- dig archive.ubuntu.com @10.16.25.15
We also need to eliminate host firewalling - on the host, can you run the commands below and attach iptables.out as a file:
sudo iptables -L -n -v | tee iptables.out
sudo iptables -t nat -L -n -v | tee -a iptables.out
Please also share the output of sudo sysctl net.ipv4.ip_forward
My environment is different: I am using RHEL 8.8 and I am not using systemd-resolved. But for me I am pretty sure it is not a DNS issue; it is network related, as you mentioned, and I think it is specific to using driver=none, because I have the same behavior as this issue. If I do not use driver=none I can properly access the internet and any machine on my local network. But if I run with driver=none, all network connectivity outside the pod times out. I tried pinging by IP to bypass DNS.
From the machine hosting minikube to another machine on the local network
$ ping -c 3 10.253.16.228
PING 10.253.16.228 (10.253.16.228) 56(84) bytes of data.
64 bytes from 10.253.16.228: icmp_seq=1 ttl=64 time=0.992 ms
64 bytes from 10.253.16.228: icmp_seq=2 ttl=64 time=0.317 ms
64 bytes from 10.253.16.228: icmp_seq=3 ttl=64 time=0.472 ms
--- 10.253.16.228 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2037ms
rtt min/avg/max/mdev = 0.317/0.593/0.992/0.290 ms
from the dnsutils pod to same machine on the local network
root@dnsutils:/$ ping -c 5 10.253.16.228
PING 10.253.16.228 (10.253.16.228) 56(84) bytes of data.
--- 10.253.16.228 ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 4131ms
from the dnsutils pod to the host hosting minikube
root@dnsutils:/$ ping -c 3 10.253.16.47
PING 10.253.16.47 (10.253.16.47) 56(84) bytes of data.
64 bytes from 10.253.16.47: icmp_seq=1 ttl=64 time=0.083 ms
64 bytes from 10.253.16.47: icmp_seq=2 ttl=64 time=0.056 ms
64 bytes from 10.253.16.47: icmp_seq=3 ttl=64 time=0.051 ms
Could it be something related to the iptables?
The first one has no output:
$ iptables -L n -v | tee iptables.out
iptables: No chain/target/match by that name.
(Note: the missing dash before n makes iptables look for a chain named n; the intended command is iptables -L -n -v.)
The second one has a lot of output. I'll attach mine. megazone-iptables.txt
@megazone23 looking at your iptables you have multiple CNI bridges. Can you attach minikube logs as this is likely a different issue.
Can you share what's in your CNI config as I see multiple bridges in the iptables output.
find /etc/cni/net.d/ -type f -print -exec cat '{}' \;
ip link show
Minikube will create a bridge config /etc/cni/net.d/1-k8s.conflist
and disable other bridges; however, it makes some assumptions about the naming of the CNI configs it disables: https://github.com/kubernetes/minikube/blob/f29457b47e7f8352f92ce64cef4295599f938ba9/pkg/minikube/cni/cni.go#L252
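As a sanity check you can list what minikube has (and has not) disabled:
# live configs end in .conf/.conflist; anything minikube disabled is
# renamed with a .mk_disabled suffix
ls -l /etc/cni/net.d/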
Once you've uploaded the minikube logs, my suggestion to eliminate iptables itself, if you are comfortable with it, would be to delete minikube, remove the minikube-generated CNI bridge config, turn off firewalling, then rerun minikube start --driver=none
and test network connectivity from the container again. Run the following commands as root or using sudo:
minikube stop
minikube delete
rm -f /etc/cni/net.d/1-k8s.conflist
systemctl stop firewalld
systemctl disable firewalld
iptables -F
iptables -X
minikube start --driver=none
find /etc/cni/net.d/ -type f -print -exec cat '{}' \;
/etc/cni/net.d/1-k8s.conflist
{
"cniVersion": "0.3.1",
"name": "bridge",
"plugins": [
{
"type": "bridge",
"bridge": "bridge",
"addIf": "true",
"isDefaultGateway": true,
"forceAddress": false,
"ipMasq": true,
"hairpinMode": true,
"ipam": {
"type": "host-local",
"subnet": "10.244.0.0/16"
}
},
{
"type": "portmap",
"capabilities": {
"portMappings": true
}
}
]
}
ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1460 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 42:01:0a:fd:10:2f brd ff:ff:ff:ff:ff:ff
altname enp0s4
altname ens4
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default
link/ether 02:42:24:8d:fd:9e brd ff:ff:ff:ff:ff:ff
4: bridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether 52:f0:6a:aa:e6:41 brd ff:ff:ff:ff:ff:ff
15: br-ada8c49844fe: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default
link/ether 02:42:57:ec:e2:14 brd ff:ff:ff:ff:ff:ff
34: veth6b106fb1@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master bridge state UP mode DEFAULT group default
link/ether 6a:83:a1:27:d4:69 brd ff:ff:ff:ff:ff:ff link-netnsid 0
I ran your suggestion, disabling firewalld and clearing iptables.
The problem is still there...
Ping to the minikube host machine from the pod works
ping -c 3 10.253.16.47
PING 10.253.16.47 (10.253.16.47): 56 data bytes
64 bytes from 10.253.16.47: seq=0 ttl=64 time=0.093 ms
64 bytes from 10.253.16.47: seq=1 ttl=64 time=0.057 ms
64 bytes from 10.253.16.47: seq=2 ttl=64 time=0.114 ms
--- 10.253.16.47 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.057/0.088/0.114 ms
Ping to machines on the same subnet as the host machine from the pod fails
ping -c 3 10.253.16.228
PING 10.253.16.228 (10.253.16.228): 56 data bytes
--- 10.253.16.228 ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss
DNS lookups in the pod using CoreDNS as the DNS server fail connecting to 10.96.0.10
cat /etc/resolv.conf
nameserver 10.96.0.10
nslookup www.google.com
;; connection timed out; no servers could be reached
From the pod the default route is set to 10.244.0.1, which is pingable.
route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 10.244.0.1 0.0.0.0 UG 0 0 0 eth0
10.244.0.0 0.0.0.0 255.255.0.0 U 0 0 0 eth0
ping -c 3 10.244.0.1
PING 10.244.0.1 (10.244.0.1): 56 data bytes
64 bytes from 10.244.0.1: seq=0 ttl=64 time=0.088 ms
64 bytes from 10.244.0.1: seq=1 ttl=64 time=0.061 ms
64 bytes from 10.244.0.1: seq=2 ttl=64 time=0.055 ms
--- 10.244.0.1 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.055/0.068/0.088 ms
It seems like 10.244.0.1 is not routing the packets.
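One way to narrow down where the packets stop (a sketch, assuming tcpdump is available on the host; the interface names come from the CNI config and ip link output above):
# run each in its own terminal on the minikube host, then ping from the pod;
# if packets show up on "bridge" but never on eth0, forwarding is being blocked
sudo tcpdump -ni bridge icmp and host 10.253.16.228
sudo tcpdump -ni eth0 icmp and host 10.253.16.228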
Hmm @megazone23, a few things in the logs stand out to me.
Firstly, the name server that coredns is trying to speak to is an address in the link-local address block, which I don't think would work.
Rather than trying ping (as ICMP can be filtered by other things), can you try DNS explicitly: dig www.google.com @8.8.8.8
or nslookup www.google.com 8.8.8.8
both on the host and in the dnsutils pod? Also, what does /etc/resolv.conf on the host show?
[INFO] 127.0.0.1:59064 - 23587 "HINFO IN 693605401090401844.1612606921130871668. udp 56 false 512" - - 0 6.00191222s
[ERROR] plugin/errors: 2 693605401090401844.1612606921130871668. HINFO: read udp 10.244.0.34:33972->169.254.169.254:53: i/o timeout
@megazone23 can you also confirm whether forwarding is set up; as root: sysctl -a --pattern forward
I am running a compute machine in Google Cloud. External DNS servers such as 8.8.8.8 are blocked. The DNS for the host VM is using the google metadata DNS server. I could try to switch to a DNS server which is on the corporate network.
Host resolv.conf
cat /etc/resolv.conf
search us-west1-b.c.laod001-epg-autosys01.internal us-west1-a.c.laod001-epg-autosys01.internal us-west1-c.c.laod001-epg-autosys01.internal c.laod001-epg-autosys01.internal google.internal
nameserver 169.254.169.254
nslookup 169.254.169.254
254.169.254.169.in-addr.arpa name = metadata.google.internal.
Authoritative answers can be found from:
sysctl -a --pattern forward
net.ipv4.conf.all.bc_forwarding = 0
net.ipv4.conf.all.forwarding = 1
net.ipv4.conf.all.mc_forwarding = 0
net.ipv4.conf.br-ada8c49844fe.bc_forwarding = 0
net.ipv4.conf.br-ada8c49844fe.forwarding = 1
net.ipv4.conf.br-ada8c49844fe.mc_forwarding = 0
net.ipv4.conf.bridge.bc_forwarding = 0
net.ipv4.conf.bridge.forwarding = 1
net.ipv4.conf.bridge.mc_forwarding = 0
net.ipv4.conf.default.bc_forwarding = 0
net.ipv4.conf.default.forwarding = 1
net.ipv4.conf.default.mc_forwarding = 0
net.ipv4.conf.docker0.bc_forwarding = 0
net.ipv4.conf.docker0.forwarding = 1
net.ipv4.conf.docker0.mc_forwarding = 0
net.ipv4.conf.eth0.bc_forwarding = 0
net.ipv4.conf.eth0.forwarding = 1
net.ipv4.conf.eth0.mc_forwarding = 0
net.ipv4.conf.lo.bc_forwarding = 0
net.ipv4.conf.lo.forwarding = 1
net.ipv4.conf.lo.mc_forwarding = 0
net.ipv4.conf.veth391bdbe5.bc_forwarding = 0
net.ipv4.conf.veth391bdbe5.forwarding = 1
net.ipv4.conf.veth391bdbe5.mc_forwarding = 0
net.ipv4.conf.veth8f42059a.bc_forwarding = 0
net.ipv4.conf.veth8f42059a.forwarding = 1
net.ipv4.conf.veth8f42059a.mc_forwarding = 0
net.ipv4.ip_forward = 1
net.ipv4.ip_forward_update_priority = 1
net.ipv4.ip_forward_use_pmtu = 0
net.ipv6.conf.all.forwarding = 0
net.ipv6.conf.all.mc_forwarding = 0
net.ipv6.conf.br-ada8c49844fe.forwarding = 0
net.ipv6.conf.br-ada8c49844fe.mc_forwarding = 0
net.ipv6.conf.bridge.forwarding = 0
net.ipv6.conf.bridge.mc_forwarding = 0
net.ipv6.conf.default.forwarding = 0
net.ipv6.conf.default.mc_forwarding = 0
net.ipv6.conf.docker0.forwarding = 0
net.ipv6.conf.docker0.mc_forwarding = 0
net.ipv6.conf.eth0.forwarding = 0
net.ipv6.conf.eth0.mc_forwarding = 0
net.ipv6.conf.lo.forwarding = 0
net.ipv6.conf.lo.mc_forwarding = 0
net.ipv6.conf.veth391bdbe5.forwarding = 0
net.ipv6.conf.veth391bdbe5.mc_forwarding = 0
net.ipv6.conf.veth8f42059a.forwarding = 0
net.ipv6.conf.veth8f42059a.mc_forwarding = 0
Switching to the corporate DNS looks like it has the same problem. Are there some address ranges which are not usable?
*
* ==> coredns [a17364755252] <==
* .:53
[INFO] plugin/reload: Running configuration SHA512 = 7cdff32fc9c56df278621e3df8c1fd38e90c1c6357bf9c78282ddfe67ac8fc01159ee42f7229906198d471a617bf80a893de29f65c21937e1e5596cf6a48e762
CoreDNS-1.10.1
linux/amd64, go1.20, 055b2c3
[INFO] 127.0.0.1:48160 - 45217 "HINFO IN 7738483543626944269.2391439177860945490. udp 57 false 512" - - 0 6.002724141s
[ERROR] plugin/errors: 2 7738483543626944269.2391439177860945490. HINFO: read udp 10.244.0.39:34165->192.19.189.10:53: i/o timeout
[INFO] 127.0.0.1:47436 - 64283 "HINFO IN 7738483543626944269.2391439177860945490. udp 57 false 512" - - 0 6.002154482s
[ERROR] plugin/errors: 2 7738483543626944269.2391439177860945490. HINFO: read udp 10.244.0.39:38170->192.19.189.10:53: i/o timeout
[INFO] 127.0.0.1:46701 - 9896 "HINFO IN 7738483543626944269.2391439177860945490. udp 57 false 512" - - 0 4.001128598s
[ERROR] plugin/errors: 2 7738483543626944269.2391439177860945490. HINFO: read udp 10.244.0.39:35156->192.19.189.10:53: i/o timeout
[INFO] 127.0.0.1:42905 - 640 "HINFO IN 7738483543626944269.2391439177860945490. udp 57 false 512" - - 0 2.000885794s
[ERROR] plugin/errors: 2 7738483543626944269.2391439177860945490. HINFO: read udp 10.244.0.39:45787->192.19.189.10:53: i/o timeout
Does dig www.google.com @192.19.189.10
work from the host?
Yes
dig www.google.com @192.19.189.10
; <<>> DiG 9.11.36-RedHat-9.11.36-8.el8_8.2 <<>> www.google.com @192.19.189.10
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 64787
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1220
; COOKIE: 7bcc48e7e4be30d1944d56f7653800313f4906a809fa8c4a (good)
;; QUESTION SECTION:
;www.google.com. IN A
;; ANSWER SECTION:
www.google.com. 84 IN A 142.250.191.36
;; Query time: 48 msec
;; SERVER: 192.19.189.10#53(192.19.189.10)
;; WHEN: Tue Oct 24 17:34:41 UTC 2023
;; MSG SIZE rcvd: 87
Running the same command from the dnsutils pod fails.
root@dnsutils:/# dig www.google.com @192.19.189.10
; <<>> DiG 9.9.5-9+deb8u19-Debian <<>> www.google.com @192.19.189.10
;; global options: +cmd
;; connection timed out; no servers could be reached
OK, so the name server is only part of the problem - I believe (but will need to check) that because the coredns pod's link is a virtual ethernet device, the link-local address is not reachable from it. I can replicate the initial failure with the 169.254 name server by setting up a RHEL instance in GCP, along with the failure to query 8.8.8.8, which works from the host.
Now that I have an environment that reproduces it, I'll do a bit of debugging on the network side to see what's preventing the pods from getting out of the box, which will be a quicker turnaround than asking you to run commands.
https://gist.github.com/pnasrat/96612d4cf7670232e38bd8645e527862
Added some iptables logging as the forward chain seemed to be getting drops
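The gist isn't reproduced here, but a minimal version of such logging (an assumption about its contents, not the exact rules) appends a LOG rule so packets that would fall through to the chain's DROP policy get recorded first:
# rate-limited so the kernel log isn't flooded; the prefix matches the output below
sudo iptables -A FORWARD -m limit --limit 5/min -j LOG \
  --log-prefix "IPTables-Dropped: " --log-level 4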
[ 4471.431919] IPTables-Dropped: IN=bridge OUT=eth0 PHYSIN=veth99fa30a1 MAC=12:71:03:47:74:58:c2:da:e3:c8:65:60:08:00 SRC=10.244.0.7 DST=8.8.8.8 LEN=45 TOS=0x00 PREC=0x00 TTL=63 ID=46447 DF PROTO=UDP SPT=51202 DPT=53 LEN=25
[ 4472.933063] IPTables-Dropped: IN=bridge OUT=eth0 PHYSIN=veth99fa30a1 MAC=12:71:03:47:74:58:c2:da:e3:c8:65:60:08:00 SRC=10.244.0.7 DST=8.8.8.8 LEN=45 TOS=0x00 PREC=0x00 TTL=63 ID=64070 DF PROTO=UDP SPT=57847 DPT=53 LEN=25
[ 4474.434227] IPTables-Dropped: IN=bridge OUT=eth0 PHYSIN=veth99fa30a1 MAC=12:71:03:47:74:58:c2:da:e3:c8:65:60:08:00 SRC=10.244.0.7 DST=8.8.8.8 LEN=45 TOS=0x00 PREC=0x00 TTL=63 ID=24313 DF PROTO=UDP SPT=49151 DPT=53 LEN=25
[ 4475.935708] IPTables-Dropped: IN=bridge OUT=eth0 PHYSIN=veth99fa30a1 MAC=12:71:03:47:74:58:c2:da:e3:c8:65:60:08:00 SRC=10.244.0.7 DST=8.8.8.8 LEN=45 TOS=0x00 PREC=0x00 TTL=63 ID=21525 DF PROTO=UDP SPT=41399 DPT=53 LEN=25
[ 4477.436533] IPTables-Dropped: IN=bridge OUT=eth0 PHYSIN=veth99fa30a1 MAC=12:71:03:47:74:58:c2:da:e3:c8:65:60:08:00 SRC=10.244.0.7 DST=8.8.8.8 LEN=45 TOS=0x00 PREC=0x00 TTL=63 ID=27329 DF PROTO=UDP SPT=40441 DPT=53 LEN=25
[ 4501.454906] IPTables-Dropped: IN=bridge OUT=eth0 PHYSIN=veth99fa30a1 MAC=12:71:03:47:74:58:c2:da:e3:c8:65:60:08:00 SRC=10.244.0.7 DST=8.8.8.8 LEN=45 TOS=0x00 PREC=0x00 TTL=63 ID=65434 DF PROTO=UDP SPT=38436 DPT=53 LEN=25
Thanks for the help so far. I assume the original reporter of this issue is having the same problem, which seems to be network related.
@megazone23 can you see if adding the following iptables rules for the bridge interface allows the pod to talk to the network. I'm pretty sure it's the FORWARD chain that is denying the pod networking; this will work around it temporarily while I get a better understanding of what the rules should be, what the security implications are, and what might need adjusting to fix it correctly.
As root on the host
iptables -A FORWARD -o bridge -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
iptables -A FORWARD -i bridge -j ACCEPT
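(The first rule allows established and related reply traffic back in toward the bridge; the second allows traffic originating from the bridge out, so both directions bypass the DROP policy at the end of the FORWARD chain.)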
With this I can dig using the dnsutils pod
kubectl exec -ti dnsutils -- /bin/sh
# dig google.com
; <<>> DiG 9.9.5-9+deb8u19-Debian <<>> google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 3714
;; flags: qr rd ra; QUERY: 1, ANSWER: 6, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;google.com. IN A
;; ANSWER SECTION:
google.com. 30 IN A 74.125.20.138
google.com. 30 IN A 74.125.20.113
google.com. 30 IN A 74.125.20.100
google.com. 30 IN A 74.125.20.139
google.com. 30 IN A 74.125.20.101
google.com. 30 IN A 74.125.20.102
;; Query time: 36 msec
;; SERVER: 10.96.0.10#53(10.96.0.10)
;; WHEN: Tue Oct 24 19:39:50 UTC 2023
;; MSG SIZE rcvd: 195
/kind support
The underlying cause here seems to be that docker libnetwork sets the FORWARD iptables chain policy to DROP, which breaks the kubeadm-generated rules; this is not new behavior (cf. this Debian issue).
If using docker + cri-dockerd + the minikube none driver, this is probably a common issue. There are a number of workarounds, but the none driver is explicitly documented as being for advanced users, and unlike the minikube ISO images the variation of configurations is large.
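You can confirm the policy on an affected host; it shows in the chain's header line:
sudo iptables -L FORWARD -n | head -1
# with docker running this typically prints: Chain FORWARD (policy DROP)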
I am happy to add some additional documentation/links about network debugging in general, and to note this potential issue explicitly for the none driver.
@pnasrat You are a genius! I ran the two iptables commands and the network issue is corrected: DNS lookups are working and ping is also successful.
Thanks!
@pnasrat
iptables -A FORWARD -o bridge -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
iptables -A FORWARD -i bridge -j ACCEPT
After trying the above two commands, I was able to resolve the issue with minikube --driver=none, but unfortunately something broke for minikube --driver=docker and it is no longer able to pull images.
Trying to create a pod, for example the dnsutils pod:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
dnsutils 0/1 ContainerCreating 0 5m2s
The system remains stuck at the ContainerCreating step.
$ kubectl describe pod dnsutils
Name: dnsutils
Namespace: default
Priority: 0
Node: minikube/192.168.49.2
Start Time: Wed, 25 Oct 2023 16:34:12 +0530
Labels: <none>
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Containers:
dnsutils:
Container ID:
Image: registry.k8s.io/e2e-test-images/jessie-dnsutils:1.3
Image ID:
Port: <none>
Host Port: <none>
Command:
sleep
infinity
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-lkp4r (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-lkp4r:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 6m6s default-scheduler Successfully assigned default/dnsutils to minikube
Normal Pulling 6m6s kubelet Pulling image "registry.k8s.io/e2e-test-images/jessie-dnsutils:1.3"
I guess something bad happened. I tried removing the above two rules:
iptables -D FORWARD -o bridge -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
iptables -D FORWARD -i bridge -j ACCEPT
But still no luck.
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
dnsutils 1/1 Running 0 4m37s
The pod starts, but it takes a lot of time to come into the Running state. Before adding those two rules, when using the --driver=docker option, this issue wasn't there. Even with the --driver=none option, it does not take this long for the pod to reach the Running state.
@Abhishekghosh1998 my personal recommendation would be to use --driver=docker unless you have an explicit need to use --driver=none. I'm not sure switching between the none driver and others is a recommended path, and it is one that might require manipulating your host system in some ways.
One way to reset your Ubuntu firewall (do this after minikube stop and delete) is to stop docker and ufw to prevent manipulation of firewall rules, then flush the rules, reset the default FORWARD policy, and zero the iptables counters. This may impact other software that manipulates iptables rules (e.g. libvirt).
sudo minikube stop
sudo minikube delete
sudo systemctl stop ufw docker
sudo iptables -F
sudo iptables -F -t nat
sudo iptables -F -t mangle
sudo iptables -P FORWARD ACCEPT
sudo iptables -Z
sudo iptables -Z -t nat
sudo iptables -Z -t mangle
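Before restarting docker, you can verify the reset took effect:
sudo iptables -S FORWARD
# should print just the policy line: -P FORWARD ACCEPT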
As you are the sysadmin of your server, whether you want to run with ufw or not is your decision.
sudo rm -f /etc/cni/net.d/1-k8s.conflist
for f in /etc/cni/net.d/*.mk_disabled; do sudo mv "${f}" "${f%%.mk_disabled}" ; done
sudo systemctl start docker
Then rerun minikube start --driver=docker
in the way you would normally.
@pnasrat Can you please help me once more? Even after trying the above firewall reset, I get the issue that container creation takes a long time when using --driver=docker, which wasn't there before.
I1025 18:18:01.900151 136607 network_create.go:284] error running [docker network inspect minikube]: docker network inspect minikube: exit status 1
stdout:
[]
stderr:
Error response from daemon: network minikube not found
I1025 18:18:01.900165 136607 network_create.go:286] output of [docker network inspect minikube]: -- stdout --
[]
The logs contain this error. log.txt
Also while trying to execute the below command, I get the following:
for f in /etc/cni/net.d/*.mk_disabled; do sudo mv "${f}" "${f%%.mk_disabled}" ; done
mv: cannot stat '/etc/cni/net.d/*.mk_disabled': No such file or directory
@Abhishekghosh1998 are you running minikube start --driver=docker
as your user (i.e. without sudo)? In that case my instructions wouldn't necessarily work, as your previous minikube run with the docker driver set things up, and I don't think minikube recreates the docker network (I will check the code).
sudo docker network ls
sudo minikube stop
sudo minikube delete --all --purge
minikube stop
minikube delete --all --purge
minikube start --driver=docker
docker network ls
@pnasrat yes, I am running minikube start --driver=docker
without sudo.
@Abhishekghosh1998 please follow the above steps to purge your minikube profiles and reset the network and your user minikube profile
https://github.com/kubernetes/minikube/issues/17442#issuecomment-1779245823
@pnasrat
$ docker network ls
NETWORK ID NAME DRIVER SCOPE
2f58cfe9e4c9 bridge bridge local
a0d41daaa95b host host local
1bd708f15ee8 kind bridge local
83cf432d58df minikube bridge local
4be973dd3c01 none null local
But still in the logs I find the above error:
I1025 18:50:25.759765 30519 network_create.go:284] error running [docker network inspect minikube]: docker network inspect minikube: exit status 1
stdout:
[]
stderr:
Error response from daemon: network minikube not found
I1025 18:50:25.759771 30519 network_create.go:286] output of [docker network inspect minikube]: -- stdout --
[]
-- /stdout --
** stderr **
Error response from daemon: network minikube not found
I have tried the steps which you said in https://github.com/kubernetes/minikube/issues/17442#issuecomment-1779245823
Why is that error appearing? Pulling the docker images for the first time seems to be taking a lot of time... log.txt
So long as minikube is up and running, and docker images are pulling and eventually running (the slowness is likely just because it needs to pull images off the internet), I don't think there is necessarily anything wrong. The error is what happens when there is no minikube network; if it doesn't exist, minikube will go on to create it. It does not reflect an actual error.
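If you want to double-check that the network did get created after start:
# prints the network name and subnet if it exists, errors otherwise
docker network inspect minikube --format '{{.Name}} {{range .IPAM.Config}}{{.Subnet}}{{end}}'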
I see this on my Ubuntu system. I did minikube stop; minikube delete --purge --all, so my starting point has no minikube network:
docker network list
NETWORK ID NAME DRIVER SCOPE
62ae09eecd76 bridge bridge local
1e762a9bc9f3 host host local
7da388b6d6cc kubetracing_default bridge local
b207c08b8969 none null local
If I then run minikube start --driver=docker
and then minikube logs,
my logs contain the error you included. Importantly, if you examine below that section, you can see it does the creation in the snippet below, and you can check that with docker network ls | grep minikube
I1025 10:18:08.149077 190591 network_create.go:123] attempt to create docker network minikube 192.168.49.0/24 with gateway 192.168.49.1 and MTU of 1500 ...
I1025 10:18:08.149114 190591 cli_runner.go:164] Run: docker network create --driver=bridge --subnet=192.168.49.0/24 --gateway=192.168.49.1 -o --ip-masq -o --icc -o com.docker.network.driver.mtu=1500 --label=created_by.minikube.sigs.k8s.io=true --label=name.minikube.sigs.k8s.io=minikube minikube
```
I1025 10:18:08.120728 190591 cli_runner.go:164] Run: docker network inspect minikube --format "{"Name": "{{.Name}}","Driver": "{{.Driver}}","Subnet": "{{range .IPAM.Config}}{{.Subnet}}{{end}}","Gateway": "{{range .IPAM.Config}}{{.Gateway}}{{end}}","MTU": {{if (index .Options "com.docker.network.driver.mtu")}}{{(index .Options "com.docker.network.driver.mtu")}}{{else}}0{{end}}, "ContainerIPs": [{{range $k,$v := .Containers }}"{{$v.IPv4Address}}",{{end}}]}"
W1025 10:18:08.131898 190591 cli_runner.go:211] docker network inspect minikube --format "{"Name": "{{.Name}}","Driver": "{{.Driver}}","Subnet": "{{range .IPAM.Config}}{{.Subnet}}{{end}}","Gateway": "{{range .IPAM.Config}}{{.Gateway}}{{end}}","MTU": {{if (index .Options "com.docker.network.driver.mtu")}}{{(index .Options "com.docker.network.driver.mtu")}}{{else}}0{{end}}, "ContainerIPs": [{{range $k,$v := .Containers }}"{{$v.IPv4Address}}",{{end}}]}" returned with exit code 1
I1025 10:18:08.131935 190591 network_create.go:281] running [docker network inspect minikube] to gather additional debugging logs...
I1025 10:18:08.131940 190591 cli_runner.go:164] Run: docker network inspect minikube
W1025 10:18:08.140259 190591 cli_runner.go:211] docker network inspect minikube returned with exit code 1
I1025 10:18:08.140269 190591 network_create.go:284] error running [docker network inspect minikube]: docker network inspect minikube: exit status 1
stdout:
[]

stderr:
Error response from daemon: network minikube not found
I1025 10:18:08.140275 190591 network_create.go:286] output of [docker network inspect minikube]: -- stdout --
[]

-- /stdout --
** stderr **
Error response from daemon: network minikube not found

** /stderr **
I1025 10:18:08.140309 190591 cli_runner.go:164] Run: docker network inspect bridge --format "{"Name": "{{.Name}}","Driver": "{{.Driver}}","Subnet": "{{range .IPAM.Config}}{{.Subnet}}{{end}}","Gateway": "{{range .IPAM.Config}}{{.Gateway}}{{end}}","MTU": {{if (index .Options "com.docker.network.driver.mtu")}}{{(index .Options "com.docker.network.driver.mtu")}}{{else}}0{{end}}, "ContainerIPs": [{{range $k,$v := .Containers }}"{{$v.IPv4Address}}",{{end}}]}"
I1025 10:18:08.149046 190591 network.go:209] using free private subnet 192.168.49.0/24: &{IP:192.168.49.0 Netmask:255.255.255.0 Prefix:24 CIDR:192.168.49.0/24 Gateway:192.168.49.1 ClientMin:192.168.49.2 ClientMax:192.168.49.254 Broadcast:192.168.49.255 IsPrivate:true Interface:{IfaceName: IfaceIPv4: IfaceMTU:0 IfaceMAC:} reservation:0xc001428b40}
I1025 10:18:08.149077 190591 network_create.go:123] attempt to create docker network minikube 192.168.49.0/24 with gateway 192.168.49.1 and MTU of 1500 ...
I1025 10:18:08.149114 190591 cli_runner.go:164] Run: docker network create --driver=bridge --subnet=192.168.49.0/24 --gateway=192.168.49.1 -o --ip-masq -o --icc -o com.docker.network.driver.mtu=1500 --label=created_by.minikube.sigs.k8s.io=true --label=name.minikube.sigs.k8s.io=minikube minikube
I1025 10:18:08.229518 190591 network_create.go:107] docker network minikube 192.168.49.0/24 created
```
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
What Happened?
For some specific use cases, I want to start minikube with the --driver=none option. All the pods are running fine:
Then I wrote a simple ubuntu-pod to check internet access
ubuntu-pod.yml:
Then I launch a terminal in that pod and check internet connectivity:
I understand that when I use driver=none in minikube, it makes use of the host system. Is the DNS resolution issue due to the host machine? I am not sure, but the internet works fine on my host machine. When I remove the --driver=none option, do $ minikube start, and follow the above steps, the pod connects to the internet just fine.
Attach the log file
log.txt (please consider the last start...)
Operating System
Ubuntu
Driver
None