k3s-io / k3s

Lightweight Kubernetes
https://k3s.io
Apache License 2.0

No pod network traffic outside the node #2260

Closed: clambin closed this issue 4 years ago

clambin commented 4 years ago

Environmental Info: K3s Version: k3s version v1.18.8+k3s1 (6b595318)

Node(s) CPU architecture, OS, and Version: Linux raspberrypi1 5.4.51-v7l+ #1333 SMP Mon Aug 10 16:51:40 BST 2020 armv7l GNU/Linux

Cluster Configuration: Single node cluster

Describe the bug: DNS is not working inside a pod

Steps To Reproduce: Installed k3s on a single Raspberry Pi (curl -sfL https://get.k3s.io | sh -) and applied the following deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: covid19mon
  labels:
    app: covid19mon
spec:
  replicas: 1
  selector:
    matchLabels:
      app: covid19mon
  template:
    metadata:
      labels:
        app: covid19mon
    spec:
      containers:
      - name: covid19mon
        args:
        - --apikey=XXXX
        - --interval=60
        - --postgres-host=192.168.0.10
        - --postgres-port=5432
        - --postgres-user=covid
        - --postgres-password=XXX
        image: clambin/covid19mon:develop
        imagePullPolicy: Always

After the pod started, I noticed that the application could not resolve a hostname:

Failed to establish a new connection: [Errno -3] Temporary failure in name resolution')
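
For reference, the failure can be reproduced without the application. A minimal check from a throwaway pod (dnstest is just an illustrative name; busybox:1.28 is used because its nslookup behaves correctly):

$ kubectl run -it --rm --restart=Never dnstest --image=busybox:1.28 -- nslookup kubernetes.default

If this times out too, the problem is in the cluster DNS path rather than in the application.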

Checked the logs of the CoreDNS pod:

:53
[INFO] plugin/reload: Running configuration MD5 = 4665410bf21c8b272fcfd562c482cb82
CoreDNS-1.6.9
linux/arm, go1.14.1, 1766568
[ERROR] plugin/errors: 2 1019048027.2119947064. HINFO: read udp 10.42.0.4:39458->8.8.4.4:53: i/o timeout
[ERROR] plugin/errors: 2 1019048027.2119947064. HINFO: read udp 10.42.0.4:42042->8.8.4.4:53: i/o timeout
[ERROR] plugin/errors: 2 1019048027.2119947064. HINFO: read udp 10.42.0.4:41387->192.168.0.1:53: i/o timeout
[ERROR] plugin/errors: 2 1019048027.2119947064. HINFO: read udp 10.42.0.4:57497->192.168.0.1:53: i/o timeout
[ERROR] plugin/errors: 2 1019048027.2119947064. HINFO: read udp 10.42.0.4:32798->192.168.0.1:53: i/o timeout
[ERROR] plugin/errors: 2 1019048027.2119947064. HINFO: read udp 10.42.0.4:46654->192.168.0.1:53: i/o timeout
[ERROR] plugin/errors: 2 1019048027.2119947064. HINFO: read udp 10.42.0.4:54977->192.168.0.1:53: i/o timeout
[ERROR] plugin/errors: 2 1019048027.2119947064. HINFO: read udp 10.42.0.4:37331->192.168.0.1:53: i/o timeout
[ERROR] plugin/errors: 2 1019048027.2119947064. HINFO: read udp 10.42.0.4:37851->192.168.0.1:53: i/o timeout
[ERROR] plugin/errors: 2 1019048027.2119947064. HINFO: read udp 10.42.0.4:37673->192.168.0.1:53: i/o timeout
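
For context: with the stock k3s CoreDNS config, the forward plugin sends upstream queries to the resolvers in the node's /etc/resolv.conf, which is where 8.8.4.4 and 192.168.0.1 come from here. The active config can be inspected with:

$ kubectl -n kube-system get configmap coredns -o yaml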

Those are the correct IP addresses, and all of them are reachable from the Pi. But it looks like traffic from inside the pod can't get out of the cluster?

Expected behavior: DNS should work.

Actual behavior: DNS doesn't work.

Additional context / logs: Fresh install of k3s. No customisation.

Node info:

NAME           STATUS   ROLES    AGE   VERSION        INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                         KERNEL-VERSION   CONTAINER-RUNTIME
raspberrypi1   Ready    master   95m   v1.18.8+k3s1   192.168.0.11   <none>        Raspbian GNU/Linux 10 (buster)   5.4.51-v7l+      containerd://1.3.3-k3s2
clambin commented 4 years ago

More info: looks like no traffic is getting out of the node. My Pi's IP address is 192.168.0.11.

$ kubectl run -it --rm --restart=Never busybox --image=busybox:1.28 -- ping -c 3 192.168.0.11
If you don't see a command prompt, try pressing enter.
64 bytes from 192.168.0.11: seq=1 ttl=64 time=0.238 ms
64 bytes from 192.168.0.11: seq=2 ttl=64 time=0.291 ms

--- 192.168.0.11 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.238/0.305/0.386 ms
pod "busybox" deleted
$ kubectl run -it --rm --restart=Never busybox --image=busybox:1.28 -- ping -c 3 192.168.0.10
If you don't see a command prompt, try pressing enter.

--- 192.168.0.10 ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss
pod "busybox" deleted
pod default/busybox terminated (Error)
$ ping -c 3 192.168.0.10
PING 192.168.0.10 (192.168.0.10) 56(84) bytes of data.
64 bytes from 192.168.0.10: icmp_seq=1 ttl=64 time=0.331 ms
64 bytes from 192.168.0.10: icmp_seq=2 ttl=64 time=0.286 ms
64 bytes from 192.168.0.10: icmp_seq=3 ttl=64 time=0.343 ms

--- 192.168.0.10 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 97ms
rtt min/avg/max/mdev = 0.286/0.320/0.343/0.024 ms
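
One way to localize the drop (assuming eth0 is the Pi's LAN interface) is to capture on the host while re-running the failing busybox ping:

$ sudo tcpdump -ni eth0 icmp and host 192.168.0.10

If no echo requests appear, the packets are being dropped on the node before they reach the wire; if requests go out but replies never come back, the problem lies elsewhere.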

Nothing fancy from a networking perspective. What could be causing this?

brandond commented 4 years ago

Do you have a local firewall configured - perhaps try ufw disable? Do you see any errors in the k3s or containerd logs?
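
On a default systemd-based install (assuming the standard k3s layout), those logs can be pulled with:

$ sudo journalctl -u k3s --no-pager | tail -n 50
$ sudo tail -n 50 /var/lib/rancher/k3s/agent/containerd/containerd.log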

clambin commented 4 years ago

No locally configured firewall, except for iptables, but it looks like k3s has configured that correctly:

$ sudo iptables-legacy -v -L
Chain INPUT (policy ACCEPT 47015 packets, 8292K bytes)
 pkts bytes target     prot opt in     out     source               destination
 774K  420M KUBE-FIREWALL  all  --  any    any     anywhere             anywhere
11576 2533K KUBE-SERVICES  all  --  any    any     anywhere             anywhere             ctstate NEW /* kubernetes service portals */
11580 2533K KUBE-EXTERNAL-SERVICES  all  --  any    any     anywhere             anywhere             ctstate NEW /* kubernetes externally-visible service portals */

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
39703 2006K KUBE-FORWARD  all  --  any    any     anywhere             anywhere             /* kubernetes forwarding rules */
39655 2003K KUBE-SERVICES  all  --  any    any     anywhere             anywhere             ctstate NEW /* kubernetes service portals */
30746 1468K ACCEPT     all  --  any    any     10.42.0.0/16         anywhere
    0     0 ACCEPT     all  --  any    any     anywhere             10.42.0.0/16

Chain OUTPUT (policy ACCEPT 45652 packets, 11M bytes)
 pkts bytes target     prot opt in     out     source               destination
 650K  186M KUBE-FIREWALL  all  --  any    any     anywhere             anywhere
10276  662K KUBE-SERVICES  all  --  any    any     anywhere             anywhere             ctstate NEW /* kubernetes service portals */

Chain KUBE-EXTERNAL-SERVICES (1 references)
 pkts bytes target     prot opt in     out     source               destination

Chain KUBE-FIREWALL (2 references)
 pkts bytes target     prot opt in     out     source               destination
    0     0 DROP       all  --  any    any     anywhere             anywhere             /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000
    0     0 DROP       all  --  any    any    !127.0.0.0/8          127.0.0.0/8          /* block incoming localnet connections */ ! ctstate RELATED,ESTABLISHED,DNAT

Chain KUBE-FORWARD (1 references)
 pkts bytes target     prot opt in     out     source               destination
    0     0 DROP       all  --  any    any     anywhere             anywhere             ctstate INVALID
    0     0 ACCEPT     all  --  any    any     anywhere             anywhere             /* kubernetes forwarding rules */ mark match 0x4000/0x4000
    0     0 ACCEPT     all  --  any    any     anywhere             anywhere             /* kubernetes forwarding conntrack pod source rule */ ctstate RELATED,ESTABLISHED
    0     0 ACCEPT     all  --  any    any     anywhere             anywhere             /* kubernetes forwarding conntrack pod destination rule */ ctstate RELATED,ESTABLISHED

Chain KUBE-KUBELET-CANARY (0 references)
 pkts bytes target     prot opt in     out     source               destination

Chain KUBE-PROXY-CANARY (0 references)
 pkts bytes target     prot opt in     out     source               destination

Chain KUBE-SERVICES (3 references)
 pkts bytes target     prot opt in     out     source               destination
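
Worth noting: on Raspbian Buster the default iptables binary is the nftables-backed one, so rules can end up split across two backends. With iptables 1.8.x, each binary reports its backend in parentheses (legacy or nf_tables):

$ iptables --version
$ iptables-legacy --version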

Looking at the logfiles in /var/log/pods, I see plenty of these:

./kube-system_traefik-758cd5fc85-s5gbg_517c6460-74f5-41d9-8ba9-dd98b21bd8f2/traefik/0.log:2020-09-16T21:45:37.32936583+02:00 stdout F {"level":"warning","msg":"Error checking new version: Get https://update.traefik.io/repos/containous/traefik/releases: dial tcp: i/o timeout","time":"2020-09-16T19:45:37Z"}
./kube-system_coredns-7944c66d8d-csd4s_592a2ce7-bdb8-4880-9b31-e85819898f06/coredns/0.log:2020-09-16T21:34:47.17656514+02:00 stdout F [ERROR] plugin/errors: 2 753716345.928958693. HINFO: read udp 10.42.0.4:60234->8.8.4.4:53: i/o timeout
./kube-system_coredns-7944c66d8d-csd4s_592a2ce7-bdb8-4880-9b31-e85819898f06/coredns/0.log:2020-09-16T21:34:50.176195033+02:00 stdout F [ERROR] plugin/errors: 2 753716345.928958693. HINFO: read udp 10.42.0.4:47252->8.8.4.4:53: i/o timeout
./kube-system_coredns-7944c66d8d-csd4s_592a2ce7-bdb8-4880-9b31-e85819898f06/coredns/0.log:2020-09-16T21:34:53.17720611+02:00 stdout F [ERROR] plugin/errors: 2 753716345.928958693. HINFO: read udp 10.42.0.4:42298->192.168.0.1:53: i/o timeout
./kube-system_metrics-server-7566d596c8-xjtm8_b6ab040a-0fa9-4af6-bdb1-ce9f804aff1e/metrics-server/0.log:2020-09-16T22:08:15.288283+02:00 stderr F E0916 20:08:15.288103       1 manager.go:111] unable to fully collect metrics: unable to fully scrape metrics from source kubelet_summary:raspberrypi1: unable to fetch metrics from Kubelet raspberrypi1 (raspberrypi1): Get https://raspberrypi1:10250/stats/summary?only_cpu_and_memory=true: dial tcp: i/o timeout

All indicative of packets routed outside the host getting dropped ... somewhere.
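
One more datapoint that would help localize the drop, assuming the default flannel backend: the NAT counters show whether pod traffic is being masqueraded on its way out:

$ sudo iptables-legacy -t nat -v -L POSTROUTING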

brandond commented 4 years ago

I notice you used iptables-legacy. Is nftables your default? If so, you might try switching everything to legacy, then rebooting:

update-alternatives --set iptables /usr/sbin/iptables-legacy
update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
update-alternatives --set arptables /usr/sbin/arptables-legacy
update-alternatives --set ebtables /usr/sbin/ebtables-legacy
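
The active choice can be confirmed afterwards with:

$ update-alternatives --display iptables
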
clambin commented 4 years ago

Actually, I used that command because on my Pi both iptables and iptables-legacy have rules in them (?). iptables-legacy has the kube rules, so that's why I pasted its output here.

Hmm ... so I decided to clear all the rules in iptables (not iptables-legacy; see the sketch below for roughly what that amounts to) and now traffic is flowing. Pinging outside nodes and DNS lookups are all working.

update-alternatives isn't installed on my version of Raspbian. I'll need to research it, though it's not clear why k3s added the rules to legacy rather than to the standard iptables.
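
For reference, "clearing all the rules" in the nft-backed tables amounts to roughly the following (destructive: it also removes any non-k3s rules on the host):

$ sudo iptables -F            # flush the filter table
$ sudo iptables -t nat -F     # flush the nat table
$ sudo iptables -t mangle -F  # flush the mangle table
$ sudo iptables -X            # delete empty user-defined chains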

brandond commented 4 years ago

Kubernetes 1.19 will support nftables; 1.18 only supports iptables. Mixing iptables and nftables rules leads to breakage (as you noticed).

clambin commented 4 years ago

Bingo! Switched back to legacy, rebooted, and now traffic is flowing. Thanks!