k3s-io / k3s

Lightweight Kubernetes
https://k3s.io
Apache License 2.0

cluster networking is broken? #24

Closed — liyimeng closed this issue 5 years ago

liyimeng commented 5 years ago

The helm install job never succeeds; it seems that it is not possible to reach the DNS server.

alpine:/home/alpine/k3s/dist/artifacts# ./k3s kubectl  get all -n kube-system 
NAME                             READY   STATUS             RESTARTS   AGE
pod/coredns-7748f7f6df-tp7fq     1/1     Running            1          104m
pod/helm-install-traefik-g5rmk   0/1     CrashLoopBackOff   21         104m

NAME               TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
service/kube-dns   ClusterIP   10.43.0.10   <none>        53/UDP,53/TCP,9153/TCP   104m

NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/coredns   1/1     1            1           104m

NAME                                 DESIRED   CURRENT   READY   AGE
replicaset.apps/coredns-7748f7f6df   1         1         1       104m

NAME                             COMPLETIONS   DURATION   AGE
job.batch/helm-install-traefik   0/1           104m       104m

./k3s kubectl   -n kube-system logs -f pod/helm-install-traefik-g5rmk
+ export HELM_HOST=127.0.0.1:44134
+ HELM_HOST=127.0.0.1:44134
+ tiller --listen=127.0.0.1:44134 --storage=secret
+ helm init --client-only
[main] 2019/02/08 20:48:52 Starting Tiller v2.12.3 (tls=false)
[main] 2019/02/08 20:48:52 GRPC listening on 127.0.0.1:44134
[main] 2019/02/08 20:48:52 Probes listening on :44135
[main] 2019/02/08 20:48:52 Storage driver is Secret
[main] 2019/02/08 20:48:52 Max history per release is 0
Creating /root/.helm 
Creating /root/.helm/repository 
Creating /root/.helm/repository/cache 
Creating /root/.helm/repository/local 
Creating /root/.helm/plugins 
Creating /root/.helm/starters 
Creating /root/.helm/cache/archive 
Creating /root/.helm/repository/repositories.yaml 
Adding stable repo with URL: https://kubernetes-charts.storage.googleapis.com 
Error: Looks like "https://kubernetes-charts.storage.googleapis.com" is not a valid chart repository or cannot be reached: Get https://kubernetes-charts.storage.googleapis.com/index.yaml: dial tcp: lookup kubernetes-charts.storage.googleapis.com on 10.43.0.10:53: read udp 10.42.0.4:39333->10.43.0.10:53: i/o timeout

Verified by running a busybox pod:

alpine:/home/alpine/k3s/dist/artifacts# ./k83s kubectl run -i --tty busybox --image=busybox --restart=Never -- sh
ash: ./k83s: not found
alpine:/home/alpine/k3s/dist/artifacts# ./k3s kubectl run -i --tty busybox --image=busybox --restart=Never -- sh
If you don't see a command prompt, try pressing enter.
/ # 
/ # ping 10.43.0.10
PING 10.43.0.10 (10.43.0.10): 56 data bytes
^C
--- 10.43.0.10 ping statistics ---
7 packets transmitted, 0 packets received, 100% packet loss
/ # ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
3: eth0@if8: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1450 qdisc noqueue 
    link/ether 32:03:33:52:8c:19 brd ff:ff:ff:ff:ff:ff
    inet 10.42.0.6/24 brd 10.42.0.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::3003:33ff:fe52:8c19/64 scope link 
       valid_lft forever preferred_lft forever
/ # ping 10.43.0.10
PING 10.43.0.10 (10.43.0.10): 56 data bytes
^C
--- 10.43.0.10 ping statistics ---
6 packets transmitted, 0 packets received, 100% packet loss
/ # ping 10.42.0.6
PING 10.42.0.6 (10.42.0.6): 56 data bytes
64 bytes from 10.42.0.6: seq=0 ttl=64 time=0.109 ms
64 bytes from 10.42.0.6: seq=1 ttl=64 time=0.108 ms
64 bytes from 10.42.0.6: seq=2 ttl=64 time=0.106 ms
^C
--- 10.42.0.6 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.106/0.107/0.109 ms

ibuildthecloud commented 5 years ago

@liyimeng can you ensure the br_netfilter module is loaded? The agent is supposed to load this module, but that seems to not always work. I'm troubleshooting that now.
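
A quick way to check and, if needed, load the module (a minimal sketch, assuming a standard Linux host with modprobe available):

lsmod | grep br_netfilter || sudo modprobe br_netfilter
cat /proc/sys/net/bridge/bridge-nf-call-iptables   # should print 1 once the module is loaded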

ibuildthecloud commented 5 years ago

@liyimeng FYI, if you are running in a container you need to bind mount /lib/modules/$(uname -r):/lib/modules/$(uname -r):ro so that modules can be loaded
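
For example, a docker run invocation along these lines (illustrative only; the image name and remaining flags depend on your setup):

docker run --privileged \
  -v /lib/modules/$(uname -r):/lib/modules/$(uname -r):ro \
  rancher/k3s server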

liyimeng commented 5 years ago

I do have it loaded:

alpine:/home/alpine/k3s/dist/artifacts# lsmod | grep netfilter
br_netfilter           20480  0
bridge                163840  1 br_netfilter

When I do some troubleshooting following this guide: https://kubernetes.io/docs/tasks/debug-application-cluster/debug-service/#running-commands-in-a-pod

/ # wget -O- hostnames
Connecting to hostnames (10.43.129.144:80)
hostnames-85bc9c579-rtr4r

** server can't find hostnames.default.svc.cluster.local: NXDOMAIN
*** Can't find hostnames.svc.cluster.local: No answer
*** Can't find hostnames.cluster.local: No answer
*** Can't find hostnames.default.svc.cluster.local: No answer
*** Can't find hostnames.svc.cluster.local: No answer
*** Can't find hostnames.cluster.local: No answer

/ # nslookup hostnames.default
;; connection timed out; no servers could be reached

/ # cat /etc/hosts
# Kubernetes-managed hosts file.
127.0.0.1      localhost
::1            localhost ip6-localhost ip6-loopback
fe00::0        ip6-localnet
fe00::0        ip6-mcastprefix
fe00::1        ip6-allnodes
fe00::2        ip6-allrouters
10.42.0.7      busybox

/ # arp
10-42-0-9.hostnames.default.svc.cluster.local (10.42.0.9) at 96:12:b9:85:cf:5c [ether] on eth0
10-42-0-8.hostnames.default.svc.cluster.local (10.42.0.8) at 1e:87:4b:df:77:2a [ether] on eth0
? (10.42.0.1) at 4e:b8:bd:7b:10:7b [ether] on eth0
10-42-0-5.kube-dns.kube-system.svc.cluster.local (10.42.0.5) at ba:e7:34:61:0a:bc [ether] on eth0
10-42-0-10.hostnames.default.svc.cluster.local (10.42.0.10) at 4e:2b:cf:de:e2:9e [ether] on eth0

How strange: I can actually reach the hostnames service with wget, but nslookup fails. I guess something is wrong with forwarding packets from pod to service, or vice versa.

Do we have kube-proxy or IPVS to map between services and pods?

liyimeng commented 5 years ago

OK, I see that we use kube-proxy, or at least iptables, for this. Strangely enough, nslookup works on the host, just not inside the pods!

nslookup www.google.com 10.43.0.10
Server:    10.43.0.10
Address 1: 10.43.0.10

Name:      www.google.com
Address 1: 216.58.207.196 arn11s04-in-f4.1e100.net
Address 2: 2a00:1450:400e:809::2004 ams15s32-in-x04.1e100.net

alpine:/home/alpine/k3s/dist/artifacts# nslookup hostnames.default.svc.cluster.local 10.43.0.10
Server:    10.43.0.10
Address 1: 10.43.0.10

nslookup: can't resolve 'hostnames.default.svc.cluster.local': Name does not resolve

liyimeng commented 5 years ago

BTW, IP forwarding is on:

alpine:/home/alpine/k3s/dist/artifacts# cat /proc/sys/net/ipv4/ip_forward
1

ibuildthecloud commented 5 years ago

@liyimeng is there any way I can reproduce your setup?

liyimeng commented 5 years ago

@ibuildthecloud Here is what I have done:

aaliddell commented 5 years ago

I've seen the same issue when installing on an existing system. When running on a clean install, there are no issues.

After some testing, the issue appears to be having existing iptables rules with a default DROP policy on the INPUT chain. After setting the INPUT policy to ACCEPT, the issue appears to be resolved. Therefore, either a note needs to be added to the docs or k3s needs to set up its own iptables rules on the INPUT chain to ensure that traffic does not hit the default policy, which is usually DROP or REJECT for security reasons.
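
To check whether this applies to you, and to test the workaround (temporary and for testing only; assumes plain iptables):

iptables -L INPUT -n | head -1    # e.g. "Chain INPUT (policy DROP)"
iptables -P INPUT ACCEPT          # relax the default policy while testing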

jose-sanchezm commented 5 years ago

In a fresh installation on CentOS 7.5 I'm getting the same issue:

# kubectl get pods --all-namespaces
NAMESPACE     NAME                         READY   STATUS             RESTARTS   AGE
default       ds4m-0                       1/1     Running            0          25m
kube-system   coredns-7748f7f6df-6j8kh     0/1     CrashLoopBackOff   9          25m
kube-system   helm-install-traefik-bprt6   0/1     CrashLoopBackOff   9          25m
# kubectl logs coredns-7748f7f6df-6j8kh --namespace kube-system
E0305 16:36:17.678604       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:322: Failed to list *v1.Namespace: Get https://10.43.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.43.0.1:443: connect: no route to host
E0305 16:36:19.680598       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:322: Failed to list *v1.Namespace: Get https://10.43.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.43.0.1:443: connect: no route to host
.:53
2019-03-05T16:36:21.676Z [INFO] CoreDNS-1.3.0
2019-03-05T16:36:21.676Z [INFO] linux/amd64, go1.11.4, c8f0e94
CoreDNS-1.3.0
linux/amd64, go1.11.4, c8f0e94
2019-03-05T16:36:21.676Z [INFO] plugin/reload: Running configuration MD5 = 3ef0d797df417f2c0375a4d1531511fb
E0305 16:36:23.686625       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:322: Failed to list *v1.Namespace: Get https://10.43.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.43.0.1:443: connect: no route to host
E0305 16:36:23.690587       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:315: Failed to list *v1.Service: Get https://10.43.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.43.0.1:443: connect: no route to host
# kubectl logs helm-install-traefik-bprt6 --namespace kube-system
+ export HELM_HOST=127.0.0.1:44134
+ HELM_HOST=127.0.0.1:44134
+ helm init --client-only
+ tiller --listen=127.0.0.1:44134 --storage=secret
[main] 2019/03/05 16:35:55 Starting Tiller v2.12.3 (tls=false)
[main] 2019/03/05 16:35:55 GRPC listening on 127.0.0.1:44134
[main] 2019/03/05 16:35:55 Probes listening on :44135
[main] 2019/03/05 16:35:55 Storage driver is Secret
[main] 2019/03/05 16:35:55 Max history per release is 0
Creating /root/.helm 
Creating /root/.helm/repository 
Creating /root/.helm/repository/cache 
Creating /root/.helm/repository/local 
Creating /root/.helm/plugins 
Creating /root/.helm/starters 
Creating /root/.helm/cache/archive 
Creating /root/.helm/repository/repositories.yaml 
Adding stable repo with URL: https://kubernetes-charts.storage.googleapis.com 
Error: Looks like "https://kubernetes-charts.storage.googleapis.com" is not a valid chart repository or cannot be reached: Get https://kubernetes-charts.storage.googleapis.com/index.yaml: dial tcp: lookup kubernetes-charts.storage.googleapis.com on 10.43.0.10:53: read udp 10.42.1.2:48204->10.43.0.10:53: read: no route to host

My firewalld configuration:

# firewall-cmd --list-all
public (active)
  target: default
  icmp-block-inversion: no
  interfaces: eth0
  sources: 
  services: dhcpv6-client ssh
  ports: 4789/udp 6443/tcp
  protocols: 
  masquerade: no
  forward-ports: 
  source-ports: 
  icmp-blocks: 
  rich rules: 

br_netfilter module is loaded:

# lsmod | grep netfilter
br_netfilter           22256  0 
bridge                151336  2 br_netfilter,ebtable_broute

Which extra rules do I have to configure to get it working?

aaliddell commented 5 years ago

I've not used firewalld before, but essentially you need to add a rule equivalent to this iptables rule:

iptables -I INPUT 1 -i cni0 -s 10.42.0.0/16 -j ACCEPT

This rule says: permit incoming packets from the cni0 interface (bridge) with a source in the range 10.42.0.0/16. If you wanted to be more granular, you could open individual ports instead of accepting everything.
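
For instance, a more granular variant that only opens DNS from the pod network might look like this (a sketch, not tested on every distro):

iptables -I INPUT 1 -i cni0 -s 10.42.0.0/16 -p udp --dport 53 -j ACCEPT
iptables -I INPUT 1 -i cni0 -s 10.42.0.0/16 -p tcp --dport 53 -j ACCEPT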

jose-sanchezm commented 5 years ago

I've added the rule and CoreDNS logs far fewer errors (although there are still some), but helm-install-traefik keeps crashing with the same error. Do I need another rule for it?

aaliddell commented 5 years ago

Does firewalld keep a log somewhere of which packets it is blocking, or have a way to enable such a log? If so, look there to see what might still be getting dropped.
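
One way to get such a log (assuming a reasonably recent firewalld) is to enable logging of denied packets and then watch the kernel log:

firewall-cmd --set-log-denied=all
dmesg | grep -i reject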

briandealwis commented 5 years ago

I'm seeing this with VMware Photon 3.0. Adding @aaliddell's snippet to /etc/systemd/scripts/ip4save did the trick.

sahlex commented 5 years ago

@briandealwis can you please point out which exact snippet you are referring to?

briandealwis commented 5 years ago

This comment:

iptables -I INPUT 1 -i cni0 -s 10.42.0.0/16 -j ACCEPT

sahlex commented 5 years ago

I'm having the same problems.

From inside a (busybox) pod, the DNS server is configured as 10.43.x.x, but no interface with that address exists. I start the server with

/usr/local/bin/k3s server --cluster-cidr 10.10.0.0/16

without disabling the coredns service. But my machine shows no interface in the 10.43.x.x range:

cni0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.10.0.1  netmask 255.255.255.0  broadcast 10.10.0.255
        ether b6:05:a1:65:5e:49  txqueuelen 1000  (Ethernet)
        RX packets 1990  bytes 181178 (176.9 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 1312  bytes 131217 (128.1 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.42.2.112  netmask 255.255.0.0  broadcast 10.42.255.255
        ether 00:15:5d:01:d2:21  txqueuelen 1000  (Ethernet)
        RX packets 744576  bytes 380910420 (363.2 MiB)
        RX errors 0  dropped 6  overruns 0  frame 0
        TX packets 71389  bytes 49850951 (47.5 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.10.0.0  netmask 255.255.255.255  broadcast 10.10.0.0
        ether 1e:41:71:d1:92:ff  txqueuelen 0  (Ethernet)
        RX packets 26  bytes 1944 (1.8 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 14  bytes 888 (888.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 632239  bytes 225761419 (215.3 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 632239  bytes 225761419 (215.3 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

aaliddell commented 5 years ago

The 10.43.0.0/16 range is the default service ClusterIP range, which isn't actually bound to any interface; it is a 'virtual' IP that iptables routes to a pod backing the service: https://kubernetes.io/docs/concepts/services-networking/service/#proxy-mode-iptables
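
You can see this machinery on the host; for example, kube-proxy's NAT rules for the DNS service should show up with something like this (chain names assume kube-proxy's iptables mode):

iptables -t nat -L KUBE-SERVICES -n | grep 10.43.0.10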

You can change the service ip range with --service-cidr: https://github.com/rancher/k3s/pull/171

sahlex commented 5 years ago

Thanks for your responses!

I added my cidr network to the firewall (accept rules show up there).

Still, after uninstalling k3s and reinstalling with

./install-k3s.sh --cluster-cidr=10.10.0.0/16

I still get errors related to DNS.

After startup I tail the CoreDNS logs with kubectl logs -f coredns-7748f7f6df-6klwg -n kube-system:

[root@h20181152922 ~]# kubectl logs -f coredns-7748f7f6df-6klwg -n kube-system
.:53
2019-03-13T07:45:33.023Z [INFO] CoreDNS-1.3.0
2019-03-13T07:45:33.023Z [INFO] linux/amd64, go1.11.4, c8f0e94
CoreDNS-1.3.0
linux/amd64, go1.11.4, c8f0e94
2019-03-13T07:45:33.023Z [INFO] plugin/reload: Running configuration MD5 = 3ef0d797df417f2c0375a4d1531511fb
2019-03-13T07:45:54.026Z [ERROR] plugin/errors: 2 7066956279340311327.7933564162591032888. HINFO: unreachable backend: read udp 10.10.0.20:47160->1.1.1.1:53: i/o timeout
2019-03-13T07:45:57.025Z [ERROR] plugin/errors: 2 7066956279340311327.7933564162591032888. HINFO: unreachable backend: read udp 10.10.0.20:46236->1.1.1.1:53: i/o timeout

From the server logs:

Mar 13 08:50:47 docker2 k3s: time="2019-03-13T08:50:47.568418341+01:00" level=info msg="Running kubelet --healthz-bind-address 127.0.0.1 --read-only-port 0 --allow-privileged=true --cluster-domain cluster.local --kubeconfig /var/lib/rancher/k3s/agent/kubeconfig.yaml --eviction-hard imagefs.available<5%,nodefs.available<5% --eviction-minimum-reclaim imagefs.available=10%,nodefs.available=10% --fail-swap-on=false --cgroup-driver cgroupfs --root-dir /var/lib/rancher/k3s/agent/kubelet --cert-dir /var/lib/rancher/k3s/agent/kubelet/pki --seccomp-profile-root /var/lib/rancher/k3s/agent/kubelet/seccomp --cni-conf-dir /var/lib/rancher/k3s/agent/etc/cni/net.d --cni-bin-dir /var/lib/rancher/k3s/data/e44f7a46cadac4cec9a759756f2a27fdb25e705a83d8d563207c6a6c5fa368b4/bin --cluster-dns 10.43.0.10 --container-runtime remote --container-runtime-endpoint unix:///run/k3s/containerd/containerd.sock --address 127.0.0.1 --anonymous-auth=false --client-ca-file /var/lib/rancher/k3s/agent/client-ca.pem --hostname-override h20181152922 --cpu-cfs-quota=false --runtime-cgroups /systemd/system.slice --kubelet-cgroups /systemd/system.slice"

When I try to do an nslookup from busybox:

[root@h20181152922 k3s]# /usr/local/bin/k3s kubectl run -i --tty busybox --image=busybox --restart=Never -- sh
If you don't see a command prompt, try pressing enter.
/ # nslookup www.google.de
;; connection timed out; no servers could be reached

/ # cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local haba.int
nameserver 10.43.0.10
options ndots:5
/ # ping 10.43.0.10
PING 10.43.0.10 (10.43.0.10): 56 data bytes
^C
--- 10.43.0.10 ping statistics ---
4 packets transmitted, 0 packets received, 100% packet loss

So it seems DNS is not working properly...

aaliddell commented 5 years ago

Pings to 10.43.0.0/16 addresses aren't going to respond, since those IPs are 'virtual' and only really exist within iptables. If your DNS requests are getting to the CoreDNS pod, then cluster networking looks like it's working. Your issue may be related to https://github.com/rancher/k3s/issues/53 (how on earth did a DNS problem get issue number 53...)
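
In other words, test the DNS service with an actual DNS query rather than ping, e.g. from the busybox pod:

nslookup kubernetes.default.svc.cluster.local 10.43.0.10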

sahlex commented 5 years ago

In fact CoreDNS isn't able to reach out to 1.1.1.1:53. I changed it according to #53 and now it's working!!

Thanks!

BTW: you're right on the issue number. What a nice coincidence!
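
For reference, the fix discussed in #53 amounts to pointing CoreDNS at a reachable upstream resolver instead of 1.1.1.1. Roughly (the exact directive and pod label depend on your CoreDNS version and manifest):

kubectl -n kube-system edit configmap coredns
# change the upstream line (e.g. "proxy . 1.1.1.1") to a reachable
# resolver such as "proxy . /etc/resolv.conf", then recreate the pod:
kubectl -n kube-system delete pod -l k8s-app=kube-dns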

odensc commented 5 years ago

To anyone running into this issue on Fedora, the proper command to add the iptables rule is:

firewall-cmd --permanent --direct --add-rule ipv4 filter INPUT 1 -i cni0 -s 10.42.0.0/16 -j ACCEPT

and then a firewall-cmd --reload fixed it for me.

Still having issues with DNS resolving though.

xykonur commented 5 years ago

On Fedora 29, these fixed both the CoreDNS and Traefik installs for me:

firewall-cmd --permanent --direct --add-rule ipv4 filter INPUT 1 -i cni0 -s 10.42.0.0/16 -j ACCEPT
firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 1 -s 10.42.0.0/15 -j ACCEPT
firewall-cmd --reload

It might be possible to narrow down or optimise the /15 further.
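
The /15 covers both 10.42.0.0/16 (the default pod CIDR) and 10.43.0.0/16 (the default service CIDR), so a narrower equivalent would presumably be two /16 rules:

firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 1 -s 10.42.0.0/16 -j ACCEPT
firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 1 -s 10.43.0.0/16 -j ACCEPT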

gdhgdhgdh commented 5 years ago

On Fedora 29, these fixed both the CoreDNS and Traefik installs for me:

Perfect timing! This worked like a charm for me on CentOS 7. :beers:

Lunik commented 5 years ago

Getting the same issue with multi-node k3s. With only one node, everything works like a charm. When adding a new node:

  • it shows up with kubectl get nodes
  • when running new pods, they start properly on this node

I'm playing with this command: kubectl run -it --rm --restart=Never busybox --image=busybox sh. When the CoreDNS and busybox pods are not on the same host, they can't talk. But when they are on the same node, they can...

My config

Two fresh CentOS 7 instances launched on GCP with no firewall filtering between them. k3s cluster launched with these start commands:

server: /usr/local/bin/k3s server --docker
node: /usr/local/bin/k3s agent --docker --server https://server:6443 --token "TOKEN"

deniseschannon commented 5 years ago

This issue's topic is very broad and each person's setup is different and unique. I'd like to close the original issue; if you are still having networking issues, please open a new issue.

Ideally the subject should indicate what OS you are using, which version, and something specific about how the networking is broken.

Thanks for understanding!

Id2ndR commented 5 years ago

On Fedora 29, these fixed both the CoreDNS and Traefik installs for me:

firewall-cmd --permanent --direct --add-rule ipv4 filter INPUT 1 -i cni0 -s 10.42.0.0/16 -j ACCEPT
firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 1 -s 10.42.0.0/15 -j ACCEPT
firewall-cmd --reload

It might be possible to narrow down or optimise the /15 further.

The narrow solution would be sudo iptables -A KUBE-FORWARD -s 10.42.0.0/16 -d 10.42.0.0/16 -m state --state NEW -j ACCEPT

However, the KUBE-FORWARD chain is rewritten quickly, so the previous command only works until the chain is refreshed. Instead you can use sudo firewall-cmd --direct --add-rule ipv4 filter FORWARD 1 -s 10.42.0.0/16 -d 10.42.0.0/16 -m state --state NEW -j ACCEPT

Harguer commented 5 years ago

I had the same error and spent some time trying to resolve it; I even reinstalled my OS and tried different Kubernetes versions. In the end the issue was firewalld. I disabled it, tried again with a fresh installation, and now it is working fine.

liyimeng commented 5 years ago

@deniseschannon it seems to be the same issue on k3OS.

adi90x commented 5 years ago

I've not used firewalld before, but essentially you need to add a rule equivalent to this iptables rule:

iptables -I INPUT 1 -i cni0 -s 10.42.0.0/16 -j ACCEPT

This rule says: permit incoming packets from the cni0 interface (bridge) with a source in the range 10.42.0.0/16. If you wanted to be more granular, you could open individual ports instead of accepting everything.

Is this correct for ufw, or do I have it the other way around?

sudo ufw allow in on cni0 from 10.42.0.0/16 comment "K3s rule : https://github.com/rancher/k3s/issues/24#issuecomment-469759329"

matthewygf commented 5 years ago

Getting the same issue with multi-node k3s. With only one node, everything works like a charm. When adding a new node:

  • it shows up with kubectl get nodes
  • when running new pods, they start properly on this node

I'm playing with this command: kubectl run -it --rm --restart=Never busybox --image=busybox sh. When the CoreDNS and busybox pods are not on the same host, they can't talk. But when they are on the same node, they can...

My config

Two fresh CentOS 7 instances launched on GCP with no firewall filtering between them. k3s cluster launched with these start commands:

server: /usr/local/bin/k3s server --docker
node: /usr/local/bin/k3s agent --docker --server https://server:6443 --token "TOKEN"

@Lunik did you end up finding a solution for this? I am having the same problem.

devopswise commented 4 years ago

In case you are having a similar issue: I noticed there were Docker-related rules in my chains (I was using containerd). The steps I followed:

  1. Stop the cluster (systemctl stop k3s on the master, systemctl stop k3s-agent on agents).
  2. Delete all iptables rules in your chains, as described here (see the sketch below): https://serverfault.com/a/200658/455081
  3. Start the cluster again.
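
Roughly, the flush sequence from that answer looks like this (be careful if you are connected over SSH, since this also resets policies):

iptables -F
iptables -X
iptables -t nat -F
iptables -t nat -X
iptables -t mangle -F
iptables -t mangle -X
iptables -P INPUT ACCEPT
iptables -P FORWARD ACCEPT
iptables -P OUTPUT ACCEPT
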
kbrowder commented 4 years ago

On Fedora 31 I found the simplest thing to do was:

firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 0 -i cni0 -j ACCEPT
firewall-cmd --reload

(edit: fix random space in -i)

jbutler992 commented 4 years ago

@kbrowder would you mind reviewing that command? I'm on Fedora 31 and seeing this issue, but when I try your command it says it's not a valid ipv4 filter command.

maci0 commented 4 years ago

@kbrowder would you mind reviewing that command? I'm on Fedora 31 and seeing this issue, but when I try your command it says it's not a valid ipv4 filter command.

There was just a typo in his command: it said "- i cni0" instead of "-i cni0".

Besides that, it works for me on CentOS 8.

kbrowder commented 4 years ago

@maci0, whoops, you're right. I've edited my response above; sorry for the delay @jbutler992

adacaccia commented 4 years ago

I would just like to summarize this thread by clearly stating the two iptables rules, taken from above, which fixed my broken fresh install of k3s in (a fraction of) a second, after several days of struggling with it:

sudo iptables -I INPUT 1 -i cni0 -s 10.42.0.0/16 -j ACCEPT
sudo iptables -I FORWARD 1 -s 10.42.0.0/15 -j ACCEPT

Many many thanks to you all for your contribution!

robodude666 commented 4 years ago

This is an old thread, but I still want to share this to potentially save someone days of frustration.

I had a private network of 10.42.42.0/24 and could not figure out why k3s was not working. Using non-default cluster/service CIDRs fixed networking issues for me.
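
For example (the exact ranges here are illustrative; pick anything that doesn't collide with your existing networks):

k3s server --cluster-cidr 10.52.0.0/16 --service-cidr 10.53.0.0/16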

elsbrock commented 4 years ago

This is an old thread, but I still want to share this to potentially save someone days of frustration.

I had a private network of 10.42.42.0/24 and could not figure out why k3s was not working. Using non-default cluster/service CIDRs fixed networking issues for me.

Thank you, kind sir. That hint saved me a ton of time!

cakiem8x commented 3 years ago

I would just like to summarize this thread by clearly stating the two iptables rules, taken from above, which fixed my broken fresh install of k3s in (a fraction of) a second, after several days of struggling with it:

sudo iptables -I INPUT 1 -i cni0 -s 10.42.0.0/16 -j ACCEPT
sudo iptables -I FORWARD 1 -s 10.42.0.0/15 -j ACCEPT

Many many thanks to you all for your contribution!

Thank you!