k3s-io / k3s

Lightweight Kubernetes
https://k3s.io
Apache License 2.0

Can't access anything on the 10.43.x.x range #1247

Closed · BlackTurtle123 closed this issue 2 years ago

BlackTurtle123 commented 4 years ago

Version: v1.0.1

Describe the bug The internal connections on the 10.43.x.x range don't seem to work. Legacy iptables has been enabled; the system is Debian-based.

To Reproduce Use the contributed playbook in the repo, update the version, and install only the server (master), not a node.

Expected behavior K3S cluster installs and starts up

Actual behavior Pods fail and time out on 10.43.x.x IPs.

Additional context Error: Get https://10.43.0.1:443/api/v1/namespaces/kube-system/secrets?labelSelector=OWNER%!D(MISSING)TILLER: dial tcp 10.43.0.1:443: i/o timeout

panic: Get https://10.43.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication: dial tcp 10.43.0.1:443: i/o timeout

maxirus commented 4 years ago

This was happening to me while trying to "auto-install" a Helm Chart via

apiVersion: helm.cattle.io/v1
kind: HelmChart

The helm-install pod was scheduled to a Raspberry Pi 4 node. Deleting the pod caused it to be rescheduled on an x86_64 node, where it ran fine (I'm running a mixed-CPU-architecture cluster). Running iptables-save on the Raspberry Pi yielded no rules pertaining to Kubernetes. Not sure why yet...
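
For context, a complete HelmChart manifest of that kind looks roughly like the sketch below; the chart name, repo, and namespaces are purely illustrative, not the actual chart being installed here. In k3s these manifests are often dropped into /var/lib/rancher/k3s/server/manifests/, but applying one with kubectl works too:

cat <<'EOF' | kubectl apply -f -
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: example-nginx          # hypothetical chart name
  namespace: kube-system
spec:
  repo: https://charts.bitnami.com/bitnami   # illustrative chart repository
  chart: nginx
  targetNamespace: default     # where the released workload ends up
EOF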

pennywise53 commented 4 years ago

If you run iptables-save, it tells you that you need to run iptables-legacy-save to see the rest of the rules. That's where I'm seeing all of the Kubernetes rules listed.

pennywise53 commented 4 years ago

After being plagued with the issue for several days, I decided to try removing iptables, and all of a sudden my services are working and I'm able to get DNS resolution on everything. The issue referenced above, #977, talks about some iptables conflicts and the placement of the REJECT rule. I haven't dug into the correct rule order on Raspbian yet, but this was a quick fix that got everything up and running for me.

ghost commented 3 years ago

Hi there, is there any news here? I am experiencing this issue and have been trying to solve it on and off for several weeks with no success. I see things like

 1 reflector.go:322] github.com/containous/traefik/vendor/k8s.io/client-go/informers/factory.go:86: Failed to watch *v1.Service: Get https://172.18.0.1:443/api/v1/services?resourceVersion=2205&timeoutSeconds=460&watch=true: dial tcp 172.18.0.1:443: connect: connection refused

in the Traefik log or

Failed to watch *v1.Namespace: Get "https://172.18.0.1:443/api/v1/namespaces?allowWatchBookmarks=true&resourceVersion=2205&timeout=8m20s&timeoutSeconds=500&watch=true": dial tcp 172.18.0.1:443: connect: connection refused

in the coredns log. I am using k3s v1.18.8+k3s1 on CentOS 8.

 iptables --version
iptables v1.8.2 (nf_tables)

I have tried modprobe br_netfilter and also tried to add nftables rules:

nft add rule filter INPUT ip saddr 172.17.0.0/24 iif cni0 accept
nft add rule filter OUTPUT ip saddr 172.18.0.0/24 accept

I run k3s server --pause-image k8s.gcr.io/pause:3.1 --cluster-cidr=172.17.0.0/24 --service-cidr=172.18.0.0/24. Any idea how to solve it? Is k3s supposed to work on CentOS 8 with nftables after all, or only with firewalld/iptables?
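
If firewalld is active on that CentOS 8 host, the k3s advanced-setup docs suggest either disabling it or adding the pod and service CIDRs as trusted sources. A sketch of the latter, adapted to the custom CIDRs used in the command above (not a verified fix for this report):

firewall-cmd --permanent --zone=trusted --add-source=172.17.0.0/24   # cluster (pod) CIDR
firewall-cmd --permanent --zone=trusted --add-source=172.18.0.0/24   # service CIDR
firewall-cmd --reload
# or simply: systemctl disable --now firewalld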

Berndinox commented 3 years ago

I got the same issue with Ubuntu 20! Details can be found here: https://serverfault.com/questions/1044971/k3s-dial-tcp-10-43-0-1443-connect-connection-refused

Error E1204 11:42:25.216392 8 leaderelection.go:321] error retrieving resource lock ingress-nginx/ingress-controller-leader-nginx: Get "https://10.43.0.1:443/api/v1/namespaces/ingress-nginx/configmaps/ingress-controller-leader-nginx": dial tcp 10.43.0.1:443: connect: connection refused

All 10.43.x.x IPs seem not to be working!

Any ideas or solutions for this?

EDIT: It seems like k3s does not work when colocating the master and a node on the same host... at least that was the problem for me.

Id2ndR commented 3 years ago

You can try to run sudo iptables -I FORWARD -j ACCEPT and see if it (temporarily) solves the issue.

K3s (I'm not sure which component exactly) regenerates iptables rules every 30 seconds and adds KUBE-POD-FW-* chain rules above the manually inserted one, so the issue will still come back.

Currently I have the issue with Traefik for some ingresses but not all. I'm still digging to find out how the rules are generated and to diagnose it. EDIT: in my case, it was a NetworkPolicy that was blocking egress from Traefik. I found it by narrowing the problem down through the iptables chain rules.
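
A couple of checks along those lines can narrow it down quickly; the chain prefix is the one mentioned above, and the output will of course differ per cluster:

kubectl get networkpolicies --all-namespaces    # list any policies a chart may have added
sudo iptables-save | grep 'KUBE-POD-FW'         # inspect the generated per-pod firewall chains
sudo iptables -L FORWARD -n -v --line-numbers   # see where ACCEPT/REJECT rules actually sit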

brandond commented 3 years ago

The kubelet is responsible for most of the forwarding rules. The remainder are handled by the network policy controller, although their tables will likely be empty if you don't have any policies in your cluster to restrict communication.

Do you perhaps have your host-based firewall (ufw, firewalld, etc) enabled?

Id2ndR commented 3 years ago

In my case, there was no problem at all: everything worked as expected, but I just did not know it. To be more specific, a NetworkPolicy was added by the Helm chart I used, and I had not noticed it.

brandond commented 3 years ago

Ah yeah. Kubernetes network policy would definitely block it, by design.
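
For illustration, a policy shaped like this hypothetical default-deny (similar in effect to what a chart might ship) blocks all egress from the pods it selects, which matches the Traefik symptom described above:

cat <<'EOF' | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress    # hypothetical name, for illustration only
  namespace: default
spec:
  podSelector: {}              # applies to every pod in the namespace
  policyTypes:
    - Egress                   # no egress rules listed, so all egress is denied
EOF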

brunnels commented 3 years ago

I found that the problem on my Arch ARM and regular Arch installs is related to this: https://github.com/k3s-io/k3s/issues/1812

It doesn't look like iptables-detect.sh properly supports Arch. When I run it on one of my nodes:

[k8s@k8s-master-01 ~]$ sudo find / -type f -name iptables-detect.sh
/var/lib/rancher/k3s/data/912de41a65c99bc4d50bbb78e6106f3acbf3a70b8dead77b4c4ebc6755b4f9d6/bin/aux/iptables-detect.sh

[k8s@k8s-master-01 ~]$ sudo /var/lib/rancher/k3s/data/912de41a65c99bc4d50bbb78e6106f3acbf3a70b8dead77b4c4ebc6755b4f9d6/bin/aux/iptables-detect.sh
mode is legacy detected via rules and containerized is false

But it's nft, not legacy:

[k8s@k8s-master-01 ~]$ ls -l /sbin/iptables
lrwxrwxrwx 1 root root 12 Mar 10 15:12 /sbin/iptables -> iptables-nft

A quick workaround is to change the symlinks and then restart your cluster:

sudo -s
cd /bin
rm iptables && ln -s iptables-legacy iptables && rm ip6tables && ln -s ip6tables-legacy ip6tables

You can test whether it's working by doing this:

[k8s@k8s-master-01 ~]$ kubectl apply -f https://k8s.io/examples/admin/dns/dnsutils.yaml
[k8s@k8s-master-01 ~]$ kubectl exec -it dnsutils -n default -- nslookup google.com
Server:     10.43.0.10
Address:    10.43.0.10#53

Non-authoritative answer:
Name:   google.com
Address: 172.217.8.174
Name:   google.com
Address: 2607:f8b0:4000:803::200e

If it's broken you'll get ;; connection timed out; no servers could be reached

brandond commented 3 years ago
[k8s@k8s-master-01 ~]$ sudo /var/lib/rancher/k3s/data/912de41a65c99bc4d50bbb78e6106f3acbf3a70b8dead77b4c4ebc6755b4f9d6/bin/aux/iptables-detect.sh
mode is legacy detected via rules and containerized is false

detected via rules indicates that you have legacy iptables rules present on the system. This is determined by running iptables-legacy-save and ip6tables-legacy-save - if these return more than 10 lines of output between the two of them then it is assumed that you are using legacy iptables. Can you determine what it is that was creating these legacy rules?
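
A rough sketch of that heuristic with the rule counting spelled out; the threshold and the matching here are taken from the description above rather than copied verbatim from the script:

legacy_rules=$( (iptables-legacy-save; ip6tables-legacy-save) 2>/dev/null | grep -c '^-' )
if [ "${legacy_rules:-0}" -gt 10 ]; then
  echo "mode is legacy detected via rules"     # the legacy backend already holds rules
else
  echo "no significant legacy rules found"
fi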

brunnels commented 3 years ago

I just cleared out the cluster and did k3s-uninstall.sh on all nodes. Then I did the following and rebooted to make sure there were no legacy rules.

iptables-save | awk '/^[*]/ { print $1 } 
/^:[A-Z]+ [^-]/ { print $1 " ACCEPT" ; }
/COMMIT/ { print $0; }' | iptables-restore

ip6tables-save | awk '/^[*]/ { print $1 } 
/^:[A-Z]+ [^-]/ { print $1 " ACCEPT" ; }
/COMMIT/ { print $0; }' | ip6tables-restore

rmmod iptable_filter iptable_mangle iptable_nat iptable_raw iptable_security
rmmod ip6table_filter ip6table_mangle ip6table_nat ip6table_raw ip6table_security

I also ensured the original symlinks were restored:

[k8s@k8s-master-01 ~]$ ls -l /sbin/iptables
lrwxrwxrwx 1 root root 12 Mar 10 15:12 /sbin/iptables -> iptables-nft

Then I brought the cluster back up. I'm using this ansible role and example except with one worker node, servicelb disabled, and traefik disabled. https://github.com/PyratLabs/ansible-role-k3s/blob/main/documentation/quickstart-ha-cluster.md

Once it's back up I'm still getting mode is legacy detected via rules and containerized is false. iptables-legacy-save shows lots of rules, and iptables-nft-save shows the warning # Warning: iptables-legacy tables present, use iptables-legacy-save to see them, but all of those rules were added by k3s.
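
To see which backend actually holds the rules after a bring-up like this, counting rule lines in each save output is a quick check (same tools already used in this thread):

sudo iptables-legacy-save 2>/dev/null | grep -c '^-'   # rules held by the legacy backend
sudo iptables-nft-save 2>/dev/null | grep -c '^-'      # rules held by the nft backend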

brunnels commented 3 years ago

I tested as detailed before, and google.com can't be resolved by the dnsutils pod.

Then I went to each node, changed the symlinks for iptables and ip6tables to point to iptables-legacy and ip6tables-legacy, ran k3s-uninstall.sh on each node of the cluster, rebuilt the cluster with Ansible, and tested again. Now it resolves properly.

brandond commented 3 years ago

The rules check is here; can you compare the output on your systems? https://github.com/k3s-io/k3s-root/blob/e2afbdfc30e9bc2f020b307504cc5d1a31b35404/iptables-detect/iptables-detect.sh#L73

(iptables-legacy-save || true; ip6tables-legacy-save || true) 2>/dev/null | grep '^-'
brunnels commented 3 years ago

I reset the cluster, updated all nodes, cleared all iptables rules, and re-installed iptables. I downloaded iptables-detect.sh and ran it before installing k3s. Here's what I get on both armv7l and amd64 Arch nodes.

[k8s@k8s-master-01 ~]$ sudo ./iptables-detect.sh 
mode is nft detected via unknownos and containerized is false
brandond commented 3 years ago

I'm curious what was initially there that led it to detect legacy iptables, though.

brunnels commented 3 years ago

Okay, I think this might be the issue then? After re-installing iptables the symlinks change, but the detect script still says nft.

[root@k8s-master-01 k8s]# ls -l /sbin/iptables
lrwxrwxrwx 1 root root 20 Jan 21 22:56 /sbin/iptables -> xtables-legacy-multi
[root@k8s-master-01 k8s]# ./iptables-detect.sh 
mode is nft detected via unknownos and containerized is false

To test this I spun up the cluster again, and I can't resolve anything from a pod. It is creating legacy rules, though.

[root@k8s-master-01 k8s]# /sbin/iptables-nft-save 
# Warning: iptables-legacy tables present, use iptables-legacy-save to see them
brunnels commented 3 years ago

I found another Arch package, iptables-nft, which conflicts with the iptables package, so I installed it.

After installing it I have this:

[k8s@k8s-master-01 ~]$ sudo ls -l /sbin/iptables
lrwxrwxrwx 1 root root 17 Jan 21 22:56 /sbin/iptables -> xtables-nft-multi
[k8s@k8s-master-01 ~]$ sudo ./iptables-detect.sh 
mode is nft detected via unknownos and containerized is false

Spinning up the cluster now to see if this resolves things.

brunnels commented 3 years ago

Now, after spinning up the cluster, I get this:

[k8s@k8s-master-01 ~]$ sudo ./iptables-detect.sh 
mode is nft detected via rules and containerized is false
[k8s@k8s-master-01 ~]$ sudo ls -l /sbin/iptables
lrwxrwxrwx 1 root root 17 Jan 21 22:56 /sbin/iptables -> xtables-nft-multi

Which is good, I think. But something else is going on now. If I open a shell into a busybox container in the default namespace, I get an nslookup timeout, and I can't ping the DNS server that nslookup is using, which is the IP of the kube-dns service.

brandond commented 3 years ago

Most distros have an update-alternatives script that you are supposed to use to do this sort of thing, as opposed to symlinking things manually. You might check to see if Arch has a similar tool that you're intended to use.
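
On Debian/Ubuntu-style systems that switch is normally made with update-alternatives rather than by editing symlinks, roughly as below; Arch does not ship this tool, so this is only an illustration of the mechanism referred to above:

sudo update-alternatives --set iptables /usr/sbin/iptables-legacy
sudo update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
# to switch back: sudo update-alternatives --set iptables /usr/sbin/iptables-nft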

brunnels commented 3 years ago

I was only changing symlinks to test. When I removed iptables and installed iptables-nft, it removed all the old executables and symlinks, so everything is as intended on all nodes now. I'm going through these steps now, so hopefully that will shed some light on the problem: https://rancher.com/docs/rancher/v2.x/en/troubleshooting/dns/

brunnels commented 3 years ago

I've re-imaged a few times now and tried both the iptables and iptables-nft packages. I'm pretty certain at this point that it's something wrong with the iptables rules that k3s is adding, because from inside the pod I can reach the outside world and resolve names if I set the nslookup server to 1.1.1.1 or 8.8.8.8. I just can't communicate with the CoreDNS service on 10.43.0.10.

I don't know how it started working before, but it's certainly not working now, no matter what I try.
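
For that symptom (upstream resolvers reachable from the pod, 10.43.0.10 not), a few generic checks of the cluster DNS path can help narrow things down; these are standard kubectl commands, not a prescribed k3s procedure:

kubectl -n kube-system get svc kube-dns                     # confirm the ClusterIP really is 10.43.0.10
kubectl -n kube-system get endpoints kube-dns               # confirm CoreDNS pods are backing the service
kubectl -n kube-system logs -l k8s-app=kube-dns --tail=20   # look for errors from CoreDNS itself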

brandond commented 3 years ago

So in your current state, should your system be using iptables or nftables? Which one is k3s adding rules to?

brunnels commented 3 years ago

It doesn't work in either scenario, nft rules or legacy.

brunnels commented 3 years ago

I was able to get this working, which uses standard k8s and iptables: https://github.com/raspbernetes/k8s-cluster-installation

ChristianCiach commented 3 years ago

I have exactly the same issue after a fresh installation of k3s on a fresh CentOS 8 VM (Virtualbox). Is k3s even supposed to work with CentOS 8?

ChristianCiach commented 3 years ago

On my CentOS 8 machine, the package iptables-services, which installs the iptables systemd service, was the issue. After uninstalling this package, everything works fine. See here: https://github.com/k3s-io/k3s/issues/1817#issuecomment-820677413
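
For anyone hitting the same thing, removing that package and its units on CentOS 8 would look roughly like this (package and service names as given above; the reboot is just to be sure no stale legacy rules survive):

sudo systemctl disable --now iptables ip6tables   # units shipped by iptables-services
sudo dnf remove iptables-services
sudo reboot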

corpix commented 2 years ago

Same issue on NixOS, iptables v1.8.7 (legacy)

Timvissers commented 2 years ago

Same issue on RHEL 8.4 (AWS AMI) without iptables, k3s v1.20.2+k3s1. The detect script returns mode is nft detected via os and containerized is false. On CentOS 8 the result is the same, yet the problem does not occur there. I tried some of the suggestions: modprobes, IP forwarding. Installing iptables-services did not help. No success yet.

brandond commented 2 years ago

Have you seen https://rancher.com/docs/k3s/latest/en/advanced/#additional-preparation-for-red-hat-centos-enterprise-linux

Timvissers commented 2 years ago

Have you seen https://rancher.com/docs/k3s/latest/en/advanced/#additional-preparation-for-red-hat-centos-enterprise-linux

Oh my. I thought this didn't apply because I have no firewall. But systemctl disable nm-cloud-setup.service nm-cloud-setup.timer fixed the problem for me. Thank you!

rajivml commented 2 years ago

Shouldn't disable-cloud-controller set to true disable both of the below services if they are enabled?

systemctl disable nm-cloud-setup.service nm-cloud-setup.timer

brandond commented 2 years ago

Network Manager's interference with container virtual interfaces is a separate issue from firewalld/ufw blocking traffic... so you need to ensure both are disabled.
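
Concretely, on a RHEL-family host that means disabling both; the k3s docs linked above also recommend a reboot after disabling nm-cloud-setup:

sudo systemctl disable --now firewalld
sudo systemctl disable nm-cloud-setup.service nm-cloud-setup.timer
sudo reboot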

rajivml commented 2 years ago

@brandond, if we stop these services before the RKE2 install, would it still require a reboot? If it doesn't, then as part of our infra automation we will stop them, if they exist and are active, before the RKE2 install:

systemctl disable nm-cloud-setup.service nm-cloud-setup.timer
reboot

Id2ndR commented 2 years ago

IMO this issue has turned into a FAQ about network configuration. The comments either are, or should be, written into the documentation. So, can we close this issue now?

brandond commented 2 years ago

It is covered in the documentation; I linked it above.

stale[bot] commented 2 years ago

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 180 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.