Closed: BlackTurtle123 closed this issue 2 years ago
This was happening to me while trying to "auto-install" a Helm Chart via
apiVersion: helm.cattle.io/v1
kind: HelmChart
The helm-install pod was scheduled to a Raspberry Pi 4 node. Deleting the pod caused it to be rescheduled on an x86_64 node, where it ran fine (the cluster runs mixed CPU architectures). Running iptables-save
on the Raspberry Pi yielded no rules pertaining to Kubernetes. Not sure why yet...
If you run iptables-save, it tells you that you need to run iptables-legacy-save to see the rest of the rules output. That's where I am seeing all of the kubernetes rules listed.
After being plagued by the issue for several days, I decided to try removing iptables, and all of a sudden my services are working and I am able to get DNS resolution on everything. The issue linked above, #977, talks about some iptables conflicts and the placement of the REJECT rule. I haven't dug into the correct ordering on Raspbian yet, but this was a quick fix to get everything up and running.
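As a quick diagnostic before removing anything, you can count how many rules each backend reports. This is only a sketch; it assumes the `iptables-legacy-save` and `iptables-save` binaries are on the PATH, which varies by distro:

```shell
# Count rule lines visible to each iptables backend. Kubernetes rules
# showing up only under the legacy backend suggests a backend mismatch
# between k3s and the host tooling.
legacy=$( (iptables-legacy-save; ip6tables-legacy-save) 2>/dev/null | grep -c '^-' )
nft=$( (iptables-save; ip6tables-save) 2>/dev/null | grep -c '^-' )
echo "legacy rules: ${legacy}, nft rules: ${nft}"
```

If the legacy count is large while the nft count is near zero (or vice versa), the two stacks are out of sync.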
Hi There, is there any news here? I am experiencing this issue and I've been trying to solve it for several weeks on and off with no success. I see things like
1 reflector.go:322] github.com/containous/traefik/vendor/k8s.io/client-go/informers/factory.go:86: Failed to watch *v1.Service: Get https://172.18.0.1:443/api/v1/services?resourceVersion=2205&timeoutSeconds=460&watch=true: dial tcp 172.18.0.1:443: connect: connection refused
in the Traefik log or
Failed to watch *v1.Namespace: Get "https://172.18.0.1:443/api/v1/namespaces?allowWatchBookmarks=true&resourceVersion=2205&timeout=8m20s&timeoutSeconds=500&watch=true": dial tcp 172.18.0.1:443: connect: connection refused
in the coredns log. I am using k3s v1.18.8+k3s1 on CentOS 8.
iptables --version
iptables v1.8.2 (nf_tables)
I have tried modprobe br_netfilter
and also tried to add nftables rules:
nft add rule filter INPUT ip saddr 172.17.0.0/24 iif cni0 accept
nft add rule filter OUTPUT ip saddr 172.18.0.0/24 accept
I run: k3s server --pause-image k8s.gcr.io/pause:3.1 --cluster-cidr=172.17.0.0/24 --service-cidr=172.18.0.0/24
Any idea how to solve it? Is k3s supposed to work on CentOS 8 with nftables after all, or only with firewalld/iptables?
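One thing worth checking first is which backend the iptables binary itself is built against, since k3s has to agree with it. A minimal sketch (assumes `iptables` is on the PATH):

```shell
# Report the iptables backend in use. On iptables >= 1.8 the version
# string includes "(nf_tables)" or "(legacy)"; rules added through one
# backend can be invisible to tools using the other.
backend=$(iptables --version 2>/dev/null | grep -o '(nf_tables)\|(legacy)' || echo "unknown")
echo "iptables backend: ${backend}"
```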
I got the same issue with Ubuntu 20! Details can be found here: https://serverfault.com/questions/1044971/k3s-dial-tcp-10-43-0-1443-connect-connection-refused
Error
E1204 11:42:25.216392 8 leaderelection.go:321] error retrieving resource lock ingress-nginx/ingress-controller-leader-nginx: Get "https://10.43.0.1:443/api/v1/namespaces/ingress-nginx/configmaps/ingress-controller-leader-nginx": dial tcp 10.43.0.1:443: connect: connection refused
All 10.43.x.x IPs seem not to be working!
Any ideas or solutions for this?
EDIT: It seems k3s does not work when colocating the master and a node on the same host... at least that was the problem for me.
You can try to run sudo iptables -I FORWARD -j ACCEPT
and see if it (temporarily) solves the issue.
K3s (I'm not sure which component exactly) regenerates iptables rules every 30 seconds and adds KUBE-POD-FW-*
chain rules above the manually inserted one, so the issue will still be there.
Currently I have the issue with Traefik for some ingresses but not all. I'm still digging to find out how the rules are generated and to diagnose it. EDIT: in my case, it was NetworkPolicies that blocked my Traefik egress. I found it by narrowing the problem down through the iptables chain rules.
The kubelet is responsible for most of the forwarding rules. The remainder are handled by the network policy controller, although their tables will likely be empty if you don't have any policies in your cluster to restrict communication.
Do you perhaps have your host-based firewall (ufw, firewalld, etc) enabled?
In my case, there was no problem at all: everything worked as expected, but I just did not know it. To be more specific, a NetworkPolicy was added by the Helm chart I used, and I had not noticed it.
Ah yeah. Kubernetes network policy would definitely block it, by design.
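For illustration, a default-deny policy like the following (the name is hypothetical, not taken from any chart in this thread) produces exactly these connection-refused/timeout symptoms for every pod it selects, because it blocks all egress, including DNS lookups to kube-dns and calls to the apiserver service IP:

```yaml
# Hypothetical example: a default-deny egress policy. With policyTypes
# listing Egress and no egress rules given, all outbound pod traffic
# in the namespace is denied by design.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress   # hypothetical name
  namespace: default
spec:
  podSelector: {}             # empty selector matches every pod in the namespace
  policyTypes:
    - Egress
```

You can list policies across all namespaces with `kubectl get networkpolicy -A` to check whether a chart has installed one.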
I found that the problem on my Arch ARM and regular Arch systems is related to this: https://github.com/k3s-io/k3s/issues/1812
It doesn't look like iptables-detect.sh
properly supports Arch. When I run it on one of my nodes:
[k8s@k8s-master-01 ~]$ sudo find / -type f -name iptables-detect.sh
/var/lib/rancher/k3s/data/912de41a65c99bc4d50bbb78e6106f3acbf3a70b8dead77b4c4ebc6755b4f9d6/bin/aux/iptables-detect.sh
[k8s@k8s-master-01 ~]$ sudo /var/lib/rancher/k3s/data/912de41a65c99bc4d50bbb78e6106f3acbf3a70b8dead77b4c4ebc6755b4f9d6/bin/aux/iptables-detect.sh
mode is legacy detected via rules and containerized is false
But it's nft, not legacy:
[k8s@k8s-master-01 ~]$ ls -l /sbin/iptables
lrwxrwxrwx 1 root root 12 Mar 10 15:12 /sbin/iptables -> iptables-nft
A quick workaround is to change the links and then restart your cluster:
sudo -s
cd /bin
rm iptables && ln -s iptables-legacy iptables && rm ip6tables && ln -s ip6tables-legacy ip6tables
You can test whether it's working like this:
[k8s@k8s-master-01 ~]$ kubectl apply -f https://k8s.io/examples/admin/dns/dnsutils.yaml
[k8s@k8s-master-01 ~]$ kubectl exec -it dnsutils -n default -- nslookup google.com
Server: 10.43.0.10
Address: 10.43.0.10#53
Non-authoritative answer:
Name: google.com
Address: 172.217.8.174
Name: google.com
Address: 2607:f8b0:4000:803::200e
If it's broken you'll get ;; connection timed out; no servers could be reached
[k8s@k8s-master-01 ~]$ sudo /var/lib/rancher/k3s/data/912de41a65c99bc4d50bbb78e6106f3acbf3a70b8dead77b4c4ebc6755b4f9d6/bin/aux/iptables-detect.sh
mode is legacy detected via rules and containerized is false
detected via rules
indicates that you have legacy iptables rules present on the system. This is determined by running iptables-legacy-save
and ip6tables-legacy-save
- if these return more than 10 lines of output between the two of them then it is assumed that you are using legacy iptables. Can you determine what it is that was creating these legacy rules?
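That heuristic can be reproduced by hand. A sketch of the same check, with the 10-line threshold taken from the description above (the real script may differ in details):

```shell
# Mirror of the detection heuristic: if the legacy backends report
# more than 10 rule lines combined (IPv4 + IPv6), "legacy" is assumed.
count=$( (iptables-legacy-save || true; ip6tables-legacy-save || true) 2>/dev/null | grep -c '^-' )
if [ "$count" -gt 10 ]; then
  echo "mode would be legacy, detected via rules ($count lines)"
else
  echo "only $count legacy rule lines; nft mode would be assumed"
fi
```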
I just cleared out the cluster and did k3s-uninstall.sh on all nodes. Then I did the following and rebooted to make sure there were no legacy rules.
iptables-save | awk '/^[*]/ { print $1 }
/^:[A-Z]+ [^-]/ { print $1 " ACCEPT" ; }
/COMMIT/ { print $0; }' | iptables-restore
ip6tables-save | awk '/^[*]/ { print $1 }
/^:[A-Z]+ [^-]/ { print $1 " ACCEPT" ; }
/COMMIT/ { print $0; }' | ip6tables-restore
rmmod iptable_filter iptable_mangle iptable_nat iptable_raw iptable_security
rmmod ip6table_filter ip6table_mangle ip6table_nat ip6table_raw ip6table_security
I also ensured the original symlinks were restored:
[k8s@k8s-master-01 ~]$ ls -l /sbin/iptables
lrwxrwxrwx 1 root root 12 Mar 10 15:12 /sbin/iptables -> iptables-nft
Then I brought the cluster back up. I'm using this ansible role and example except with one worker node, servicelb disabled, and traefik disabled. https://github.com/PyratLabs/ansible-role-k3s/blob/main/documentation/quickstart-ha-cluster.md
Once it's back up I'm still getting mode is legacy detected via rules and containerized is false
and iptables-legacy-save shows lots of rules, while iptables-nft-save shows the warning # Warning: iptables-legacy tables present, use iptables-legacy-save to see them
but all the rules were added by k3s.
I tested as detailed before, and google.com can't be resolved from the dnsutils pod.
Then I went to each node and changed the symlink for iptables and ip6tables to point to iptables-legacy and ip6tables-legacy, ran k3s-uninstall.sh on each node of the cluster, and rebuilt the cluster again with ansible, and then tested again. Now it resolves properly.
The rules check is here, can you compare the output on your systems? https://github.com/k3s-io/k3s-root/blob/e2afbdfc30e9bc2f020b307504cc5d1a31b35404/iptables-detect/iptables-detect.sh#L73
(iptables-legacy-save || true; ip6tables-legacy-save || true) 2>/dev/null | grep '^-'
I reset the cluster, updated all nodes, cleared all iptables rules, and re-installed iptables. I downloaded the iptables-detect.sh
and ran it before installing k3s. Here's what I get for both the armv7l and amd64 architectures.
[k8s@k8s-master-01 ~]$ sudo ./iptables-detect.sh
mode is nft detected via unknownos and containerized is false
I'm curious what was initially there that led it to detect legacy iptables, though.
Okay, I think this might be the issue then? After reinstalling iptables it changes the links, but the detect script still says nft.
[root@k8s-master-01 k8s]# ls -l /sbin/iptables
lrwxrwxrwx 1 root root 20 Jan 21 22:56 /sbin/iptables -> xtables-legacy-multi
[root@k8s-master-01 k8s]# ./iptables-detect.sh
mode is nft detected via unknownos and containerized is false
To test this I just spun up the cluster again and I can't resolve anything from a pod. This is creating legacy rules though.
[root@k8s-master-01 k8s]# /sbin/iptables-nft-save
# Warning: iptables-legacy tables present, use iptables-legacy-save to see them
I found another Arch package, iptables-nft, which conflicts with the iptables package, so I installed it instead.
After install I have this.
[k8s@k8s-master-01 ~]$ sudo ls -l /sbin/iptables
lrwxrwxrwx 1 root root 17 Jan 21 22:56 /sbin/iptables -> xtables-nft-multi
[k8s@k8s-master-01 ~]$ sudo ./iptables-detect.sh
mode is nft detected via unknownos and containerized is false
Spinning up the cluster now to see if this resolves things.
Now after spinning up the cluster I get this
[k8s@k8s-master-01 ~]$ sudo ./iptables-detect.sh
mode is nft detected via rules and containerized is false
[k8s@k8s-master-01 ~]$ sudo ls -l /sbin/iptables
lrwxrwxrwx 1 root root 17 Jan 21 22:56 /sbin/iptables -> xtables-nft-multi
Which is good, I think. But something else is going on now. If I open a shell into a busybox container in the default namespace, nslookup times out, and I can't ping the DNS server that nslookup is using, which is the IP of the kube-dns service.
Most distros have an update-alternatives script that you are supposed to use to do this sort of thing, as opposed to symlinking things manually. You might check to see if Arch has a similar tool that you're intended to use.
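On Debian and Ubuntu, for instance, those symlinks are owned by the alternatives system; Arch does not ship update-alternatives, so this is only a point of comparison. A sketch for inspecting the registered backend there:

```shell
# Inspect which backend the iptables alternative points at
# (Debian/Ubuntu). Switching would be done with, e.g.:
#   sudo update-alternatives --set iptables /usr/sbin/iptables-legacy
update-alternatives --display iptables 2>/dev/null \
  || echo "update-alternatives not available or iptables not registered"
```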
I was only changing symlinks to test. When I removed iptables and installed iptables-nft it removed all the old executables and symlinks so everything is as intended on all nodes now. I'm going through these steps now so hopefully that will shed some light on the problem. https://rancher.com/docs/rancher/v2.x/en/troubleshooting/dns/
I've re-imaged a few times now and tried both iptables and iptables-nft packages. I'm pretty certain at this point that it's something wrong with the iptables rules that k3s is adding because I can get outside the pod and resolve with dns if I set the server in nslookup to 1.1.1.1 or 8.8.8.8. I just can't communicate with the coredns service on 10.43.0.10.
I don't know how it started working before but it's certainly not working now no matter what I try.
So in your current state, should your system be using iptables or nftables? Which one is k3s adding rules to?
It doesn't work in either scenario. Nft rules or legacy.
I was able to get this working which uses standard k8s and iptables. https://github.com/raspbernetes/k8s-cluster-installation
I have exactly the same issue after a fresh installation of k3s on a fresh CentOS 8 VM (Virtualbox). Is k3s even supposed to work with CentOS 8?
On my CentOS 8 machine, the package iptables-services,
which installs the iptables systemd service,
was the issue. After uninstalling this package, everything works fine. See here: https://github.com/k3s-io/k3s/issues/1817#issuecomment-820677413
Same issue on NixOS, iptables v1.8.7 (legacy)
Same issue, on RHEL 8.4 (aws ami) without iptables, k3s v1.20.2+k3s1
Detect script returns mode is nft detected via os and containerized is false
On CentOS 8 the result is the same; however, the problem does not occur there.
I tried some of the suggestions: modprobes, IP forwarding. Installing iptables-services did not help. No success yet.
Have you seen https://rancher.com/docs/k3s/latest/en/advanced/#additional-preparation-for-red-hat-centos-enterprise-linux
Oh my. I thought this didn't apply because I have no firewall. But
systemctl disable nm-cloud-setup.service nm-cloud-setup.timer
fixed this problem for me. Thank you!
Shouldn't setting disable-cloud-controller to true disable both of the services below, if they are enabled?
systemctl disable nm-cloud-setup.service nm-cloud-setup.timer
NetworkManager's interference with container virtual interfaces is a separate issue from firewalld/ufw blocking traffic... so you need to ensure both are disabled.
@brandond, if we stop these services before the RKE2 install, would it still require a reboot? If it doesn't, then as part of our infra automation we will stop these, if they exist and are active, before the RKE2 install:
systemctl disable nm-cloud-setup.service nm-cloud-setup.timer
reboot
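One way to make that pre-install step idempotent in automation is a guarded loop like the following sketch (unit names taken from the thread; the loop is a no-op on hosts that don't have the units, and disabling them still requires root):

```shell
# Disable the nm-cloud-setup units only where they actually exist,
# so the same script runs safely on hosts that don't ship them.
disabled=""
for unit in nm-cloud-setup.service nm-cloud-setup.timer; do
  # `systemctl cat` fails for unknown units, which skips the disable.
  if systemctl cat "$unit" >/dev/null 2>&1; then
    systemctl disable --now "$unit"
    disabled="$disabled $unit"
  fi
done
echo "disabled units:${disabled:-(none)}"
```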
IMO this issue has turned into a FAQ on network configuration. The comments either are, or should be, written into the documentation. So, can we close this issue now?
It is covered in the documentation, I linked it up above.
This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 180 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.
Version: v1.0.1
Describe the bug The internal connections on the 10.43.x.x range don't seem to work. Legacy iptables is enabled; the system is Debian-based.
To Reproduce Use the playbook in the contributions directory of the repo, update the version, and only install the service (master), not the node.
Expected behavior The K3s cluster installs and starts up
Actual behavior Pods fail and time out on 10.43.x.x IPs
Additional context Error: Get https://10.43.0.1:443/api/v1/namespaces/kube-system/secrets?labelSelector=OWNER%!D(MISSING)TILLER: dial tcp 10.43.0.1:443: i/o timeout
panic: Get https://10.43.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication: dial tcp 10.43.0.1:443: i/o timeout