Closed brandond closed 8 months ago
Some additional information from the original discussion: we run k3s with --prefer-bundled-bin. The crash loop can be worked around by flushing the iptables rules (iptables -F) or by rebooting the node while k3s is in a crash loop after the upgrade.
Speculation: I think the issue happens because install.sh uses iptables-save from the host, which only outputs a subset of the rules that the bundled binary would return.
I plan to create another discussion for this tomorrow, because I would like to know why the install script even attempts to remove iptables rules (from a running k3s instance, no less!). The documentation also describes another way to upgrade k3s: not using the install script and instead replacing the k3s binary directly. I guess (but haven't tested yet!) that not using the install script to upgrade should also prevent this issue.
It appears that packets to the loopback address are being incorrectly masqueraded, which in turn causes them to be blocked by the KUBE-FIREWALL rule that prevents access to host loopback addresses from non-loopback sources.
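The rule responsible for the drop is visible later in the traces; in iptables-save syntax it corresponds roughly to the following (reconstructed from the nft trace output below, not dumped from a node):

```
-A KUBE-FIREWALL ! -s 127.0.0.0/8 -d 127.0.0.0/8 -m conntrack ! --ctstate RELATED,ESTABLISHED -j DROP
```

Once the masquerade rewrites the source address to something outside 127.0.0.0/8, this rule matches and the SYN is dropped.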
Trace of a connection to 127.0.0.1:2399 from 1.27.6+k3s1:
trace id bd0052fb ip raw OUTPUT packet: oif "lo" ip saddr 127.0.0.1 ip daddr 127.0.0.1 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 46391 ip length 60 tcp sport 41168 tcp dport 2399 tcp flags == syn tcp window 43690
trace id bd0052fb ip raw OUTPUT rule meta l4proto tcp ip daddr 127.0.0.1 tcp dport 2399 counter packets 1 bytes 60 meta nftrace set 1 (verdict continue)
trace id bd0052fb ip raw OUTPUT verdict continue
trace id bd0052fb ip raw OUTPUT policy accept
trace id bd0052fb ip nat OUTPUT packet: oif "lo" ip saddr 127.0.0.1 ip daddr 127.0.0.1 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 46391 ip length 60 tcp sport 41168 tcp dport 2399 tcp flags == syn tcp window 43690
trace id bd0052fb ip nat OUTPUT rule counter packets 910 bytes 55873 jump KUBE-SERVICES (verdict jump KUBE-SERVICES)
trace id bd0052fb ip nat KUBE-SERVICES rule fib daddr type local counter packets 132 bytes 6809 jump KUBE-NODEPORTS (verdict jump KUBE-NODEPORTS)
trace id bd0052fb ip nat KUBE-NODEPORTS verdict continue
trace id bd0052fb ip nat KUBE-SERVICES verdict continue
trace id bd0052fb ip nat OUTPUT rule fib daddr type local counter packets 63 bytes 3780 jump CNI-HOSTPORT-DNAT (verdict jump CNI-HOSTPORT-DNAT)
trace id bd0052fb ip nat CNI-HOSTPORT-DNAT verdict continue
trace id bd0052fb ip nat OUTPUT verdict continue
trace id bd0052fb ip nat OUTPUT policy accept
trace id bd0052fb ip filter OUTPUT packet: oif "lo" ip saddr 127.0.0.1 ip daddr 127.0.0.1 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 46391 ip length 60 tcp sport 41168 tcp dport 2399 tcp flags == syn tcp window 43690
trace id bd0052fb ip filter OUTPUT rule counter packets 31934 bytes 7098788 jump KUBE-ROUTER-OUTPUT (verdict jump KUBE-ROUTER-OUTPUT)
trace id bd0052fb ip filter KUBE-ROUTER-OUTPUT verdict continue
trace id bd0052fb ip filter OUTPUT rule ct state new counter packets 30 bytes 1992 jump KUBE-PROXY-FIREWALL (verdict jump KUBE-PROXY-FIREWALL)
trace id bd0052fb ip filter KUBE-PROXY-FIREWALL verdict continue
trace id bd0052fb ip filter OUTPUT rule ct state new counter packets 30 bytes 1992 jump KUBE-SERVICES (verdict jump KUBE-SERVICES)
trace id bd0052fb ip filter KUBE-SERVICES verdict continue
trace id bd0052fb ip filter OUTPUT rule counter packets 29469 bytes 6855653 jump KUBE-FIREWALL (verdict jump KUBE-FIREWALL)
trace id bd0052fb ip filter KUBE-FIREWALL verdict continue
trace id bd0052fb ip filter OUTPUT verdict continue
trace id bd0052fb ip filter OUTPUT policy accept
trace id bd0052fb ip nat POSTROUTING packet: oif "lo" ip saddr 127.0.0.1 ip daddr 127.0.0.1 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 46391 ip length 60 tcp sport 41168 tcp dport 2399 tcp flags == syn tcp window 43690
trace id bd0052fb ip nat POSTROUTING rule counter packets 787 bytes 47776 jump CNI-HOSTPORT-MASQ (verdict jump CNI-HOSTPORT-MASQ)
trace id bd0052fb ip nat CNI-HOSTPORT-MASQ verdict continue
trace id bd0052fb ip nat POSTROUTING rule counter packets 914 bytes 56102 jump KUBE-POSTROUTING (verdict jump KUBE-POSTROUTING)
trace id bd0052fb ip nat KUBE-POSTROUTING verdict return
trace id bd0052fb ip nat POSTROUTING rule counter packets 865 bytes 52899 jump FLANNEL-POSTRTG (verdict jump FLANNEL-POSTRTG)
trace id bd0052fb ip nat FLANNEL-POSTRTG verdict continue
trace id bd0052fb ip nat POSTROUTING verdict continue
trace id bd0052fb ip nat POSTROUTING policy accept
trace id 3b0c5f4b ip raw PREROUTING packet: iif "lo" @ll,0,112 2048 ip saddr 127.0.0.1 ip daddr 127.0.0.1 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 46391 ip length 60 tcp sport 41168 tcp dport 2399 tcp flags == syn tcp window 43690
trace id 3b0c5f4b ip raw PREROUTING rule meta l4proto tcp ip daddr 127.0.0.1 tcp dport 2399 counter packets 2 bytes 120 meta nftrace set 1 (verdict continue)
trace id 3b0c5f4b ip raw PREROUTING verdict continue
trace id 3b0c5f4b ip raw PREROUTING policy accept
trace id 3b0c5f4b ip filter INPUT packet: iif "lo" @ll,0,112 2048 ip saddr 127.0.0.1 ip daddr 127.0.0.1 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 46391 ip length 60 tcp sport 41168 tcp dport 2399 tcp flags == syn tcp window 43690
trace id 3b0c5f4b ip filter INPUT rule counter packets 31609 bytes 7328989 jump KUBE-ROUTER-INPUT (verdict jump KUBE-ROUTER-INPUT)
trace id 3b0c5f4b ip filter KUBE-ROUTER-INPUT verdict continue
trace id 3b0c5f4b ip filter INPUT rule ct state new counter packets 52 bytes 2528 jump KUBE-PROXY-FIREWALL (verdict jump KUBE-PROXY-FIREWALL)
trace id 3b0c5f4b ip filter KUBE-PROXY-FIREWALL verdict continue
trace id 3b0c5f4b ip filter INPUT rule counter packets 29508 bytes 6851425 jump KUBE-NODEPORTS (verdict jump KUBE-NODEPORTS)
trace id 3b0c5f4b ip filter KUBE-NODEPORTS verdict continue
trace id 3b0c5f4b ip filter INPUT rule ct state new counter packets 52 bytes 2528 jump KUBE-EXTERNAL-SERVICES (verdict jump KUBE-EXTERNAL-SERVICES)
trace id 3b0c5f4b ip filter KUBE-EXTERNAL-SERVICES verdict continue
trace id 3b0c5f4b ip filter INPUT rule counter packets 29508 bytes 6851425 jump KUBE-FIREWALL (verdict jump KUBE-FIREWALL)
trace id 3b0c5f4b ip filter KUBE-FIREWALL verdict continue
trace id 3b0c5f4b ip filter INPUT verdict continue
trace id 3b0c5f4b ip filter INPUT policy accept
Trace of a connection to 127.0.0.1:2399 from a node that was upgraded from 1.27.6+k3s1 to 1.27.7+k3s1:
trace id f941b4ee ip raw OUTPUT packet: oif "lo" ip saddr 127.0.0.1 ip daddr 127.0.0.1 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 53609 ip length 60 tcp sport 51048 tcp dport 2399 tcp flags == syn tcp window 43690
trace id f941b4ee ip raw OUTPUT rule meta l4proto tcp ip daddr 127.0.0.1 tcp dport { 2379,2399} counter packets 84 bytes 5040 meta nftrace set 1 (verdict continue)
trace id f941b4ee ip raw OUTPUT verdict continue
trace id f941b4ee ip raw OUTPUT policy accept
trace id f941b4ee ip mangle OUTPUT verdict continue
trace id f941b4ee ip mangle OUTPUT policy accept
trace id f941b4ee ip nat OUTPUT packet: oif "lo" ip saddr 127.0.0.1 ip daddr 127.0.0.1 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 53609 ip length 60 tcp sport 51048 tcp dport 2399 tcp flags == syn tcp window 43690
trace id f941b4ee ip nat OUTPUT rule fib daddr type local counter packets 476 bytes 28560 jump CNI-HOSTPORT-DNAT (verdict jump CNI-HOSTPORT-DNAT)
trace id f941b4ee ip nat CNI-HOSTPORT-DNAT verdict continue
trace id f941b4ee ip nat OUTPUT verdict continue
trace id f941b4ee ip nat OUTPUT policy accept
trace id f941b4ee ip filter OUTPUT packet: oif "lo" ip saddr 127.0.0.1 ip daddr 127.0.0.1 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 53609 ip length 60 tcp sport 51048 tcp dport 2399 tcp flags == syn tcp window 43690
trace id f941b4ee ip filter OUTPUT rule counter packets 20869 bytes 5222004 jump KUBE-ROUTER-OUTPUT (verdict jump KUBE-ROUTER-OUTPUT)
trace id f941b4ee ip filter KUBE-ROUTER-OUTPUT verdict continue
trace id f941b4ee ip filter OUTPUT rule ct state new counter packets 2622 bytes 162617 jump KUBE-PROXY-FIREWALL (verdict jump KUBE-PROXY-FIREWALL)
trace id f941b4ee ip filter KUBE-PROXY-FIREWALL verdict continue
trace id f941b4ee ip filter OUTPUT rule ct state new counter packets 2622 bytes 162617 jump KUBE-SERVICES (verdict jump KUBE-SERVICES)
trace id f941b4ee ip filter KUBE-SERVICES verdict continue
trace id f941b4ee ip filter OUTPUT rule counter packets 20781 bytes 5214054 jump KUBE-FIREWALL (verdict jump KUBE-FIREWALL)
trace id f941b4ee ip filter KUBE-FIREWALL verdict continue
trace id f941b4ee ip filter OUTPUT verdict continue
trace id f941b4ee ip filter OUTPUT policy accept
trace id f941b4ee ip mangle POSTROUTING verdict continue
trace id f941b4ee ip mangle POSTROUTING policy accept
trace id f941b4ee ip nat POSTROUTING packet: oif "lo" ip saddr 127.0.0.1 ip daddr 127.0.0.1 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 53609 ip length 60 tcp sport 51048 tcp dport 2399 tcp flags == syn tcp window 43690
trace id f941b4ee ip nat POSTROUTING rule counter packets 1293 bytes 80403 jump CNI-HOSTPORT-MASQ (verdict jump CNI-HOSTPORT-MASQ)
trace id f941b4ee ip nat CNI-HOSTPORT-MASQ rule counter packets 1293 bytes 80403 masquerade (verdict accept)
trace id 4eb0d1cf ip raw PREROUTING packet: iif "lo" @ll,0,112 2048 ip saddr 172.31.10.14 ip daddr 127.0.0.1 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 53609 ip length 60 tcp sport 51048 tcp dport 2399 tcp flags == syn tcp window 43690
trace id 4eb0d1cf ip raw PREROUTING rule meta l4proto tcp ip daddr 127.0.0.1 tcp dport { 2379,2399} counter packets 558 bytes 33480 meta nftrace set 1 (verdict continue)
trace id 4eb0d1cf ip raw PREROUTING verdict continue
trace id 4eb0d1cf ip raw PREROUTING policy accept
trace id 4eb0d1cf ip mangle PREROUTING verdict continue
trace id 4eb0d1cf ip mangle PREROUTING policy accept
trace id 4eb0d1cf ip mangle INPUT verdict continue
trace id 4eb0d1cf ip mangle INPUT policy accept
trace id 4eb0d1cf ip filter INPUT packet: iif "lo" @ll,0,112 2048 ip saddr 172.31.10.14 ip daddr 127.0.0.1 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 53609 ip length 60 tcp sport 51048 tcp dport 2399 tcp flags == syn tcp window 43690
trace id 4eb0d1cf ip filter INPUT rule counter packets 22321 bytes 63209641 jump KUBE-ROUTER-INPUT (verdict jump KUBE-ROUTER-INPUT)
trace id 4eb0d1cf ip filter KUBE-ROUTER-INPUT verdict continue
trace id 4eb0d1cf ip filter INPUT rule ct state new counter packets 3830 bytes 212938 jump KUBE-PROXY-FIREWALL (verdict jump KUBE-PROXY-FIREWALL)
trace id 4eb0d1cf ip filter KUBE-PROXY-FIREWALL verdict continue
trace id 4eb0d1cf ip filter INPUT rule counter packets 22231 bytes 63203120 jump KUBE-NODEPORTS (verdict jump KUBE-NODEPORTS)
trace id 4eb0d1cf ip filter KUBE-NODEPORTS verdict continue
trace id 4eb0d1cf ip filter INPUT rule ct state new counter packets 3830 bytes 212938 jump KUBE-EXTERNAL-SERVICES (verdict jump KUBE-EXTERNAL-SERVICES)
trace id 4eb0d1cf ip filter KUBE-EXTERNAL-SERVICES verdict continue
trace id 4eb0d1cf ip filter INPUT rule counter packets 22231 bytes 63203120 jump KUBE-FIREWALL (verdict jump KUBE-FIREWALL)
trace id 4eb0d1cf ip filter KUBE-FIREWALL rule ip saddr != 127.0.0.0/8 ip daddr 127.0.0.0/8 ct state != related,established counter packets 2323 bytes 139380 drop (verdict drop)
The key difference appears to be in the nat POSTROUTING chain, which jumps into CNI-HOSTPORT-MASQ. This rule is now matching the outbound packet and triggering the masquerade:
Before:
trace id bd0052fb ip nat POSTROUTING packet: oif "lo" ip saddr 127.0.0.1 ip daddr 127.0.0.1 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 46391 ip length 60 tcp sport 41168 tcp dport 2399 tcp flags == syn tcp window 43690
trace id bd0052fb ip nat POSTROUTING rule counter packets 787 bytes 47776 jump CNI-HOSTPORT-MASQ (verdict jump CNI-HOSTPORT-MASQ)
trace id bd0052fb ip nat CNI-HOSTPORT-MASQ verdict continue
trace id bd0052fb ip nat POSTROUTING rule counter packets 914 bytes 56102 jump KUBE-POSTROUTING (verdict jump KUBE-POSTROUTING)
trace id bd0052fb ip nat KUBE-POSTROUTING verdict return
trace id bd0052fb ip nat POSTROUTING rule counter packets 865 bytes 52899 jump FLANNEL-POSTRTG (verdict jump FLANNEL-POSTRTG)
trace id bd0052fb ip nat FLANNEL-POSTRTG verdict continue
trace id bd0052fb ip nat POSTROUTING verdict continue
trace id bd0052fb ip nat POSTROUTING policy accept
# packet still from 127.0.0.1
trace id 3b0c5f4b ip raw PREROUTING packet: iif "lo" @ll,0,112 2048 ip saddr 127.0.0.1 ip daddr 127.0.0.1 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 46391 ip length 60 tcp sport 41168 tcp dport 2399 tcp flags == syn tcp window 43690
After:
trace id f941b4ee ip nat POSTROUTING packet: oif "lo" ip saddr 127.0.0.1 ip daddr 127.0.0.1 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 53609 ip length 60 tcp sport 51048 tcp dport 2399 tcp flags == syn tcp window 43690
trace id f941b4ee ip nat POSTROUTING rule counter packets 1293 bytes 80403 jump CNI-HOSTPORT-MASQ (verdict jump CNI-HOSTPORT-MASQ)
trace id f941b4ee ip nat CNI-HOSTPORT-MASQ rule counter packets 1293 bytes 80403 masquerade (verdict accept)
# note that the packet has now been masqueraded and has a source address of 172.31.10.14 instead of 127.0.0.1
trace id 4eb0d1cf ip raw PREROUTING packet: iif "lo" @ll,0,112 2048 ip saddr 172.31.10.14 ip daddr 127.0.0.1 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 53609 ip length 60 tcp sport 51048 tcp dport 2399 tcp flags == syn tcp window 43690
Notably, the CNI-HOSTPORT-MASQ rule no longer matches the mark, but instead matches all packets:
[root@ip-172-31-4-16 ~]# /var/lib/rancher/k3s/data/current/bin/aux/xtables-nft-multi iptables-nft-save 2>/dev/null | grep CNI-HOSTPORT-MASQ
:CNI-HOSTPORT-MASQ - [0:0]
-A POSTROUTING -m comment --comment "CNI portfwd requiring masquerade" -j CNI-HOSTPORT-MASQ
-A CNI-HOSTPORT-MASQ -m mark --mark 0x2000/0x2000 -j MASQUERADE
vs
[root@ip-172-31-10-14 ~]# /var/lib/rancher/k3s/data/current/bin/aux/xtables-nft-multi iptables-nft-save 2>/dev/null | grep CNI-HOSTPORT-MASQ
:CNI-HOSTPORT-MASQ - [0:0]
-A POSTROUTING -m comment --comment "CNI portfwd requiring masquerade" -j CNI-HOSTPORT-MASQ
-A CNI-HOSTPORT-MASQ -j MASQUERADE
The CNI-HOSTPORT-MASQ rule comes from the portmap plugin, so I suspect this is related to the update of the CNI plugins.
I can confirm that just clearing the rule allows K3s to start successfully. I don’t even have to do anything, just let it retry and it picks up after a minute.
iptables -t nat -F CNI-HOSTPORT-MASQ
The issue appears to be that the host's iptables-save is buggy and does not properly output the mark match:
[root@ip-172-31-10-14 ~]# /var/lib/rancher/k3s/data/current/bin/aux/iptables-save | grep CNI-HOSTPORT-MASQ
# Warning: iptables-legacy tables present, use iptables-legacy-save to see them
:CNI-HOSTPORT-MASQ - [0:0]
-A POSTROUTING -m comment --comment "CNI portfwd requiring masquerade" -j CNI-HOSTPORT-MASQ
-A CNI-HOSTPORT-MASQ -m mark --mark 0x2000/0x2000 -j MASQUERADE
[root@ip-172-31-10-14 ~]# iptables-save | grep CNI-HOSTPORT-MASQ
:CNI-HOSTPORT-MASQ - [0:0]
-A POSTROUTING -m comment --comment "CNI portfwd requiring masquerade" -j CNI-HOSTPORT-MASQ
-A CNI-HOSTPORT-MASQ -j MASQUERADE
# Warning: iptables-legacy tables present, use iptables-legacy-save to see them
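As a hedged illustration (this is not the actual install.sh check, just a sketch), a small shell function can distinguish the intact rule from the corrupted one by looking for the mark match on MASQUERADE rules; the two sample lines are taken verbatim from the dumps above:

```shell
# Classify a single iptables-save rule line: a MASQUERADE rule without the
# "-m mark" restriction masquerades every packet traversing the chain.
classify_masq_rule() {
  case "$1" in
    *"-j MASQUERADE"*)
      case "$1" in
        *"-m mark"*) echo "restricted" ;;   # only marked (hostport) packets
        *)           echo "unrestricted" ;; # everything, including loopback traffic
      esac ;;
    *) echo "not-a-masquerade" ;;
  esac
}

bundled='-A CNI-HOSTPORT-MASQ -m mark --mark 0x2000/0x2000 -j MASQUERADE'
host='-A CNI-HOSTPORT-MASQ -j MASQUERADE'
classify_masq_rule "$bundled"   # restricted
classify_masq_rule "$host"      # unrestricted
```

The unrestricted variant is exactly what causes the loopback breakage: with the mark match gone, every packet leaving the node, including packets to 127.0.0.1, gets masqueraded.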
Anything that calls iptables-save/iptables-restore after k3s has started will break connectivity to localhost. This does not appear to be new; if I go back to 1.27.4, prior to the update of the CNI plugins, I see that using the host iptables-save will still break k3s if using the embedded etcd. All you have to do is:
systemctl stop k3s
iptables-save | iptables-restore
systemctl start k3s
Running k3s-killall.sh will wipe the CNI rules, which will allow K3s to start up again successfully - at least until the next time the rules are corrupted by the broken host tools.
If prefer-bundled-bin is not enabled, then the mark rules are properly dumped by the host tools:
[root@ip-172-31-10-14 ~]# iptables-save | grep CNI-HOSTPORT-MASQ
# Warning: iptables-legacy tables present, use iptables-legacy-save to see them
:CNI-HOSTPORT-MASQ - [0:0]
-A POSTROUTING -m comment --comment "CNI portfwd requiring masquerade" -j CNI-HOSTPORT-MASQ
-A CNI-HOSTPORT-MASQ -m mark --mark 0x2000/0x2000 -j MASQUERADE
This makes me suspect that this has been a problem with the bundled iptables binaries since we last bumped the buildroot version way back in https://github.com/k3s-io/k3s/pull/6400. That would mean that this issue has been present since 1.25.5+k3s1. Sure enough, if I install 1.25.4+k3s1 it works fine:
[root@ip-172-31-10-14 ~]# k3s --version
k3s version v1.25.4+k3s1 (0dc63334)
go version go1.19.3
[root@ip-172-31-10-14 ~]# grep prefer-bundled /etc/rancher/k3s/config.yaml
prefer-bundled-bin: true
[root@ip-172-31-10-14 ~]# iptables-save | grep CNI-HOSTPORT-MASQ
:CNI-HOSTPORT-MASQ - [0:0]
-A POSTROUTING -m comment --comment "CNI portfwd requiring masquerade" -j CNI-HOSTPORT-MASQ
-A CNI-HOSTPORT-MASQ -m mark --mark 0x2000/0x2000 -j MASQUERADE
# Warning: iptables-legacy tables present, use iptables-legacy-save to see them
I'm not sure what might have changed recently to make this more of an issue; perhaps something has been added to these distros that calls iptables-save/iptables-restore on restart? However, since this is apparently not due to any recent changes in K3s, I think we should best handle it via documentation noting that the host iptables save/restore tools MUST NOT be used alongside the bundled iptables bins on EL7 distros. Users are welcome to use the K3s bundled bins in preference to the host bins system-wide by running:
ln -sf /var/lib/rancher/k3s/data/current/bin/aux/xtables-nft-multi /sbin/xtables-nft-multi
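Since xtables-nft-multi dispatches on the name it is invoked as, the same approach extends to the other entry points. A hedged sketch (the tool list and destination are assumptions; on a real node DEST would be /sbin, and you should verify which names your distro ships before replacing them):

```shell
# Link the common xtables entry points at the bundled multi-call binary.
# DEST defaults to a scratch directory here so the sketch is safe to run;
# use DEST=/sbin (as root) to apply it for real.
BIN=/var/lib/rancher/k3s/data/current/bin/aux/xtables-nft-multi
DEST="${DEST:-$(mktemp -d)}"
for tool in iptables iptables-save iptables-restore; do
  ln -sf "$BIN" "$DEST/$tool"   # ln -sf allows a not-yet-present target
done
ls -l "$DEST"
```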
> However, since this is apparently not due to any recent changes in K3s, I think we should best handle it via documentation noting that the host iptables save/restore tools MUST NOT be used alongside the bundled iptables bins on EL7 distros.
So the official install script should not be used to upgrade k3s in these cases, as it is hardcoded to use the host's tools? Should we just switch out the binary and restart K3s? Both upgrade mechanisms are documented at k3s.io.
We are currently preferring to use the install script to automatically pick up changes to the systemd service file.
> on EL7 distros.
You probably mean EL8?
> So the official install script should not be used to upgrade k3s in these cases, as it is hardcoded to use the host's tools?
That is a good point; we should probably update the changes from https://github.com/k3s-io/k3s/pull/7274 to use the bundled versions, or perhaps filter out the CNI rules that it will break. I'll have to think on that for a moment. We can't prevent anyone from breaking it themselves by using the host save/restore tools, but we should at least not do it ourselves.
> You probably mean EL8?
Yes.
As far as I can tell, using the host iptables-save/iptables-restore commands provided by EL8's iptables package to manage rules created by the k3s bundled iptables binaries has been broken since v1.25.5+k3s1. There were no changes between any of the versions you're using that would have changed the behavior, and indeed I can reproduce it at any time just by doing a save|restore and restarting k3s - no upgrade necessary.
> and indeed I can reproduce it at any time just by doing a save|restore and restarting k3s - no upgrade necessary.
But still, using the provided vagrant reproducer I can only reproduce this issue by upgrading to v1.27.7+k3s1. Upgrades to earlier versions work fine.
Edit: Actually, no, this fails, too. I couldn't reproduce the issue reliably because I didn't wait long enough between the initial install and the upgrade. I needed to increase the wait time to 60 seconds until I could reproduce the issue reliably.
I think there is very little to gain in understanding this little detail, so I am fine with the current state of the investigation :)
> using the provided vagrant reproducer I can only reproduce this issue by upgrading to v1.27.7+k3s1.
Are you seeing anything different than I am with regards to the host iptables-save/iptables-restore commands dropping the --mark 0x2000/0x2000 option from the portmap CNI rules? This appears to be the root cause of the issue, and is reproducible going all the way back to v1.25.5+k3s1.
The install script was changed to call iptables-save/iptables-restore in April, and I'm not seeing anything newer than that related to iptables.
> Are you seeing anything different than I am with regards to the host iptables-save/iptables-restore commands dropping the --mark 0x2000/0x2000 option from the portmap cni rules?
No, I can confirm all of your findings (even though I don't really understand what these --mark thingies are good for, or even how you did the connection traces. I feel out of my league here).
I was also wrong in claiming that upgrades to versions before v1.27.6+k3s1 are working. Well, it does work - for us. But inside the vanilla AlmaLinux 8 VM using Vagrant, upgrades to previous versions of K3s fail in the same way.
So why did upgrades to previous versions of K3s work for us? I am guessing that this has something to do with the custom iptables rules we deploy to our machines, which somehow made upgrades work up to and including v1.27.6+k3s1. We may never really know, but I am actually fine with that.
I see that you already prepared a PR to check for a faulty iptables-save inside install.sh. Seems a bit hacky, but it's still way better than any solution I would've come up with. Thanks! I will probably migrate away from install.sh for upgrade purposes anyway. Neither the system-upgrade-controller nor https://github.com/k3s-io/k3s-ansible relies on install.sh, so I see very little reason for me to keep relying on it.
Thank you for taking my very unstructured "discussion" seriously!
Config.yaml
write-kubeconfig-mode: 644
cluster-init: true
token: <TOKEN>
prefer-bundled-bin: true
Version installed
[rocky@ip-172-31-2-207 ~]$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-172-31-2-207.us-east-2.compute.internal Ready control-plane,etcd,master 88s v1.28.2+k3s1
[rocky@ip-172-31-2-207 ~]$ kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-6799fbcd5-55wmr 1/1 Running 0 77s
kube-system helm-install-traefik-crd-5fgmk 0/1 Completed 0 77s
kube-system helm-install-traefik-qgtpq 0/1 Completed 1 77s
kube-system local-path-provisioner-84db5d44d9-rjph5 1/1 Running 0 77s
kube-system metrics-server-67c658944b-xtf6z 1/1 Running 0 77s
kube-system svclb-traefik-bd0ba429-5jwt7 2/2 Running 0 50s
kube-system traefik-7bf7d7576d-7vltg 1/1 Running 0 50s
Successful upgrade
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-172-31-2-207.us-east-2.compute.internal Ready control-plane,etcd,master 5m v1.28.3-rc3+k3s2
[rocky@ip-172-31-2-207 ~]$ kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-6799fbcd5-55wmr 0/1 Running 1 (83s ago) 4m49s
kube-system helm-install-traefik-crd-5fgmk 0/1 Completed 0 4m49s
kube-system helm-install-traefik-qgtpq 0/1 Completed 1 4m49s
kube-system local-path-provisioner-84db5d44d9-rjph5 1/1 Running 1 (83s ago) 4m49s
kube-system metrics-server-67c658944b-xtf6z 0/1 Running 1 (83s ago) 4m49s
kube-system svclb-traefik-bd0ba429-5jwt7 2/2 Running 2 (83s ago) 4m22s
kube-system traefik-7bf7d7576d-7vltg 1/1 Running 1 (83s ago) 4m22s
I am creating this as a discussion because I've seen no one else having this issue.
When updating our clusters from v1.27.6+k3s1 to v1.27.7+k3s1, the embedded etcd doesn't seem to come up correctly. Even when adding debug: true, the logs don't show anything helpful, just that the etcd-client cannot connect. Please see the attached logs: k3s.1.27.7.debug.log
It doesn't matter which of the three server nodes I try to update first. The first node I try to update always refuses to come up.
Our config.yaml is shown above. I wish I could describe the issue better, but the logs don't give me much to work with.
Edit: The logs created by v1.27.6+k3s1 look pretty much identical, except for the non-failing etcd-client.
Edit 2: v1.28.3+k3s1 fails for us, too. v1.26.10+k3s1 is working fine.
Originally posted by @ChristianCiach in https://github.com/k3s-io/k3s/discussions/8780