mgabeler-lee-6rs closed this issue 1 year ago
E0403 15:02:19.425921 1727565 controller.go:156] Unable to perform initial Kubernetes service initialization: Service "kubernetes" is invalid: spec.clusterIPs: Invalid value: []string{"10.43.0.1"}: failed to allocate IP 10.43.0.1: cannot allocate resources of type serviceipallocations at this time
You will see this in absolutely every K3s server startup log ever. This always happens during initial cluster startup, and is resolved within milliseconds once the rest of the controllers are initialized.
--tls-san=0.0.0.0
This isn't a valid TLS SAN; you will never connect to a node using the IP 0.0.0.0.
Please attach the complete K3s service log, as well as example commands and output showing whatever errors you're encountering. The information you've provided here doesn't provide enough detail to actually discern what's going on with your environment.
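For a systemd-managed install, something along these lines (a sketch) captures the full service log:
journalctl -u k3s --no-pager > k3s.log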
--tls-san=0.0.0.0
This isn't a valid TLS SAN; you will never connect to a node using the IP 0.0.0.0.
This is something I copied from somewhere to allow the fake cert to be accepted for different IP addresses ... I forget where I got it from, and Google is failing me right now. Removing it doesn't help (I wiped the k8s environment to start fresh just to make sure).
You will see this in absolutely every K3s server startup log ever. This always happens during initial cluster startup, and is resolved within milliseconds once the rest of the controllers are initialized.
OK, I thought it wasn't resolved, but I was looking for the service in the kube-system namespace instead of the default namespace.
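(For reference, a quick way to check, assuming default kubeconfig access:)
kubectl get service kubernetes -n default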
Please attach the complete K3s service log, as well as example commands and output showing whatever errors you're encountering. The information you've provided here doesn't provide enough detail to actually discern what's going on with your environment.
:+1: k3s.log coredns.log local-path-provisioner.log
Everything else I see failing seems to boil down to coredns and/or local-path-provisioner failing to start, and those seem to be failing to start because they're timing out trying to contact the main kubernetes api endpoint. At least that is notably different in the logs for those services vs. the logs from a working machine running 1.25.x -- the working machine also fails to contact the api endpoint at startup, but only temporarily, after which they work normally.
I can hit https://<my-local-ip>:6443, which seems to be where that service points, so I'm not sure what's going on with the pods.
All my other pod failures at this point are DNS failures trying to talk to the coredns pod that isn't up
Are you running k3s as a user, instead of as a systemd service? There are some odd errors in the logs about cgroups and missing containers:
E0403 15:55:39.202799 1796046 remote_runtime.go:415] "ContainerStatus from runtime service failed" err="rpc error: code = Unknown desc = Error: No such container: 66c56e2d3eed4b340690cfd59d48c7a5a61401a554aa92f9c0d95af7dee48357" containerID="66c56e2d3eed4b340690cfd59d48c7a5a61401a554aa92f9c0d95af7dee48357"
I0403 15:55:39.202806 1796046 volume_manager.go:293] "Starting Kubelet Volume Manager"
I0403 15:55:39.202814 1796046 kuberuntime_gc.go:362] "Error getting ContainerStatus for containerID" containerID="66c56e2d3eed4b340690cfd59d48c7a5a61401a554aa92f9c0d95af7dee48357" err="rpc error: code = Unknown desc = Error: No such container: 66c56e2d3eed4b340690cfd59d48c7a5a61401a554aa92f9c0d95af7dee48357"
I0403 15:55:39.202844 1796046 desired_state_of_world_populator.go:151] "Desired state populator starts to run"
E0403 15:55:39.203485 1796046 remote_runtime.go:415] "ContainerStatus from runtime service failed" err="rpc error: code = Unknown desc = Error: No such container: b1ebd4dd0190bd5a5bc3bf9c2a6e42724e2b17d19f109d2c0f5183609b3c068e" containerID="b1ebd4dd0190bd5a5bc3bf9c2a6e42724e2b17d19f109d2c0f5183609b3c068e"
I0403 15:55:39.203504 1796046 kuberuntime_gc.go:362] "Error getting ContainerStatus for containerID" containerID="b1ebd4dd0190bd5a5bc3bf9c2a6e42724e2b17d19f109d2c0f5183609b3c068e" err="rpc error: code = Unknown desc = Error: No such container: b1ebd4dd0190bd5a5bc3bf9c2a6e42724e2b17d19f109d2c0f5183609b3c068e"
E0403 15:55:39.204177 1796046 remote_runtime.go:415] "ContainerStatus from runtime service failed" err="rpc error: code = Unknown desc = Error: No such container: 61acb17030e060ee20b9e794fa848dfa68ffe1e6f56eefce3b86cf796da4bbce" containerID="61acb17030e060ee20b9e794fa848dfa68ffe1e6f56eefce3b86cf796da4bbce"
I0403 15:55:39.204195 1796046 kuberuntime_gc.go:362] "Error getting ContainerStatus for containerID" containerID="61acb17030e060ee20b9e794fa848dfa68ffe1e6f56eefce3b86cf796da4bbce" err="rpc error: code = Unknown desc = Error: No such container: 61acb17030e060ee20b9e794fa848dfa68ffe1e6f56eefce3b86cf796da4bbce"
E0403 15:55:39.204395 1796046 remote_runtime.go:415] "ContainerStatus from runtime service failed" err="rpc error: code = Unknown desc = Error: No such container: b998fc83bb0c3d68ecf09aeead4adbacdf0af5bdebd8489fbab0ef2f30dd381f" containerID="b998fc83bb0c3d68ecf09aeead4adbacdf0af5bdebd8489fbab0ef2f30dd381f"
I0403 15:55:39.204406 1796046 kuberuntime_gc.go:362] "Error getting ContainerStatus for containerID" containerID="b998fc83bb0c3d68ecf09aeead4adbacdf0af5bdebd8489fbab0ef2f30dd381f" err="rpc error: code = Unknown desc = Error: No such container: b998fc83bb0c3d68ecf09aeead4adbacdf0af5bdebd8489fbab0ef2f30dd381f"
E0403 15:55:39.205004 1796046 remote_runtime.go:415] "ContainerStatus from runtime service failed" err="rpc error: code = Unknown desc = Error: No such container: 0a643f81c5a469086a00889c11d125529bb0dba91a6a6115d3c92e01b6c0eed7" containerID="0a643f81c5a469086a00889c11d125529bb0dba91a6a6115d3c92e01b6c0eed7"
I0403 15:55:39.205021 1796046 kuberuntime_gc.go:362] "Error getting ContainerStatus for containerID" containerID="0a643f81c5a469086a00889c11d125529bb0dba91a6a6115d3c92e01b6c0eed7" err="rpc error: code = Unknown desc = Error: No such container: 0a643f81c5a469086a00889c11d125529bb0dba91a6a6115d3c92e01b6c0eed7"
K3s as a whole seems to be running in your user slice instead of in a dedicated slice for a systemd service unit?
I0403 15:55:39.312502 1796046 container_manager_linux.go:626] "Failed to ensure state" containerName="/k3s" err="failed to find container of PID 1796046: cpu and memory cgroup hierarchy not unified. cpu: /user.slice, memory: /user.slice/user-1000.slice/user@1000.service"
Yes, running this "by hand", but still as root (sudo k3s ...). This used to work OK, and fits a little better in our developer environments as we can more directly manage it as part of the rest of the dev stack. Is that no longer a viable option as of 1.26?
I'm not sure that it's strictly related, but it probably isn't great either. If you've got docker using the systemd cgroup manager, the kubelet will want to do the same, but it can't because it's all running under your user slice instead of in a dedicated system slice. Kubernetes in general is moving heavily towards using the systemd cgroup manager and cgroupv2, and just running k3s via sudo from a shell does not allow it to do that.
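For comparison, a minimal unit along the lines of what the install script generates would be (a sketch; the ExecStart flags here are just the ones used in this issue):
[Unit]
Description=Lightweight Kubernetes
After=network-online.target

[Service]
Type=notify
ExecStart=/usr/local/bin/k3s server --write-kubeconfig-mode 644 --docker
KillMode=process
Delegate=yes
LimitNOFILE=1048576
Restart=always
RestartSec=5s

[Install]
WantedBy=multi-user.target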
Have you confirmed that you don't have any iptables rules (managed via ufw/firewalld/etc) that might be interfering with things?
We can't run cgroups v2 because our workload requires the ability to run Ubuntu 18.04 os-containers, and that doesn't have a new enough version of systemd to work that way.
I checked iptables rules, yes, and the only ones that stuck out as different between the working and non-working systems were some REJECTs in KUBE-SERVICES ... for the coredns and other services that weren't ready yet due to the kube api server issue, annotated as "has no endpoints", so that part made sense.
I'll try running k3s under a normal systemd unit and see if it makes things better.
That didn't help, here's a fresh log from that: k3s-systemd.log
I also tried switching away from --docker just to try to isolate issues; it didn't help.
the only ones that stuck out as different between the working and non-working systems were some REJECTs in KUBE-SERVICES ... for the coredns and other services that weren't ready yet due to the kube api server issue, annotated as "has no endpoints", so that part made sense.
Can you show the output of kubectl get service,endpoints,networkpolicy -A -o wide?
If you're dealing with a single-node cluster, the only thing that might be interfering with access to the in-cluster kubernetes endpoint would be something else inserting conflicting iptables rules, or perhaps otherwise mucking with the container network interfaces.
If you're dealing with a single-node cluster, the only thing that might be interfering with access to the in-cluster kubernetes endpoint would be something else inserting conflicting iptables rules, or perhaps otherwise mucking with the container network interfaces.
This was my thought too, but I was struggling with how to see the iptables rules that applied within the network namespaces, especially since the coredns image doesn't even have /bin/sh. I guess I'd need to create a pod running in privileged mode / with cap_net_admin enabled for the iptables binary to be able to see things? Would that be helpful?
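Something along these lines might work (a sketch; the pod name and image are arbitrary):
# run a privileged pod so iptables/ip can inspect its network namespace
kubectl run netdebug --image=debian:stable --restart=Never \
  --overrides='{"apiVersion":"v1","spec":{"containers":[{"name":"netdebug","image":"debian:stable","command":["sleep","infinity"],"securityContext":{"privileged":true}}]}}'
# then exec in, install the tools, and look around
kubectl exec -it netdebug -- bash -c 'apt-get update -qq && apt-get install -y -qq iptables iproute2 && iptables-save && ip addr'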
Can you show the output of kubectl get service,endpoints,networkpolicy -A -o wide ?
E0403 17:45:08.289309 1924749 memcache.go:287] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0403 17:45:08.294308 1924749 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0403 17:45:08.295147 1924749 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0403 17:45:08.298071 1924749 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0403 17:45:08.299441 1924749 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0403 17:45:08.300804 1924749 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0403 17:45:08.302003 1924749 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0403 17:45:08.302932 1924749 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
default service/kubernetes ClusterIP 10.43.0.1 <none> 443/TCP 20s <none>
kube-system service/kube-dns ClusterIP 10.43.0.10 <none> 53/UDP,53/TCP,9153/TCP 17s k8s-app=kube-dns
kube-system service/metrics-server ClusterIP 10.43.91.45 <none> 443/TCP 16s k8s-app=metrics-server
NAMESPACE NAME ENDPOINTS AGE
default endpoints/kubernetes 10.0.0.174:6443 20s
kube-system endpoints/kube-dns <none> 5s
kube-system endpoints/metrics-server 5s
The CNI- and kubelet-managed iptables rules are all in the host namespace; I wouldn't expect you to see any in the pods.
What version of Debian and Docker are you running? Have you customized the system configuration in any particularly interesting ways? What do you see if you run ip addr in a pod?
I regularly run k3s on Ubuntu 22.04 (it's my primary development OS) both with containerd and with docker and have not seen any issues, nor have I seen anyone else report similar issues, so I suspect there's something else going on with your OS configuration.
The host system is Debian bookworm (almost the new stable release, but technically still "testing"). I'll try on an Ubuntu 22.04 VM to see if that makes a difference. Docker is 20.10.23+dfsg1 (from the Debian docker.io package). Containerd is containerd github.com/containerd/containerd 1.6.18~ds1 1.6.18~ds1-1+b2, in case that's relevant.
I haven't customized this system much, no, installing k3s is the most "interesting" thing I've done to it regarding networking setups. I have these saved iptables rules for a basic "anything outgoing, nothing incoming except LAN subnets" config, and the KUBE rules all get added ahead of this in iptables:
*filter
:INPUT DROP
:FORWARD DROP
-A INPUT -i lo -j ACCEPT
-A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A INPUT -i wg+ -j ACCEPT
-A INPUT -s 10.0.0.0/8 -i wl+ -j ACCEPT
-A INPUT -s 10.0.0.0/8 -i en+ -j ACCEPT
-A INPUT -i docker+ -j ACCEPT
-A INPUT -i br-+ -j ACCEPT
COMMIT
Indeed, creating a privileged pod saw nothing for iptables. ip addr info from that pod:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: eth0@if106: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
link/ether ae:c0:bd:8e:dd:6c brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.42.0.10/24 brd 10.42.0.255 scope global eth0
valid_lft forever preferred_lft forever
Tried a few more things to isolate the issue:
So, there's a much smaller range to bisect here of where things might have gone wrong
Edit: continuing the bisect, 1.25.8-rc1+k3s1 is also showing the problem, so I guess further bisecting will require building from source, which I think I can do
1.25.7+k3s1: !! works !! 1.25.7-rc1+k3s1 is also showing the problem
There was no second RC of 1.25.7+k3s1, so these tags point to the same commit...
commit f7c20e237d0ad0eae83c1ce60d490da70dbddc0e (tag: v1.25.7-rc1+k3s1, tag: v1.25.7+k3s1)
Author: Matt Trachier <matt.trachier@suse.com>
Date: Wed Mar 1 15:29:10 2023 -0600
Update to v1.25.7-k3s1 (#7010)
* Update to v1.25.7
* update gh workflows and docker files to proper go version
---------
Signed-off-by: matttrach <matttrach@gmail.com>
~~Building from source is erroring out with some checks in scripts/version.sh~~ Edit: figured out what was going wrong here and worked around it
There was no second RC of 1.25.7+k3s1, so these tags point to the same commit... git log shows me:
commit 6c5ac02248834a4d59501f7f31404d1287e358db (tag: v1.25.8-rc2+k3s1, tag: v1.25.8+k3s1)
Author: Roberto Bonafiglia <roberto.bonafiglia@suse.com>
Date: Wed Mar 22 15:50:03 2023 +0100
Update flannel to fix NAT issue with old iptables version
Signed-off-by: Roberto Bonafiglia <roberto.bonafiglia@suse.com>
Edit: sorry, had a typo, it was the 1.25.8-rc1 that failed for me, 1.25.7 is passing.
The primary differences likely to impact you are updates to flannel and kube-router, the rest of the stuff in there isn't going to make much difference.
Just out of curiosity, you might try starting k3s with --prefer-bundled-bin, on the off chance there are some problems with the version of iptables your hosts have?
Just out of curiosity, you might try starting k3s with --prefer-bundled-bin, on the off chance there are some problems with the version of iptables your hosts have?
Tried with v1.25.8-rc1, didn't help.
The primary differences likely to impact you are updates to flannel and kube-router, the rest of the stuff in there isn't going to make much difference.
My thought too, so I started my bisect with that first PR after 1.25.7 (#7061), building from commit f5d1f976d3727f2a62ea536dca91e0acebf98bdf ... but that fails to bring up the node; I think something must be wrong with the build. It keeps logging this in the k3s output:
time="2023-04-03T18:55:03-04:00" level=info msg="Waiting to retrieve agent configuration; server is not ready: failed to find host-local: exec: \"host-local\": executable file not found in $PATH"
Somehow it never set up /var/lib/rancher/k3s/data when running the source build ... I think I need to run make package-cli instead of make in order to generate the fully bundled binary?
If you're going to try to build from source, I would recommend doing so on a host with Docker, and just do git clean -xffd && SKIP_VALIDATE=true make ci
Try the prefer-bundled-bin flag with a recent release before you go building stuff. Very little of the stuff you're poking at is in k3s itself, it's likely somewhere in one of the flannel module updates.
v1.25.8-rc1 and f5d1f976d3727f2a62ea536dca91e0acebf98bdf (first commit after 1.25.7) both fail with this error, including with --prefer-bundled-bin
Just to validate my source builds, v1.25.7 built from source, like the copy downloaded from github, works
Trying to downgrade flannel and/or kube-router to see if I can isolate it to one of those dependencies
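One way to test such a downgrade in a source build might be a go.mod replace directive; a sketch, where the module paths are assumptions based on k3s's existing replace entries rather than verified:
go mod edit -replace github.com/cloudnativelabs/kube-router=github.com/k3s-io/kube-router@v1.5.2-0.20221026101626-e01045262706
go mod tidy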
v1.26.3+k3s1 on a clean Ubuntu 22.04 VM: works. Same on a clean Debian bookworm VM: works ... so it's something with my local system.
Starting from v1.25.8+k3s1: downgrading kube-router (to v1.5.2-0.20221026101626-e01045262706 from before #7061) does make things work
Check your general system logs, do you have anything that's mucking about with the docker container interfaces when they are added? I have seen odd behavior from avahi adding multicast listeners to container interfaces, for example.
downgrading kube-router (to v1.5.2-0.20221026101626-e01045262706 from before https://github.com/k3s-io/k3s/pull/7061) does make things work
Can you try running k3s with --disable-network-policy?
cc @rbrtbnfgl @thomasferrandiz - this may be more weirdness with the new v2.0.0 release of kube-router. As per the above output there are no network policies in place, but for some reason pods can't reach the in-cluster Kubernetes service endpoint.
I have seen odd behavior from avahi adding multicast listeners to container interfaces, for example.
I do have avahi running, but it's also on the "clean vm". The problem workstation has it configured to not listen on the docker interfaces, whereas the clean vm has the default config where it does listen there.
Can you try running k3s with --disable-network-policy?
Tried this back on v1.26.3, no luck.
Check your general system logs, do you have anything that's mucking about with the docker container interfaces when they are added?
Rummaging ...
Can you try running k3s with --disable-network-policy?
Tried this back on v1.26.3, no luck.
Hmm. All you did was revert the kube-router version and it works, but you can't use k3s with the updated kube-router even if it's disabled? That doesn't make any sense to me, if you disable it we don't run any of the affected code. Period. Maybe try disabling it on a fresh install/reboot, on the off chance there are some rules being left behind?
Perhaps spend a bit more time trying to figure out what about your machine makes it unique from the other nodes you were unable to reproduce on?
Can you try running k3s with --disable-network-policy?
Tried this back on v1.26.3, no luck.
Hmm. All you did was revert the kube-router version and it works, but you can't use k3s with the updated kube-router even if it's disabled?
On v1.25.8, yes. It also required reverting the associated changes to pkg/agent/netpol/netpol.go, but that seems like it should be uninteresting in this context.
That doesn't make any sense to me, if you disable it we don't run any of the affected code. Period. Maybe try disabling it on a fresh install/reboot, on the off chance there are some rules being left behind?
Will check that, yeah. I rebooted once in this process to check that, but haven't since. Each time I switch versions I am fully stopping k3s, and stopping/deleting all the pods & associated containers, and doing all the "rm -rf" type stuff the uninstall script does, so it's a pretty fresh start each time.
Perhaps spend a bit more time trying to figure out what about your machine makes it unique from the other nodes you were unable to reproduce on?
Yep, I'm working on this in the background here, trying to disable various services & configurations & such to try and get to a working state and then turn things back on one by one.
I would also love to find out where the packets destined for the api server are going / getting dropped. Starting on that road, wireshark on the host listening to cni0 doesn't see any SYN packets coming in to 10.42.0.1 when just the coredns pod is running.
But, if I start a random other pod with a usable shell (I started a jobs.batch just running debian:stable with a sleep so I can exec in), work out which veth... interface corresponds to it, and have it try to connect to that same ip/port, I can see the packets in wireshark on both the veth... and cni0 interfaces. Still no responses.
So I added a final rule to my iptables INPUT chain to log everything, and I see these coming through:
IN=cni0 OUT= MAC=4a:5e:56:e2:d6:a4:1e:e2:6d:be:07:dd:08:00 SRC=10.42.0.2 DST=10.0.0.174 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=24507 DF PROTO=TCP SPT=60064 DPT=6443 WINDOW=64860 RES=0x00 SYN URGP=0
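(For reference, the catch-all logging rule was roughly of this shape; a sketch using the "input drop " prefix that shows up in the trace output further below:)
sudo iptables -A INPUT -j LOG --log-prefix "input drop " --log-level 4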
And part of this finally clicks. This workstation, because it does have some firewalling enabled (corporate policies), has the INPUT chain configured with a default policy of DROP. It has rules to accept local traffic from the "normal" interfaces, but cni0 is not part of those rules. The "clean VM" isn't corporate-ized, and so the INPUT chain's default policy is ACCEPT.
So now the question is, what's different about the iptables rules on the different versions that causes things to be ACCEPTed somewhere before this point on the old version, but fall through to the DROP policy here on the new one. I'm going to wager that it somehow comes down to the interface name, and that on the old version the packets "appear" on eno2 (my default network interface, which holds the 10.0.0.174 address above) instead of cni0. I expect I can work around this by adding an accept rule for cni0, but I'd like to understand more of the why, since it may need a more robust workaround/fix for my coworkers.
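(i.e., something along these lines, a sketch:)
sudo iptables -I INPUT -i cni0 -j ACCEPT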
I'm at the end of my day today, I will follow up with findings/results tomorrow.
And part of this finally clicks. This workstation, because it does have some firewalling enabled (corporate policies) has the INPUT chain configured with a default policy of DROP.
See, that's what I was fishing for when I asked:
Have you confirmed that you don't have any iptables rules (managed via ufw/firewalld/etc) that might be interfering with things?
See, that's what I was fishing for when I asked:
Yeah, just took me a moment to see it, been a little while since I poked at this stuff and forgot about the chain policy vs. having a default-drop rule at the end of the chain.
Digging further into the networking, it's sort of cni0 stuff. Something in the upgrade to kube-router caused the generated iptables rules to change in a way that makes the new setup dependent on the host having a default-accept INPUT policy here.
Comparing iptables-save from 1.25.7 vs 1.25.8, the only thing that immediately jumps out at me is a change in placement of the -j FLANNEL-FWD rule in the FORWARD chain, and a similar change to the -j FLANNEL-POSTRTG rule in nat/POSTROUTING. The former moved from being early, between the KUBE-ROUTER-FORWARD and KUBE-PROXY-FIREWALL rules, to being the very last rule in the FORWARD chain. Similarly, the latter moved from being between CNI-HOSTPORT-MASQ and KUBE-POSTROUTING to being the last rule in the chain.
Since that didn't immediately make things obvious, I used the nft trace system to gather more detailed data.
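(For reference, this kind of trace can be gathered with the nft tooling along these lines; a sketch, assuming the iptables-nft backend:)
# flag the packets of interest for tracing, then watch the trace stream
sudo nft insert rule ip filter INPUT ip saddr 10.42.0.0/24 tcp dport 6443 meta nftrace set 1
sudo nft monitor trace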
Key part of the trace for a packet on 1.25.7:
trace id 85f6def5 ip filter INPUT packet: iif "cni0" ether saddr 8a:e5:38:16:ba:8d ether daddr 6e:bc:51:89:9d:05 ip saddr 10.42.0.6 ip daddr 10.0.0.174 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 64261 ip length 52 tcp sport 39524 tcp dport 6443 tcp flags == ack tcp window 710
trace id 85f6def5 ip filter INPUT rule counter packets 9387 bytes 7334806 jump KUBE-ROUTER-INPUT (verdict jump KUBE-ROUTER-INPUT)
trace id 85f6def5 ip filter KUBE-ROUTER-INPUT rule ip saddr 10.42.0.6 counter packets 935 bytes 305298 jump KUBE-POD-FW-RBX4FCSO3CUKYMCM (verdict jump KUBE-POD-FW-RBX4FCSO3CUKYMCM)
trace id 85f6def5 ip filter KUBE-POD-FW-RBX4FCSO3CUKYMCM rule ct state related,established counter packets 2094 bytes 444756 accept (verdict accept)
Same portion of the trace for a packet on 1.25.8:
trace id 4498137c ip filter INPUT packet: iif "cni0" ether saddr d6:92:90:b0:15:1a ether daddr 76:c1:27:59:c9:eb ip saddr 10.42.0.6 ip daddr 10.0.0.174 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 4315 ip length 60 tcp sport 43500 tcp dport 6443 tcp flags == syn tcp window 64860
trace id 4498137c ip filter INPUT rule counter packets 3632 bytes 2970461 jump KUBE-ROUTER-INPUT (verdict jump KUBE-ROUTER-INPUT)
trace id 4498137c ip filter KUBE-ROUTER-INPUT rule ip saddr 10.42.0.6 counter packets 4 bytes 240 jump KUBE-POD-FW-5LYEVMXKAM6TCRT3 (verdict jump KUBE-POD-FW-5LYEVMXKAM6TCRT3)
trace id 4498137c ip filter KUBE-POD-FW-5LYEVMXKAM6TCRT3 rule ip saddr 10.42.0.6 counter packets 4 bytes 240 jump KUBE-NWPLCY-DEFAULT (verdict jump KUBE-NWPLCY-DEFAULT)
trace id 4498137c ip filter KUBE-NWPLCY-DEFAULT rule counter packets 21 bytes 1296 meta mark set mark or 0x10000 (verdict continue)
trace id 4498137c ip filter KUBE-NWPLCY-DEFAULT verdict continue meta mark 0x00010000
... runs through inapplicable rules for other pods/etc
trace id 4498137c ip filter KUBE-ROUTER-INPUT verdict return meta mark 0x00020000
trace id 4498137c ip filter INPUT rule ct state new counter packets 94 bytes 16173 jump KUBE-PROXY-FIREWALL (verdict jump KUBE-PROXY-FIREWALL)
trace id 4498137c ip filter KUBE-PROXY-FIREWALL verdict continue meta mark 0x00020000
trace id 4498137c ip filter INPUT rule counter packets 3498 bytes 2960971 jump KUBE-NODEPORTS (verdict jump KUBE-NODEPORTS)
trace id 4498137c ip filter KUBE-NODEPORTS verdict continue meta mark 0x00020000
trace id 4498137c ip filter INPUT rule ct state new counter packets 94 bytes 16173 jump KUBE-EXTERNAL-SERVICES (verdict jump KUBE-EXTERNAL-SERVICES)
trace id 4498137c ip filter KUBE-EXTERNAL-SERVICES verdict continue meta mark 0x00020000
trace id 4498137c ip filter INPUT rule counter packets 3498 bytes 2960971 jump KUBE-FIREWALL (verdict jump KUBE-FIREWALL)
trace id 4498137c ip filter KUBE-FIREWALL verdict continue meta mark 0x00020000
Since this mentions the network policy, I ran the same thing again with 1.25.8, but with --disable-network-policy:
trace id d036fa9d ip filter INPUT packet: iif "cni0" ether saddr 6e:db:71:27:fd:05 ether daddr 46:67:7e:96:d8:9d ip saddr 10.42.0.2 ip daddr 10.0.0.174 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 3938 ip length 60 tcp sport 52258 tcp dport 6443 tcp flags == syn tcp window 64860
trace id d036fa9d ip filter INPUT rule ct state new counter packets 101 bytes 7292 jump KUBE-PROXY-FIREWALL (verdict jump KUBE-PROXY-FIREWALL)
trace id d036fa9d ip filter KUBE-PROXY-FIREWALL verdict continue
trace id d036fa9d ip filter INPUT rule counter packets 8921 bytes 6841389 jump KUBE-NODEPORTS (verdict jump KUBE-NODEPORTS)
trace id d036fa9d ip filter KUBE-NODEPORTS verdict continue
trace id d036fa9d ip filter INPUT rule ct state new counter packets 101 bytes 7292 jump KUBE-EXTERNAL-SERVICES (verdict jump KUBE-EXTERNAL-SERVICES)
trace id d036fa9d ip filter KUBE-EXTERNAL-SERVICES verdict continue
trace id d036fa9d ip filter INPUT rule counter packets 11552 bytes 7956318 jump KUBE-FIREWALL (verdict jump KUBE-FIREWALL)
trace id d036fa9d ip filter KUBE-FIREWALL verdict continue
trace id d036fa9d ip filter INPUT rule counter packets 25 bytes 1500 log prefix "input drop " (verdict continue)
trace id d036fa9d ip filter INPUT verdict continue
trace id d036fa9d ip filter INPUT policy drop
So, disabling the policy does remove a bunch of the stuff from the iptables rules, but it doesn't revert things fully to the old state. I think the with-network-policy ruleset has nearly what it should, but it seems the accept rule in some chains isn't right?
From the end of the KUBE-ROUTER-INPUT chain in 1.25.7:
-A KUBE-ROUTER-INPUT -m comment --comment "rule to explicitly ACCEPT traffic that comply to network policies" -m mark --mark 0x20000/0x20000 -j ACCEPT
The same rule from 1.25.8:
-A KUBE-ROUTER-INPUT -m comment --comment "rule to explicitly ACCEPT traffic that comply to network policies" -m mark --mark 0x20000/0x20000 -j RETURN
The latter says it's meant to ACCEPT, but it actually does RETURN, and I think this is, at the end of the day, the crux of the issue?
At least this now easily matches some code, and I can git blame this down to a specific commit/PR: #50, which was listed to fix #6691. Walking through linked items, I came to this comment which seems relevant: https://github.com/cloudnativelabs/kube-router/issues/1453#issuecomment-1493595260
We changed kube-router's behaviour to not ACCEPT the packets by default. @brandond, is it up to the user to configure the chain properly, disabling any firewall on the node?
@rbrtbnfgl a couple questions:
Is the KUBE-ROUTER-INPUT chain cleaned up properly when the network policy controller is disabled? It feels like we should ensure its absence when the NPC is disabled, as opposed to leaving it around with rules that might interfere with normal operation of the node.
Edit: To answer my first question, I see the commit at https://github.com/k3s-io/kube-router/commit/df90811446a19e1922a4d7faa226d926b476b0ae changes this. We need to more explicitly call these things out if we're going to change them, we can't hide this in a "version bump". Also, it feels like the comment on that rule needs to be updated; it still claims to be accepting.
I think this is probably a fine change for us to keep, we just need to be better about exposing this sort of stuff in our release notes.
Yes, maybe my commit message could have been more explanatory.
Infrastructure: Cloud EC2 instance
Node(s) CPU architecture, OS, and Version: Ubuntu 20.04
Cluster Configuration: Single node
Config.yaml:
cat /etc/rancher/k3s/config.yaml
curl -fL https://get.k3s.io | INSTALL_K3S_VERSION=v1.25.8+k3s1 sh -s - server
Results from reproducing on v1.25.8+k3s1:
$ k3s -v
k3s version v1.25.8+k3s1 (6c5ac022)
$ sudo iptables-save |grep network |grep ROUTER
-A KUBE-ROUTER-FORWARD -m comment --comment "rule to explicitly ACCEPT traffic that comply to network policies" -m mark --mark 0x20000/0x20000 -j RETURN
-A KUBE-ROUTER-INPUT -m comment --comment "rule to explicitly ACCEPT traffic that comply to network policies" -m mark --mark 0x20000/0x20000 -j RETURN
-A KUBE-ROUTER-OUTPUT -m comment --comment "rule to explicitly ACCEPT traffic that comply to network policies" -m mark --mark 0x20000/0x20000 -j RETURN
Results from commit on master branch:
$ sudo iptables-save |grep network |grep ROUTER
-A INPUT -m comment --comment "KUBE-ROUTER rule to explicitly ACCEPT traffic that comply to network policies" -m mark --mark 0x20000/0x20000 -j ACCEPT
-A FORWARD -m comment --comment "KUBE-ROUTER rule to explicitly ACCEPT traffic that comply to network policies" -m mark --mark 0x20000/0x20000 -j ACCEPT
-A OUTPUT -m comment --comment "KUBE-ROUTER rule to explicitly ACCEPT traffic that comply to network policies" -m mark --mark 0x20000/0x20000 -j ACCEPT
Results from upgrade from v1.25.8+k3s1 to commit on master branch:
$ sudo iptables-save |grep network |grep ROUTER
-A INPUT -m comment --comment "KUBE-ROUTER rule to explicitly ACCEPT traffic that comply to network policies" -m mark --mark 0x20000/0x20000 -j ACCEPT
-A FORWARD -m comment --comment "KUBE-ROUTER rule to explicitly ACCEPT traffic that comply to network policies" -m mark --mark 0x20000/0x20000 -j ACCEPT
-A OUTPUT -m comment --comment "KUBE-ROUTER rule to explicitly ACCEPT traffic that comply to network policies" -m mark --mark 0x20000/0x20000 -j ACCEPT
-A KUBE-ROUTER-FORWARD -m comment --comment "rule to explicitly ACCEPT traffic that comply to network policies" -m mark --mark 0x20000/0x20000 -j RETURN
-A KUBE-ROUTER-INPUT -m comment --comment "rule to explicitly ACCEPT traffic that comply to network policies" -m mark --mark 0x20000/0x20000 -j RETURN
-A KUBE-ROUTER-OUTPUT -m comment --comment "rule to explicitly ACCEPT traffic that comply to network policies" -m mark --mark 0x20000/0x20000 -j RETURN
@mgabeler-lee-6rs are you able to test K3s as installed from:
curl -sL get.k3s.io | INSTALL_K3S_COMMIT=027cc187ce9f21157b8d37d62e67ee1c42968b4b sh -s -
I'd like to confirm that the fix from https://github.com/k3s-io/kube-router/pull/56 fixes your use case.
I should be able to test that tomorrow, yes :+1:
Unfortunately it doesn't seem to work here. I'll attach full iptables output. (Going back to 1.25.7, it works immediately.)
The issue is your LOGDROPIN chain, which drops everything. The ACCEPT rule is in place, but it is evaluated after that rule.
OK, but how do I fix it? It worked before.
I have no control over most of the rules since they are generated automatically; however, I can add manual rules with a pre and post script...
Could you try using -vnL to check the counters of the matched packets?
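e.g., a sketch:
sudo iptables -vnL INPUT --line-numbers
sudo iptables -vnL LOGDROPIN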
It's as I suspected: the LOGDROPIN chain is dropping everything. Why use a chain to drop the packets when the INPUT default policy is DROP?
You were lucky that it was working before on your setup; if you used kubeadm or RKE2 to set up Kubernetes, it wouldn't have worked either.
I have no idea, it's a commonly used firewall software script that generates this.
I can make pre/post rules: https://tecadmin.net/add-custom-iptables-rules-with-csf/
But I'd prefer a well supported and simple manner to open the ports.
BTW I created this as a post script, but it seems like a lot, and I'm afraid something might still be broken:
iptables -I INPUT -s 10.43.0.0/16 -j ACCEPT
iptables -I INPUT -s 10.42.0.0/16 -j ACCEPT
iptables -I INPUT -d 10.43.0.0/16 -j ACCEPT
iptables -I INPUT -d 10.42.0.0/16 -j ACCEPT
iptables -I OUTPUT -d 10.43.0.0/16 -j ACCEPT
iptables -I OUTPUT -d 10.42.0.0/16 -j ACCEPT
iptables -I OUTPUT -s 10.43.0.0/16 -j ACCEPT
iptables -I OUTPUT -s 10.42.0.0/16 -j ACCEPT
K3s can't manage every possible configuration that a user could apply on the node. I think adding those rules at the beginning of the chain could interfere with kube-proxy's work. The firewall script has to be changed to properly accept the traffic on the needed ports.
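For example, narrower rules along the lines of the inbound requirements in the K3s docs might look like this (a sketch, assuming the default pod/service CIDRs and a single-node setup):
iptables -I INPUT -p tcp --dport 6443 -j ACCEPT   # Kubernetes API server
iptables -I INPUT -p tcp --dport 10250 -j ACCEPT  # kubelet metrics
iptables -I INPUT -p udp --dport 8472 -j ACCEPT   # flannel VXLAN (multi-node only)
iptables -I INPUT -s 10.42.0.0/16 -j ACCEPT       # pod CIDR
iptables -I INPUT -s 10.43.0.0/16 -j ACCEPT       # service CIDR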
It's been working for a long, long time. I suffered with Docker from time to time, and I was delighted that k3s worked so well.
Can't you add the old behavior with some flag? It was perfect for us :)
You could try configuring the firewall script to start after K3s; the iptables rules of K3s should then be added before the rules created by the firewall.
It wouldn't work, since the configserver rules are (auto) reloaded from time to time. If k3s provided a script, that could be called BEFORE the other rules are loaded.
That's not an issue, because I think those rules are appended, so if K3s is already started with all its rules in place, they are always loaded after the K3s rules.
It resets them, pretty sure about this since I had some issues with docker rules. I will really miss the old behavior :(
I'll need to open all the ranges manually then I suppose.
I think that LOGDROPIN chain is redundant, considering that you have DROP as the default policy.
I don't think it's redundant, it's a chain used for logging. If you follow the input chain, you end up there. In any case I have no control over this. I also found a past issue with configserver not blocking traffic in all cases.
It's not only logging; it's dropping all the packets that arrive there.
Environmental Info: K3s Version:
Node(s) CPU architecture, OS, and Version:
Linux CENSORED 6.1.0-7-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.20-1 (2023-03-19) x86_64 GNU/Linux
Cluster Configuration:
Describe the bug:
After upgrading to k3s 1.26.x (version above) from 1.25.x, nothing would come up, even after wiping all k3s config data and starting a fresh cluster. After digging into logs, the issue traced backwards from "coredns not responding" to "coredns stuck waiting on the kubernetes service" to the kubernetes service failing to be initialized on the first cluster start attempt, and never attempting repair thereafter:
Steps To Reproduce:
Install the k3s binary to /usr/local/bin, then run:
sudo k3s server --write-kubeconfig-mode 644 --docker --kube-apiserver-arg=service-node-port-range=1024-32767 --tls-san=0.0.0.0
Expected behavior: It should be able to start a cluster
Actual behavior: It fails to start the cluster
Additional context / logs:
We run with --docker instead of containerd. Unless --docker is no longer supported, please don't just say "you should use containerd / nerdctl". We've evaluated that, and it is not an easy replacement for our workflows & machine setups right now.