Same error on kernel 5.12.4
Quick workaround: https://github.com/kubernetes-sigs/kind/issues/2240#issuecomment-838510890
Hi @kpango, thanks for opening this issue and especially for doing some investigation already :+1: A similar issue was already mentioned in #604 (closed in favor of this issue). Let's see what we can do here :+1:
I'm currently setting up a test system, but given the workaround mentioned in the kind issue, I guess this could work here:

```bash
k3d cluster create --k3s-server-arg "--kube-proxy-arg=conntrack-max-per-core=0" --k3s-agent-arg "--kube-proxy-arg=conntrack-max-per-core=0"
```

(https://rancher.com/docs/k3s/latest/en/installation/install-options/agent-config/ & https://kubernetes.io/docs/reference/command-line-tools-reference/kube-proxy/ & https://github.com/kubernetes-sigs/kind/pull/2241)
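For background: kube-proxy sizes the conntrack table from `--conntrack-max-per-core` and writes the result to `/proc/sys/net/netfilter/nf_conntrack_max`; per the kube-proxy reference linked above, a value of `0` leaves the limit as-is (and ignores `conntrack-min`), so the failing write is skipped entirely. A minimal sketch to confirm that the sysctl is read-only inside a node container (assuming the default `k3d-k3s-default-server-0` node name and that the image ships a shell):

```bash
# Read the current value, then try to overwrite it from inside the
# container's network namespace. On affected kernels (mainline >= 5.12.2)
# the write fails with "Permission denied", which is exactly what
# kube-proxy runs into.
docker exec k3d-k3s-default-server-0 sh -c \
  'cat /proc/sys/net/netfilter/nf_conntrack_max; \
   echo 131072 > /proc/sys/net/netfilter/nf_conntrack_max'
```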
Just finished my test setup:

- Command: `k3d cluster create --k3s-server-arg "--kube-proxy-arg=conntrack-max-per-core=0" --k3s-agent-arg "--kube-proxy-arg=conntrack-max-per-core=0" --image rancher/k3s:v1.20.6-k3s` (before: `export K3D_FIX_CGROUPV2=true`, as the system is on cgroup v2)
- OS: openSUSE Tumbleweed
- Kernel: 5.12.3-1-default
- Docker: 20.10.6-ce
- k3d: v4.4.3
- k3s: v1.20.6-k3s

This works just fine :+1:
@nemonik can you try this please?
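Spelled out, the full sequence on a cgroup v2 host would be (same flags and image as above; a minimal sketch):

```bash
# K3D_FIX_CGROUPV2 must be exported before the cluster is created:
export K3D_FIX_CGROUPV2=true
k3d cluster create \
  --k3s-server-arg "--kube-proxy-arg=conntrack-max-per-core=0" \
  --k3s-agent-arg "--kube-proxy-arg=conntrack-max-per-core=0" \
  --image rancher/k3s:v1.20.6-k3s
```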
Hi, I just faced this issue and the solution proposed by @iwilltry42 is working perfectly!
@iwilltry42 thanks for the workaround, it works well on Arch.
@iwilltry42 it works with this command:

```bash
k3d cluster create --servers 1 --k3s-server-arg "--kube-proxy-arg=conntrack-max-per-core=0" --k3s-agent-arg "--kube-proxy-arg=conntrack-max-per-core=0"
```

But if we add agents, the server components still restart frequently:

```bash
k3d cluster create --servers 1 --agents 5 --k3s-server-arg "--kube-proxy-arg=conntrack-max-per-core=0" --k3s-agent-arg "--kube-proxy-arg=conntrack-max-per-core=0"
```
k3d output:

```
$ k3d cluster create --servers 1 --agents 5 --k3s-server-arg "--kube-proxy-arg=conntrack-max-per-core=0" --k3s-agent-arg "--kube-proxy-arg=conntrack-max-per-core=0"
INFO[0000] Prep: Network
INFO[0000] Re-using existing network 'k3d-k3s-default' (b8b69b90908fd823a253793aba7d4b707f0d6c8de1ead16733f332822e03f697)
INFO[0000] Created volume 'k3d-k3s-default-images'
INFO[0001] Creating node 'k3d-k3s-default-server-0'
INFO[0001] Creating node 'k3d-k3s-default-agent-0'
INFO[0001] Creating node 'k3d-k3s-default-agent-1'
INFO[0001] Creating node 'k3d-k3s-default-agent-2'
INFO[0001] Creating node 'k3d-k3s-default-agent-3'
INFO[0001] Creating node 'k3d-k3s-default-agent-4'
INFO[0001] Creating LoadBalancer 'k3d-k3s-default-serverlb'
INFO[0001] Starting cluster 'k3s-default'
INFO[0001] Starting servers...
INFO[0001] Starting Node 'k3d-k3s-default-server-0'
INFO[0006] Starting agents...
INFO[0006] Starting Node 'k3d-k3s-default-agent-0'
WARN[0389] Node 'k3d-k3s-default-agent-0' is restarting for more than a minute now. Possibly it will recover soon (e.g. when it's waiting to join). Consider using a creation timeout to avoid waiting forever in a Restart Loop.
```
docker ps:

```
CONTAINER ID   IMAGE                      COMMAND                  CREATED          STATUS                            PORTS   NAMES
8f913ad43059   rancher/k3s:v1.20.6-k3s1   "/bin/k3s agent --ku…"   10 minutes ago   Up 2 minutes                              k3d-k3s-default-agent-0
b9af47763c3a   rancher/k3s:v1.20.6-k3s1   "/bin/k3s server --k…"   10 minutes ago   Restarting (255) 20 seconds ago           k3d-k3s-default-server-0
```
Docker's error log was captured with:

```bash
docker logs --details -t b9af47763c3a &> error.log
```
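To spot the failure in a log like that, grepping for the conntrack error is usually enough; a minimal sketch, reusing the server container ID from the `docker ps` output above:

```bash
# The restarting server container should show the permission-denied write:
docker logs b9af47763c3a 2>&1 | grep -i nf_conntrack
```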
- OS: Arch Linux
- Kernel: 5.12.4
- Docker: 20.10.6
- k3d: v4.4.3
- k3s: v1.20.6-k3s1 (default)
Closing this, as it will be fixed upstream (in k3s), thanks to @brandond :pray:
I've added a new FAQ entry on this issue: https://k3d.io/faq/faq/#nodes-fail-to-start-or-get-stuck-in-notready-state-with-log-nf_conntrack_max-permission-denied
Also, thanks to #612 we quickly confirmed that (obviously) other kernel lines are affected as well, e.g. 5.11 as of 5.11.19.
So, I finally got to the point where I could try to press forward with this... Creating a cluster straight away with k3d on Arch running 5.12.10-arch1-1 still doesn't work using k3d installed out of AUR.
```
⋊> ⨯ k3d --version
k3d version v4.4.4
k3s version v1.20.6-k3s1 (default)
```
So, I did as you asked, modifying the command for the latest k3s Docker image, or just stripping out the image param, like so:
```bash
k3d cluster create ${k3d_cluster_name} --k3s-server-arg "--kube-proxy-arg=conntrack-max-per-core=0" \
  --k3s-agent-arg "--kube-proxy-arg=conntrack-max-per-core=0" --image rancher/k3s:v1.21.1-k3s1 \
  --api-port 127.0.0.1:6443 -p 80:80@loadbalancer -p 443:443@loadbalancer \
  -p 2022:2022@loadbalancer --k3s-server-arg "--no-deploy=traefik" --registry-use ${registry_name}:${registry_port} \
  --servers ${k3d_server_count} --agents ${k3d_agent_count}
```
or
```bash
k3d cluster create ${k3d_cluster_name} --k3s-server-arg "--kube-proxy-arg=conntrack-max-per-core=0" \
  --k3s-agent-arg "--kube-proxy-arg=conntrack-max-per-core=0" \
  --api-port 127.0.0.1:6443 -p 80:80@loadbalancer -p 443:443@loadbalancer \
  -p 2022:2022@loadbalancer --k3s-server-arg "--no-deploy=traefik" --registry-use ${registry_name}:${registry_port} \
  --servers ${k3d_server_count} --agents ${k3d_agent_count}
```
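(For reference, the shell variables in these commands are assumed to be set beforehand; hypothetical values for illustration:)

```bash
# Hypothetical values -- adjust to your own setup:
k3d_cluster_name=dev-cluster
registry_name=registry.localhost
registry_port=5000
k3d_server_count=1
k3d_agent_count=3
```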
And I got it working as you advised...
@iwilltry42 Thank you for the fix. Sorry it took so long to try it.
But is this the advised way now (as per the FAQ), or has it been replaced over the past twenty-some days?
Thanks for the reply @nemonik :+1:
> Creating a cluster straight away with k3d on Arch running 5.12.10-arch1-1 still doesn't work using k3d installed out of AUR.
k3d v4.4.4 uses k3s v1.20.6 by default (that's hardcoded at build time), so that won't work.
> So, I did as you asked, modifying the command for the latest k3s Docker image, or just stripping out the image param, like so ...
When setting the `--image` flag with one of the newer k3s versions (that include the fix referenced from k3s earlier in this thread), you don't need the kube-proxy-arg flags anymore :+1:
Also, v4.4.5 is just being released (there were issues with the release system delaying it), which will use one of the newer k3s versions by default (so no image or kube-proxy flags required).
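In other words, with a new enough k3s image the whole workaround collapses to a single flag; a minimal sketch:

```bash
# k3s v1.21.1-k3s1 already contains the upstream fix, so no kube-proxy
# flags are required:
k3d cluster create --image rancher/k3s:v1.21.1-k3s1
```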
> When setting the `--image` flag with one of the newer k3s versions (that include the fix referenced from k3s earlier in this thread), you don't need the kube-proxy-arg flags anymore

Do you still need the `conntrack-max-per-core=0` args to be passed to the server and agent, or do they go too?
I will watch for k3d v4.4.5 to drop in AUR
Thanks for the reply.
> Do you still need the conntrack-max-per-core=0 args to be passed to the server and agent, or do they go too?
Not with the new versions of k3s (v1.21.1-k3s1 is the new default in k3d v4.4.5 and includes the fix). I'll update the FAQ accordingly.
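So once v4.4.5 is installed, the originally failing multi-agent scenario should work out of the box; a sketch:

```bash
# k3d v4.4.5 defaults to k3s v1.21.1-k3s1, which includes the fix:
k3d cluster create --servers 1 --agents 5
```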
Thank you very much @iwilltry42
k3d v4.4.5 hit AUR last night...
Note that I experienced the same issue with a lower version of the Linux kernel:
```
$ ./versions.sh
Client: Docker Engine - Community
 Version:           20.10.8
 API version:       1.41
 Go version:        go1.16.6
 Git commit:        3967b7d
 Built:             Fri Jul 30 19:54:22 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.8
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.6
  Git commit:       75249d8
  Built:            Fri Jul 30 19:52:31 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.9
  GitCommit:        e25210fe30a0a703442421b0f60afac609f950a3
 runc:
  Version:          1.0.1
  GitCommit:        v1.0.1-0-g4144b63
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

k3d version v4.4.1
k3s version v1.20.5-k3s1 (default)
go version go1.17 linux/amd64
hugo v0.83.1+extended linux/amd64 BuildDate=unknown
flux version 0.17.0
```
The k3s server fails with `open /proc/sys/net/netfilter/nf_conntrack_max: permission denied`. I found an entry in the issues (https://github.com/rancher/k3d/issues/607) and the FAQ referencing my exact issue, caused by a GNU/Linux kernel change, see https://k3d.io/faq/faq/#solved-nodes-fail-to-start-or-get-stuck-in-notready-state-with-log-nf_conntrack_max-permission-denied. What is funny, though, is that the k3d documentation states that this issue arises for kernel versions >= 5.12.2, but my version is 5.10.0-8-amd64 (the kernel used by my Debian 11 bullseye) and I still have the exact same issue.
Check kernel version:

```
jbl@fluxcd:~$ uname -r
5.10.0-8-amd64
jbl@fluxcd:~$ cat /proc/version
Linux version 5.10.0-8-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian 5.10.46-4 (2021-08-03)
```
Yes, some Linux distributions backported the kernel change into earlier kernel versions; it was only present in mainline Linux as of 5.12.2 and 5.11.19. You'd have to check with the Debian folks to see which releases they backported it to. Either way, there's a fix available: just use a newer K3s.
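A version-independent way to check whether a given kernel carries the change is to attempt the write from inside a fresh network namespace; a minimal sketch, assuming Docker is available and the `nf_conntrack` module is loaded on the host:

```bash
# On kernels with the change, net.netfilter.nf_conntrack_max is read-only
# in every non-init network namespace, so even a privileged container
# cannot write it; on older kernels the write succeeds.
docker run --rm --privileged busybox \
  sysctl -w net.netfilter.nf_conntrack_max=131072
```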
@brandond Oh, thanks very much for the information :)
I confirm that upgrading to k3d v4.4.5 solved the issue for me, thanks all :)
**What did you do**

- How was the cluster created?
  - `k3d cluster create "vald-cluster" -p "8081:80@loadbalancer" --agents 5`
- What did you do afterwards?
**What did you expect to happen**
I expect that k3d works with the latest kernels, just as it worked with Linux Kernel 5.11.
When multiple agents were specified, k3d did not proceed to start the agents after creating the cluster, and the Docker logs showed that the k3d server was restarting at a high frequency.
In the container log, it looks like kube-proxy is failing to start due to a failure to set nf_conntrack_max.
I looked into this a bit and found a similar problem in kind and minikube; it seems to be fixed by the following issue and PRs:

- for kind: https://github.com/kubernetes-sigs/kind/issues/2240, https://github.com/kubernetes-sigs/kind/pull/2241
- for minikube: https://github.com/kubernetes/minikube/pull/11419
**Screenshots or terminal output**

- k3d command CLI log
- `docker ps`
- `docker logs --details -t f9e815595dcf`
**Which OS & Architecture**

**Which version of `k3d`**

- `k3d version`

**Which version of docker**

- `docker version`
- `docker info`