Kubelet is complaining because it can't find the calico binary. Which CNI did you choose? Can you check whether the CNI agent pod is running on that node?
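For example, something along these lines should list the CNI agent pods and the node each one runs on (the grep pattern here assumes Canal; adjust for your CNI):
# List the CNI agent pods and where they're scheduled
kubectl get pods -n kube-system -o wide | grep canal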
Sure, the pods are running correctly:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
rke2-canal-476ws 2/2 Running 0 58m 172.27.1.101 rke2-lab-worker-1 <none> <none>
rke2-canal-g46sf 2/2 Running 0 24h 172.27.1.100 rke2-lab-master-1 <none> <none>
rke2-canal-nm2jh 2/2 Running 0 58m 172.27.1.102 rke2-lab-worker-2 <none> <none>
rke2-coredns-rke2-coredns-75c8f68666-rz67j 0/1 ContainerCreating 0 52m <none> rke2-lab-worker-2 <none> <none>
rke2-coredns-rke2-coredns-75c8f68666-zxkmt 0/1 ContainerCreating 0 52m <none> rke2-lab-worker-1 <none> <none>
rke2-coredns-rke2-coredns-7bdc89bfd7-889qb 1/1 Running 0 6h2m 10.42.0.102 rke2-lab-master-1 <none> <none>
Also, I've checked the logs; in rke2-canal-nm2jh nothing seems to be failing. I can share the full logs with you, but this is what appears as WARNING:
2024-05-09 14:51:29.611 [WARNING][1] cni-installer/<nil> <nil>: Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2024-05-09 14:51:29.624 [WARNING][1] cni-installer/<nil> <nil>: Failed to remove 10-calico.conflist error=remove /host/etc/cni/net.d/10-calico.conflist: no such file or directory
2024-05-09 14:51:31.819 [WARNING][9] startup/winutils.go 144: Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2024-05-09 14:51:32.971 [WARNING][47] cni-config-monitor/winutils.go 144: Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2024-05-09 14:51:32.976 [WARNING][47] cni-config-monitor/winutils.go 144: Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2024-05-09 14:51:33.075 [WARNING][48] felix/int_dataplane.go 553: Failed to cleanup preexisting XDP state error=cannot find XDP object "/usr/lib/calico/bpf/filter.o"
These are the CNI folders on the worker nodes:
system@rke2-lab-worker-2:/etc# tree /opt/cni/
/opt/cni/
└── bin
├── bandwidth
├── bridge
├── calico
├── calico-ipam
├── dhcp
├── dummy
├── firewall
├── flannel
├── host-device
├── host-local
├── ipvlan
├── loopback
├── macvlan
├── portmap
├── ptp
├── sbr
├── static
├── tap
├── tuning
├── vlan
└── vrf
2 directories, 21 files
system@rke2-lab-worker-2:/etc# tree /etc/cni/
/etc/cni/
└── net.d
├── 10-canal.conflist
└── calico-kubeconfig
2 directories, 2 files
Events
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 14m default-scheduler 0/3 nodes are available: 3 node(s) didn't match pod anti-affinity rules. preemption: 0/3 nodes are available: 3 node(s) didn't match pod anti-affinity rules..
Warning FailedScheduling 9m22s (x6 over 14m) default-scheduler 0/3 nodes are available: 3 node(s) didn't match pod anti-affinity rules. preemption: 0/3 nodes are available: 3 node(s) didn't match pod anti-affinity rules..
Normal Scheduled 9m3s default-scheduler Successfully assigned kube-system/rke2-coredns-rke2-coredns-75c8f68666-gbp7m to rke2-soc-lab-worker-2
Warning FailedCreatePodSandBox 8m16s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "3403f13338157694d8bb11315b5418d059da1261acf117a6f1b969ae38968022": plugin type="calico" failed (add): unexpected error when reading response body. Please retry. Original error: http2: client connection lost
Warning FailedCreatePodSandBox 7m30s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "806b36f566ef24e206c8136ea871f713230d28da94fc6c221e8f576806fc9738": plugin type="calico" failed (add): unexpected error when reading response body. Please retry. Original error: http2: client connection lost
Warning FailedCreatePodSandBox 6m44s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "19d02e29bbc4022c4cf9c28e68cf69695f1b2e2411a9ba696a438e38740f8f9e": plugin type="calico" failed (add): unexpected error when reading response body. Please retry. Original error: http2: client connection lost
Warning FailedCreatePodSandBox 5m58s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "841d8216b7a514a1285098f2ee9fbb1326ef3456c858fd3a8149601daa55a1f2": plugin type="calico" failed (add): unexpected error when reading response body. Please retry. Original error: http2: client connection lost
Warning FailedCreatePodSandBox 5m12s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "1c3f063f83e0364a54f1003b2e4243d412b383928efd5377a0900eee4f9bf9c1": plugin type="calico" failed (add): unexpected error when reading response body. Please retry. Original error: http2: client connection lost
Warning FailedCreatePodSandBox 4m26s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "b1a2e3303d0a9d22b6e31e00832cc1ef5be7e5eeca89018cd1274b19b9634aee": plugin type="calico" failed (add): unexpected error when reading response body. Please retry. Original error: http2: client connection lost
Warning FailedCreatePodSandBox 3m40s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "266b2da113156e25ae01b2d6659e5d95e4148c8b3d50c4c6d5d21cb238f70f99": plugin type="calico" failed (add): unexpected error when reading response body. Please retry. Original error: http2: client connection lost
Warning FailedCreatePodSandBox 2m54s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "a99eecf2e365b73fc866396d3cf10c79e9c18265987339e89e4d9f5cb23ed43f": plugin type="calico" failed (add): unexpected error when reading response body. Please retry. Original error: http2: client connection lost
Warning FailedCreatePodSandBox 2m8s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "4c7a219cbacc562c1563b7cb5a40a777f3bf06d082bfb16ef2ba2761487157f8": plugin type="calico" failed (add): unexpected error when reading response body. Please retry. Original error: http2: client connection lost
Warning FailedCreatePodSandBox <invalid> (x6 over 82s) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "25a4dd055148ae6e0580e8f37506dc6415b69ef1d5f7af13ac72d4965ea670a6": plugin type="calico" failed (add): unexpected error when reading response body. Please retry. Original error: http2: client connection lost
Sorry, I read the issue too quickly. It can indeed find the calico binary, but calico is throwing the error: unexpected error when reading response body. Please retry. Original error: http2: client connection lost
Make sure that you've disabled any host firewalls (firewalld/ufw) or other endpoint protection products. It sounds like something is blocking the connection between calico and the apiserver.
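A few quick checks you can run on the affected node (illustrative; this assumes firewalld/ufw and iptables are the usual suspects):
# Confirm no host firewall is active
systemctl is-active firewalld
ufw status
# Inspect the current iptables rules for anything unexpected
iptables -S | head -n 20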
There are no host firewalls or other forms of endpoint protection on the host. However, I discovered a related issue involving the RHEL cloud provider and NetworkManager. Given that I'm using the Debian cloud-init image with NetworkManager, could something related to this be causing interference?
I've already applied the proposed workaround, and it still doesn't work.
Can you check containerd.log to see if there are any additional error messages? Also, can you confirm that kube-proxy is running on this node?
Kube-proxy is up and running:
kubectl get pods -A | grep kube-proxy
kube-system kube-proxy-rke-lab-master-1 1/1 Running 0 36m
kube-system kube-proxy-rke-lab-worker-1 1/1 Running 0 34m
kube-system kube-proxy-rke-lab-worker-2 1/1 Running 0 34m
There is not much more info in containerd.log:
cat /var/lib/rancher/rke2/agent/containerd/containerd.log | grep rke2-coredns-rke2-coredns-84b9cb946c-69rb2
time="2024-05-13T08:17:17.696520957Z" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:rke2-coredns-rke2-coredns-84b9cb946c-69rb2,Uid:70559555-eed4-437f-a8c9-91629a5fe412,Namespace:kube-system,Attempt:0,}"
time="2024-05-13T08:18:02.854158772Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:rke2-coredns-rke2-coredns-84b9cb946c-69rb2,Uid:70559555-eed4-437f-a8c9-91629a5fe412,Namespace:kube-system,Attempt:0,} failed, error" error="failed to setup network for sandbox \"bace4a8a62b1bbcce5cae41696ced3f5d0ad952f5a5da5e3606c6fb1552dbf49\": plugin type=\"calico\" failed (add): unexpected error when reading response body. Please retry. Original error: http2: client connection lost"
time="2024-05-13T08:18:03.174299321Z" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:rke2-coredns-rke2-coredns-84b9cb946c-69rb2,Uid:70559555-eed4-437f-a8c9-91629a5fe412,Namespace:kube-system,Attempt:0,}"
time="2024-05-13T08:18:48.326710775Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:rke2-coredns-rke2-coredns-84b9cb946c-69rb2,Uid:70559555-eed4-437f-a8c9-91629a5fe412,Namespace:kube-system,Attempt:0,} failed, error" error="failed to setup network for sandbox \"09daf9489e7ddfb880efd83c67cd75649c017e21b309c957fcc6092568928c3c\": plugin type=\"calico\" failed (add): unexpected error when reading response body. Please retry. Original error: http2: client connection lost"
time="2024-05-13T08:18:49.281318025Z" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:rke2-coredns-rke2-coredns-84b9cb946c-69rb2,Uid:70559555-eed4-437f-a8c9-91629a5fe412,Namespace:kube-system,Attempt:0,}"
time="2024-05-13T08:19:34.449113980Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:rke2-coredns-rke2-coredns-84b9cb946c-69rb2,Uid:70559555-eed4-437f-a8c9-91629a5fe412,Namespace:kube-system,Attempt:0,} failed, error" error="failed to setup network for sandbox \"e559cd966f4f22174dc54bdef6839ac13a33d50deb3a5fc881debdcc2467a653\": plugin type=\"calico\" failed (add): unexpected error when reading response body. Please retry. Original error: http2: client connection lost"
time="2024-05-13T08:19:35.373561400Z" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:rke2-coredns-rke2-coredns-84b9cb946c-69rb2,Uid:70559555-eed4-437f-a8c9-91629a5fe412,Namespace:kube-system,Attempt:0,}"
time="2024-05-13T08:20:20.537206475Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:rke2-coredns-rke2-coredns-84b9cb946c-69rb2,Uid:70559555-eed4-437f-a8c9-91629a5fe412,Namespace:kube-system,Attempt:0,} failed, error" error="failed to setup network for sandbox \"8592e36dcc717f193f5636a8d2aa02ee0dd358c0d4cf49c3c3a7ed65a18475f9\": plugin type=\"calico\" failed (add): unexpected error when reading response body. Please retry. Original error: http2: client connection lost"
time="2024-05-13T08:20:21.467084624Z" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:rke2-coredns-rke2-coredns-84b9cb946c-69rb2,Uid:70559555-eed4-437f-a8c9-91629a5fe412,Namespace:kube-system,Attempt:0,}"
Kube-proxy also indicates a communication failure:
kubectl logs kube-proxy-rke-lab-worker-2 -n kube-system
I0513 07:52:58.875361 1 node.go:141] Successfully retrieved node IP: 172.27.1.102
I0513 07:52:58.893030 1 server.go:632] "kube-proxy running in dual-stack mode" primary ipFamily="IPv4"
I0513 07:52:58.893814 1 server_others.go:152] "Using iptables Proxier"
I0513 07:52:58.893832 1 server_others.go:421] "Detect-local-mode set to ClusterCIDR, but no cluster CIDR for family" ipFamily="IPv6"
I0513 07:52:58.893836 1 server_others.go:438] "Defaulting to no-op detect-local"
I0513 07:52:58.893854 1 proxier.go:250] "Setting route_localnet=1 to allow node-ports on localhost; to change this either disable iptables.localhostNodePorts (--iptables-localhost-nodeports) or set nodePortAddresses (--nodeport-addresses) to filter loopback addresses"
I0513 07:52:58.893991 1 server.go:846] "Version info" version="v1.28.9+rke2r1"
I0513 07:52:58.894000 1 server.go:848] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
I0513 07:52:58.894455 1 config.go:97] "Starting endpoint slice config controller"
I0513 07:52:58.894467 1 config.go:315] "Starting node config controller"
I0513 07:52:58.894474 1 shared_informer.go:311] Waiting for caches to sync for node config
I0513 07:52:58.894467 1 shared_informer.go:311] Waiting for caches to sync for endpoint slice config
I0513 07:52:58.894577 1 config.go:188] "Starting service config controller"
I0513 07:52:58.894583 1 shared_informer.go:311] Waiting for caches to sync for service config
I0513 07:52:58.994719 1 shared_informer.go:318] Caches are synced for endpoint slice config
I0513 07:52:58.994727 1 shared_informer.go:318] Caches are synced for service config
I0513 07:52:58.994753 1 shared_informer.go:318] Caches are synced for node config
W0513 07:54:59.982396 1 reflector.go:458] k8s.io/client-go/informers/factory.go:150: watch of *v1.Node ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0513 07:54:59.982440 1 reflector.go:458] k8s.io/client-go/informers/factory.go:150: watch of *v1.Service ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0513 07:54:59.982477 1 reflector.go:458] k8s.io/client-go/informers/factory.go:150: watch of *v1.EndpointSlice ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0513 08:06:31.655352 1 reflector.go:458] k8s.io/client-go/informers/factory.go:150: watch of *v1.Node ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0513 08:06:31.655396 1 reflector.go:458] k8s.io/client-go/informers/factory.go:150: watch of *v1.EndpointSlice ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0513 08:06:31.655412 1 reflector.go:458] k8s.io/client-go/informers/factory.go:150: watch of *v1.Service ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0513 08:13:41.050945 1 reflector.go:458] k8s.io/client-go/informers/factory.go:150: watch of *v1.Service ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0513 08:13:41.050944 1 reflector.go:458] k8s.io/client-go/informers/factory.go:150: watch of *v1.EndpointSlice ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0513 08:13:41.050968 1 reflector.go:458] k8s.io/client-go/informers/factory.go:150: watch of *v1.Node ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0513 08:15:29.352391 1 reflector.go:458] k8s.io/client-go/informers/factory.go:150: watch of *v1.Node ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0513 08:15:29.352399 1 reflector.go:458] k8s.io/client-go/informers/factory.go:150: watch of *v1.EndpointSlice ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0513 08:15:29.352397 1 reflector.go:458] k8s.io/client-go/informers/factory.go:150: watch of *v1.Service ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
Something on your node is blocking communication. Please investigate what that might be.
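A couple of generic checks that can help narrow this down (a sketch; 172.27.1.100 is the master node IP from the pod listing above, and 6443 is RKE2's default kube-apiserver port):
# From the affected worker, confirm basic reachability to the apiserver
curl -vk https://172.27.1.100:6443/healthz
# "http2: client connection lost" often means larger packets are being dropped;
# test the path MTU with a full-size, don't-fragment ping (1472 + 28 = 1500 bytes)
ping -M do -s 1472 -c 3 172.27.1.100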
I just went through something similar, and it turned out to be corrupted VXLAN packets (bad UDP checksums). In my case it was flannel, but I did see some other folks have issues with calico as well. The symptoms pointed to the OS firewall or kube-proxy, but in the end those were actually okay. It was related to VMs running on VMware.
Check out these discussions:
https://github.com/projectcalico/calico/issues/3145
https://github.com/flannel-io/flannel/blob/master/Documentation/troubleshooting.md
The flannel troubleshooting guide also explains how to make the change persistent via udev rules.
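For reference, a sketch of the workaround those threads describe, assuming the flannel VXLAN device is named flannel.1 (which is what Canal creates), eth0 stands in for the node's underlay NIC, and 8472 is flannel's default VXLAN UDP port:
# Look for corrupted VXLAN packets on the underlay interface ("bad udp cksum")
tcpdump -i eth0 -nn -vv udp port 8472
# Disable TX checksum offload on the VXLAN device (non-persistent)
ethtool -K flannel.1 tx-checksum-ip-generic off
# Make it persistent across reboots/device recreation with a udev rule, e.g.
# in /etc/udev/rules.d/90-flannel.rules:
# SUBSYSTEM=="net", ACTION=="add", KERNEL=="flannel.1", RUN+="/sbin/ethtool -K flannel.1 tx-checksum-ip-generic off"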
Thanks for the info, @burlyunixguy. That seems to be the problem: this cluster runs in VMs deployed on an SDN VXLAN in Proxmox. This morning, though, I was able to spin up a cluster with SNAT enabled in Proxmox, and everything started fine.
I'm not sure it's exactly the same issue as the discussions you mentioned.
It turns out the problem was the MTU setting of the VXLAN. You can check out more details here: MTU Considerations for VXLAN.
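For anyone hitting the same thing, the arithmetic that bit me (a sketch; eth0 stands in for the VM's NIC, and the 1450 below is illustrative): Proxmox's SDN VXLAN adds ~50 bytes of encapsulation overhead outside the VM, and Canal's VXLAN adds another ~50 inside the guest, so the VM NIC MTU has to leave room for both layers.
# Check the MTU on the VM NIC and on the flannel VXLAN device
ip link show eth0
ip link show flannel.1
# Quick non-persistent test: clamp the VM NIC so the nested VXLAN
# headers still fit inside the outer 1500-byte path of the Proxmox SDN
ip link set dev eth0 mtu 1450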
Sorry for the misunderstanding!
Environmental Info:
RKE2 Version: rke2 version v1.28.9+rke2r1 (07bf87f9118c1386fa73f660142cc28b5bef1886) go version go1.21.9 X:boringcrypto
Node(s) CPU architecture, OS, and Version: Linux rke2-lab-master-1 6.1.0-20-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.85-1 (2024-04-11) x86_64 GNU/Linux
Cluster Configuration:
Describe the bug: After successfully creating an RKE2 cluster, the CoreDNS pods on the agent nodes are stuck in ContainerCreating with the following error:
Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "e5580fe9275de080781b229bcb82c0b7dd05af5e2d9b5366af2c978f94ec842d": plugin type="calico" failed (add): unexpected error when reading response body. Please retry. Original error: http2: client connection lost
Steps To Reproduce:
Expected behavior: CoreDNS containers should start as expected and provide DNS service to the pods on the agent nodes.
Actual behavior: CoreDNS pods on the agent nodes remain stuck in the "ContainerCreating" state, with the error message mentioned above.
Additional context / logs: No further logs were found. Both Hardened-Calico and Hardened-Flannel containers appear to be functioning correctly.