Closed igolka97 closed 2 months ago
Could you describe the events of calico-kube-controllers-5fd7f74c8d-8smqc?
Thanks for your reply. The output is below:
root@node-1:~# kubectl describe pod -n calico-system calico-kube-controllers-5fd7f74c8d-8smqc
Name: calico-kube-controllers-5fd7f74c8d-8smqc
Namespace: calico-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Service Account: calico-kube-controllers
Node: <none>
Labels: app.kubernetes.io/name=calico-kube-controllers
k8s-app=calico-kube-controllers
pod-template-hash=5fd7f74c8d
Annotations: hash.operator.tigera.io/system: fdde45054a8ae4f629960ce37570929502e59449
tigera-operator.hash.operator.tigera.io/tigera-ca-private: 29444b4059d0cf3605da1bc4d3d0d5ee97cbbbce
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/calico-kube-controllers-5fd7f74c8d
Containers:
calico-kube-controllers:
Image: docker.io/calico/kube-controllers:v3.27.3
Port: <none>
Host Port: <none>
SeccompProfile: RuntimeDefault
Liveness: exec [/usr/bin/check-status -l] delay=10s timeout=10s period=60s #success=1 #failure=6
Readiness: exec [/usr/bin/check-status -r] delay=0s timeout=10s period=30s #success=1 #failure=3
Environment:
KUBE_CONTROLLERS_CONFIG_NAME: default
DATASTORE_TYPE: kubernetes
ENABLED_CONTROLLERS: node
FIPS_MODE_ENABLED: false
KUBERNETES_SERVICE_HOST: 10.96.0.1
KUBERNETES_SERVICE_PORT: 443
CA_CRT_PATH: /etc/pki/tls/certs/tigera-ca-bundle.crt
Mounts:
/etc/pki/tls/cert.pem from tigera-ca-bundle (ro,path="ca-bundle.crt")
/etc/pki/tls/certs from tigera-ca-bundle (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-nhslt (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
tigera-ca-bundle:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: tigera-ca-bundle
Optional: false
kube-api-access-nhslt:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: kubernetes.io/os=linux
Tolerations: CriticalAddonsOnly op=Exists
node-role.kubernetes.io/control-plane:NoSchedule
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 3m3s (x1121 over 3d21h) default-scheduler 0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
It looks like your k8s nodes are not ready. Could you run the following commands to collect more info?
kubectl describe nodes
journalctl -u kubelet
As I already wrote, I am trying to set up a single-node cluster, and yes, it stays in the NetworkReady=false state until containerd is restarted.
root@node-1:~# kubectl describe nodes
Name: node-1
Roles: control-plane
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=node-1
kubernetes.io/os=linux
node-role.kubernetes.io/control-plane=
node.kubernetes.io/exclude-from-external-load-balancers=
Annotations: kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
node.alpha.kubernetes.io/ttl: 0
projectcalico.org/IPv4Address: 192.168.0.29/24
projectcalico.org/IPv4VXLANTunnelAddr: 10.244.84.128
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Fri, 05 Apr 2024 00:03:14 +0300
Taints: node-role.kubernetes.io/control-plane:NoSchedule
node.kubernetes.io/not-ready:NoSchedule
Unschedulable: false
Lease:
HolderIdentity: node-1
AcquireTime: <unset>
RenewTime: Wed, 10 Apr 2024 02:37:11 +0300
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Fri, 05 Apr 2024 00:20:17 +0300 Fri, 05 Apr 2024 00:20:17 +0300 CalicoIsUp Calico is running on this node
MemoryPressure False Wed, 10 Apr 2024 02:35:10 +0300 Fri, 05 Apr 2024 00:03:13 +0300 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Wed, 10 Apr 2024 02:35:10 +0300 Fri, 05 Apr 2024 00:03:13 +0300 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Wed, 10 Apr 2024 02:35:10 +0300 Fri, 05 Apr 2024 00:03:13 +0300 KubeletHasSufficientPID kubelet has sufficient PID available
Ready False Wed, 10 Apr 2024 02:35:10 +0300 Fri, 05 Apr 2024 00:03:13 +0300 KubeletNotReady container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
Addresses:
InternalIP: 192.168.0.29
Hostname: node-1
Capacity:
cpu: 8
ephemeral-storage: 103107780Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 12243472Ki
pods: 110
Allocatable:
cpu: 8
ephemeral-storage: 95024129891
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 12141072Ki
pods: 110
System Info:
Machine ID: 50d84afc3dd943f4b7fa5e195a474836
System UUID: 3d922ae8-4f97-4da8-b53a-9645d40f7423
Boot ID: 620b6978-3e90-4ce4-aa38-99740a5efb06
Kernel Version: 5.15.0-101-generic
OS Image: Ubuntu 22.04.4 LTS
Operating System: linux
Architecture: amd64
Container Runtime Version: containerd://1.6.28
Kubelet Version: v1.29.3
Kube-Proxy Version: v1.29.3
PodCIDR: 10.244.0.0/24
PodCIDRs: 10.244.0.0/24
Non-terminated Pods: (9 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
--------- ---- ------------ ---------- --------------- ------------- ---
calico-system calico-node-vqkr5 0 (0%) 0 (0%) 0 (0%) 0 (0%) 5d2h
calico-system calico-typha-5bb76c895c-d58sd 0 (0%) 0 (0%) 0 (0%) 0 (0%) 5d2h
calico-system csi-node-driver-rwhn6 0 (0%) 0 (0%) 0 (0%) 0 (0%) 5d2h
kube-system etcd-node-1 100m (1%) 0 (0%) 100Mi (0%) 0 (0%) 5d2h
kube-system kube-apiserver-node-1 250m (3%) 0 (0%) 0 (0%) 0 (0%) 5d2h
kube-system kube-controller-manager-node-1 200m (2%) 0 (0%) 0 (0%) 0 (0%) 5d2h
kube-system kube-proxy-xdx5l 0 (0%) 0 (0%) 0 (0%) 0 (0%) 5d2h
kube-system kube-scheduler-node-1 100m (1%) 0 (0%) 0 (0%) 0 (0%) 5d2h
tigera-operator tigera-operator-6bfc79cb9c-mgz58 0 (0%) 0 (0%) 0 (0%) 0 (0%) 5d2h
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 650m (8%) 0 (0%)
memory 100Mi (0%) 0 (0%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Events: <none>
Last rows; the rest are the same:
journalctl -u kubelet
Apr 10 02:42:07 node-1 kubelet[288664]: E0410 02:42:07.320368 288664 kubelet.go:2892] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
Apr 10 02:42:07 node-1 kubelet[288664]: E0410 02:42:07.402872 288664 pod_workers.go:1298] "Error syncing pod, skipping" err="network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized" pod="calico-system/csi-node-driver-rwhn6" podUID="f4698a19-4b45-4116-af82-210094037ee2"
Apr 10 02:42:09 node-1 kubelet[288664]: E0410 02:42:09.403291 288664 pod_workers.go:1298] "Error syncing pod, skipping" err="network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized" pod="calico-system/csi-node-driver-rwhn6" podUID="f4698a19-4b45-4116-af82-210094037ee2"
Apr 10 02:42:11 node-1 kubelet[288664]: E0410 02:42:11.403114 288664 pod_workers.go:1298] "Error syncing pod, skipping" err="network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized" pod="calico-system/csi-node-driver-rwhn6" podUID="f4698a19-4b45-4116-af82-210094037ee2"
I think this issue is not related to Calico; it is more likely a containerd issue. containerd can't find the CNI dir, so you should restart containerd.
@cyclinder how did you figure that out? I will try to dig deeper into this.
I've hit this issue before, and I think it's not related to Calico. I found that containerd's logs reported "No CNI conf file found", even though Calico was already producing its CNI files normally in /etc/cni/net.d. After I restarted containerd, everything was fine, so I suspect that containerd can't dynamically discover files in the CNI directory, but that's just a guess.
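The diagnosis described above can be sketched as a small shell check. This is a minimal sketch, assuming containerd's default CNI conf dir /etc/cni/net.d; `cni_conf_present` is a helper name I made up for illustration, not a real tool.

```shell
# cni_conf_present DIR prints "present" if DIR contains at least one
# .conflist or .conf file (the file types containerd scans for),
# and "missing" otherwise.
cni_conf_present() {
  dir="$1"
  if ls "$dir"/*.conflist >/dev/null 2>&1 || ls "$dir"/*.conf >/dev/null 2>&1; then
    echo "present"
  else
    echo "missing"
  fi
}

# On the affected node, the checks from this comment would look roughly like:
#   cni_conf_present /etc/cni/net.d            # Calico writes 10-calico.conflist here
#   journalctl -u containerd | grep -i cni     # look for "No CNI conf file found"
#   sudo systemctl restart containerd          # forces containerd to re-read the conf dir
```

If the conf file is "present" while containerd still logs "No CNI conf file found", that matches the symptom in this thread, and restarting containerd is what resolved it here.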
In my last attempt, I didn't delete the net.d folder after kubeadm reset, and I also restarted containerd just in case before initializing the new cluster.
My conclusion is that the containerd process loses track of the net.d folder if it is deleted and recreated, until containerd is restarted.
I wanted to understand this issue in order to better understand the internal processes that take place behind the scenes in k8s. I hope I reached the right conclusion.
Thank you in any case; I will be grateful for any comments. Perhaps this observation can help someone else.
When the calico-node pod started on a fresh install, the node got stuck in the NetworkReady=false state.
Expected Behavior
When the calico-node pod starts, the node should become Ready.
Current Behavior
I initialized a new single-node Kubernetes cluster using kubeadm, and installed the tigera-operator as well as Calico by following this guide.
When the calico-node pod started, the node was stuck in the NetworkReady=false state.
Possible Solution
After several attempts to find a solution, I restarted the containerd service, and then everything started working.
I got exactly the same behavior when I completely reset the node and initialized it again, and I get the same result over and over again.
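Since the symptom is greppable from the kubelet log, the restart workaround can be scripted. This is a sketch only: `needs_containerd_restart` is a function name I invented, and the commented journalctl/systemctl lines are standard commands to be run as root on the node.

```shell
# Hypothetical helper: given a line of kubelet log output, report whether it
# shows the "cni plugin not initialized" symptom from this issue.
needs_containerd_restart() {
  case "$1" in
    *"cni plugin not initialized"*) echo "yes" ;;
    *) echo "no" ;;
  esac
}

# On the node, the workaround from this thread could then be automated as:
#   line=$(journalctl -u kubelet -n 1 --no-pager)
#   if [ "$(needs_containerd_restart "$line")" = "yes" ]; then
#     sudo systemctl restart containerd
#   fi
```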
Context
I'm trying to start up a cluster on on-premise infrastructure with automation tools.
Your Environment
Please let me know if I need to provide any additional info. BTW, I tried to reproduce this situation in minikube with k8s 1.28, and everything works fine there.