projectcalico / calico

Cloud native networking and network security
https://docs.tigera.io/calico/latest/about/
Apache License 2.0

Calico-kube-controller CrashLoopBackOff #9522

Closed: sunminming closed this issue 5 days ago

sunminming commented 5 days ago

The calico-kube-controllers pod and the calico-node pod keep going into CrashLoopBackOff, restarting roughly every 5 minutes; the container exits with code 2.

root@k8s-10-10-40-34:/home/sunminming# kubectl -n kube-system get pod
NAME                                         READY   STATUS             RESTARTS        AGE
calico-kube-controllers-6946cb87d6-nkzdc     0/1     CrashLoopBackOff   4 (83s ago)     26m
calico-node-ccc78                            1/1     Running            1 (3m7s ago)    4m46s
coredns-c5768dcc7-wj5rs                      1/1     Running            2 (2m18s ago)   28m
dashboard-metrics-scraper-69b9b44766-tm7pq   0/1     CrashLoopBackOff   2 (27s ago)     28m
kubernetes-dashboard-7df74bff86-8mzbq        0/1     CrashLoopBackOff   2 (22s ago)     28m
metrics-server-65b5b555f5-t5d5f              0/1     Running            4 (57s ago)     28m
node-local-dns-9mtrc                         0/1     CrashLoopBackOff   5 (22s ago)     28m
root@k8s-10-10-40-34:/home/sunminming# kubectl -n kube-system logs calico-kube-controllers-6946cb87d6-nkzdc
2024-11-23 17:26:37.483 [INFO][1] main.go 107: Loaded configuration from environment config=&config.Config{LogLevel:"info", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", DatastoreType:"etcdv3"}
W1123 17:26:37.486472       1 client_config.go:618] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2024-11-23 17:26:37.487 [INFO][1] main.go 131: Ensuring Calico datastore is initialized
2024-11-23 17:26:37.497 [INFO][1] main.go 157: Calico datastore is initialized
2024-11-23 17:26:37.498 [INFO][1] main.go 194: Getting initial config snapshot from datastore
2024-11-23 17:26:37.503 [INFO][1] resources.go 350: Main client watcher loop
2024-11-23 17:26:37.503 [INFO][1] main.go 197: Got initial config snapshot
2024-11-23 17:26:37.504 [INFO][1] watchersyncer.go 89: Start called
2024-11-23 17:26:37.504 [INFO][1] main.go 211: Starting status report routine
2024-11-23 17:26:37.504 [INFO][1] main.go 220: Starting Prometheus metrics server on port 9094
2024-11-23 17:26:37.504 [INFO][1] main.go 503: Starting informer informer=&cache.sharedIndexInformer{indexer:(*cache.cache)(0xc0001ab398), controller:cache.Controller(nil), processor:(*cache.sharedProcessor)(0xc0004b3c70), cacheMutationDetector:cache.dummyMutationDetector{}, listerWatcher:(*cache.ListWatch)(0xc0001ab380), objectType:(*v1.Pod)(0xc0002a6480), resyncCheckPeriod:0, defaultEventHandlerResyncPeriod:0, clock:(*clock.RealClock)(0x31379c0), started:false, stopped:false, startedLock:sync.Mutex{state:0, sema:0x0}, blockDeltas:sync.Mutex{state:0, sema:0x0}, watchErrorHandler:(cache.WatchErrorHandler)(nil), transform:(cache.TransformFunc)(nil)}
2024-11-23 17:26:37.504 [INFO][1] main.go 503: Starting informer informer=&cache.sharedIndexInformer{indexer:(*cache.cache)(0xc0001ab3e0), controller:cache.Controller(nil), processor:(*cache.sharedProcessor)(0xc0004b3cc0), cacheMutationDetector:cache.dummyMutationDetector{}, listerWatcher:(*cache.ListWatch)(0xc0001ab3c8), objectType:(*v1.Node)(0xc0005b6b00), resyncCheckPeriod:0, defaultEventHandlerResyncPeriod:0, clock:(*clock.RealClock)(0x31379c0), started:false, stopped:false, startedLock:sync.Mutex{state:0, sema:0x0}, blockDeltas:sync.Mutex{state:0, sema:0x0}, watchErrorHandler:(cache.WatchErrorHandler)(nil), transform:(cache.TransformFunc)(nil)}
2024-11-23 17:26:37.504 [INFO][1] main.go 509: Starting controller ControllerType="Pod"
2024-11-23 17:26:37.504 [INFO][1] main.go 509: Starting controller ControllerType="Namespace"
2024-11-23 17:26:37.504 [INFO][1] main.go 509: Starting controller ControllerType="NetworkPolicy"
2024-11-23 17:26:37.504 [INFO][1] main.go 509: Starting controller ControllerType="Node"
2024-11-23 17:26:37.504 [INFO][1] main.go 509: Starting controller ControllerType="ServiceAccount"
2024-11-23 17:26:37.504 [INFO][1] serviceaccount_controller.go 152: Starting ServiceAccount/Profile controller
I1123 17:26:37.504736       1 shared_informer.go:270] Waiting for caches to sync for service-accounts
2024-11-23 17:26:37.504 [INFO][1] watchersyncer.go 130: Sending status update Status=wait-for-ready
2024-11-23 17:26:37.504 [INFO][1] syncer.go 86: Node controller syncer status updated: wait-for-ready
2024-11-23 17:26:37.505 [INFO][1] watchersyncer.go 149: Starting main event processing loop
2024-11-23 17:26:37.505 [INFO][1] watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/ippools"
2024-11-23 17:26:37.505 [INFO][1] controller.go 158: Starting Node controller
I1123 17:26:37.505420       1 shared_informer.go:270] Waiting for caches to sync for nodes
2024-11-23 17:26:37.506 [INFO][1] watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/nodes"
2024-11-23 17:26:37.506 [INFO][1] watchercache.go 181: Full resync is required ListRoot="/calico/ipam/v2/assignment/"
2024-11-23 17:26:37.506 [INFO][1] watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/clusterinformations"
2024-11-23 17:26:37.506 [INFO][1] main.go 346: Starting periodic etcdv3 compaction period=10m0s
2024-11-23 17:26:37.507 [INFO][1] pod_controller.go 229: Starting Pod/WorkloadEndpoint controller
2024-11-23 17:26:37.507 [INFO][1] namespace_controller.go 158: Starting Namespace/Profile controller
I1123 17:26:37.507407       1 shared_informer.go:270] Waiting for caches to sync for namespaces
2024-11-23 17:26:37.507 [INFO][1] policy_controller.go 149: Starting NetworkPolicy controller
I1123 17:26:37.507577       1 shared_informer.go:270] Waiting for caches to sync for network-policies
2024-11-23 17:26:37.512 [INFO][1] watchercache.go 294: Sending synced update ListRoot="/calico/resources/v3/projectcalico.org/clusterinformations"
2024-11-23 17:26:37.512 [INFO][1] watchersyncer.go 130: Sending status update Status=resync
2024-11-23 17:26:37.513 [INFO][1] watchercache.go 294: Sending synced update ListRoot="/calico/resources/v3/projectcalico.org/nodes"
2024-11-23 17:26:37.514 [INFO][1] watchercache.go 294: Sending synced update ListRoot="/calico/ipam/v2/assignment/"
2024-11-23 17:26:37.516 [INFO][1] syncer.go 86: Node controller syncer status updated: resync
2024-11-23 17:26:37.516 [INFO][1] watchersyncer.go 209: Received InSync event from one of the watcher caches
2024-11-23 17:26:37.517 [INFO][1] watchersyncer.go 209: Received InSync event from one of the watcher caches
2024-11-23 17:26:37.517 [INFO][1] watchersyncer.go 209: Received InSync event from one of the watcher caches
2024-11-23 17:26:37.517 [WARNING][1] labels.go 85: Unexpected kind received over syncer: ClusterInformation(default)
2024-11-23 17:26:37.522 [INFO][1] watchercache.go 294: Sending synced update ListRoot="/calico/resources/v3/projectcalico.org/ippools"
2024-11-23 17:26:37.522 [WARNING][1] labels.go 85: Unexpected kind received over syncer: IPPool(default-ipv4-ippool)
2024-11-23 17:26:37.522 [INFO][1] watchersyncer.go 209: Received InSync event from one of the watcher caches
2024-11-23 17:26:37.522 [INFO][1] watchersyncer.go 221: All watchers have sync'd data - sending data and final sync
2024-11-23 17:26:37.522 [INFO][1] watchersyncer.go 130: Sending status update Status=in-sync
2024-11-23 17:26:37.523 [INFO][1] syncer.go 86: Node controller syncer status updated: in-sync
I1123 17:26:37.525659       1 shared_informer.go:270] Waiting for caches to sync for pods
2024-11-23 17:26:37.534 [INFO][1] hostendpoints.go 173: successfully synced all hostendpoints
I1123 17:26:37.605520       1 shared_informer.go:277] Caches are synced for service-accounts
2024-11-23 17:26:37.605 [INFO][1] serviceaccount_controller.go 170: ServiceAccount/Profile controller is now running
I1123 17:26:37.605554       1 shared_informer.go:277] Caches are synced for nodes
I1123 17:26:37.605826       1 shared_informer.go:270] Waiting for caches to sync for pods
I1123 17:26:37.605876       1 shared_informer.go:277] Caches are synced for pods
2024-11-23 17:26:37.605 [INFO][1] ipam.go 241: Will run periodic IPAM sync every 7m30s
2024-11-23 17:26:37.606 [INFO][1] ipam.go 305: Syncer is InSync, kicking sync channel status=in-sync
I1123 17:26:37.608288       1 shared_informer.go:277] Caches are synced for namespaces
2024-11-23 17:26:37.608 [INFO][1] namespace_controller.go 176: Namespace/Profile controller is now running
I1123 17:26:37.608595       1 shared_informer.go:277] Caches are synced for network-policies
2024-11-23 17:26:37.608 [INFO][1] policy_controller.go 171: NetworkPolicy controller is now running
I1123 17:26:37.626623       1 shared_informer.go:277] Caches are synced for pods
2024-11-23 17:26:37.626 [INFO][1] pod_controller.go 253: Pod/WorkloadEndpoint controller is now running
2024-11-23 17:27:00.261 [INFO][1] ipam_allocation.go 175: Candidate IP leak handle="k8s-pod-network.f819de1078da4e20ecb965aa36e252b275b73d15772c6af4020c179a13664853" ip="172.20.71.9" node="k8s-10-10-40-34" pod="kube-system/coredns-c5768dcc7-wj5rs"
2024-11-23 17:27:05.695 [INFO][1] ipam_allocation.go 203: Confirmed valid IP after 5.433512087s handle="k8s-pod-network.f819de1078da4e20ecb965aa36e252b275b73d15772c6af4020c179a13664853" ip="172.20.71.9" node="k8s-10-10-40-34" pod="kube-system/coredns-c5768dcc7-wj5rs"
2024-11-23 17:27:07.364 [INFO][1] ipam_allocation.go 175: Candidate IP leak handle="k8s-pod-network.ab4e67838962ffcb4fd76d2b82ca6b27b26398246ee0ca3a437b1673e4f929c6" ip="172.20.71.10" node="k8s-10-10-40-34" pod="kube-system/dashboard-metrics-scraper-69b9b44766-tm7pq"
2024-11-23 17:27:17.670 [INFO][1] ipam_allocation.go 203: Confirmed valid IP after 10.306253217s handle="k8s-pod-network.ab4e67838962ffcb4fd76d2b82ca6b27b26398246ee0ca3a437b1673e4f929c6" ip="172.20.71.10" node="k8s-10-10-40-34" pod="kube-system/dashboard-metrics-scraper-69b9b44766-tm7pq"
2024-11-23 17:27:18.369 [INFO][1] ipam_allocation.go 175: Candidate IP leak handle="k8s-pod-network.7225e2668d1848bea9c5edb76b7790eb8360db93423a92bc5d74a173f770cec2" ip="172.20.71.11" node="k8s-10-10-40-34" pod="kube-system/kubernetes-dashboard-7df74bff86-8mzbq"
2024-11-23 17:27:37.716 [INFO][1] ipam_allocation.go 203: Confirmed valid IP after 19.347357655s handle="k8s-pod-network.7225e2668d1848bea9c5edb76b7790eb8360db93423a92bc5d74a173f770cec2" ip="172.20.71.11" node="k8s-10-10-40-34" pod="kube-system/kubernetes-dashboard-7df74bff86-8mzbq"
2024-11-23 17:27:38.375 [INFO][1] ipam_allocation.go 175: Candidate IP leak handle="k8s-pod-network.50cdccfd31bbb20e4fc8ba05bace49e5d9b1332a444e2b9e17a1d19e7c0b608a" ip="172.20.71.12" node="k8s-10-10-40-34" pod="kube-system/metrics-server-65b5b555f5-t5d5f"
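
For reference, the exit code and the logs of the previous (crashed) container instance can be pulled with something like the following, reusing the pod name from the output above (the name changes if the pod is recreated):

kubectl -n kube-system get pod calico-kube-controllers-6946cb87d6-nkzdc \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}{"\n"}'
kubectl -n kube-system describe pod calico-kube-controllers-6946cb87d6-nkzdc
kubectl -n kube-system logs calico-kube-controllers-6946cb87d6-nkzdc --previous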

Expected Behavior

The calico-kube-controllers and calico-node pods keep running without restarts.

Current Behavior

The pods keep entering CrashLoopBackOff, restarting roughly every 5 minutes with exit code 2.
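
The restart cadence can be observed from the pod's restart counter and from events, for example (a sketch, reusing the pod name from the output above):

kubectl -n kube-system get pod calico-kube-controllers-6946cb87d6-nkzdc -w
kubectl -n kube-system get events --sort-by=.lastTimestamp | grep calico-kube-controllers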

Possible Solution

Steps to Reproduce (for bugs)


Context

Your Environment
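
For reference, the usual details (Calico and Kubernetes versions, OS, kernel) can be collected with commands along these lines, reusing the pod names from the output above:

kubectl version
kubectl -n kube-system get pod calico-kube-controllers-6946cb87d6-nkzdc -o jsonpath='{.spec.containers[0].image}{"\n"}'
kubectl -n kube-system get pod calico-node-ccc78 -o jsonpath='{.spec.containers[0].image}{"\n"}'
calicoctl version    # if calicoctl is installed
cat /etc/os-release
uname -r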