projectcalico / calico

Cloud native networking and network security
https://docs.tigera.io/calico/latest/about/
Apache License 2.0

calico-kube-controllers panic (calico 3.26.1) #7972

Open mchtech opened 1 year ago

mchtech commented 1 year ago

Expected Behavior

Current Behavior

calico-kube-controllers crashes with a Go fatal error: concurrent map iteration and map write

Possible Solution

Steps to Reproduce (for bugs)

  1. Run a rollout restart of the deployment

Context

fatal error: concurrent map iteration and map write

goroutine 371 [running]:
reflect.mapiternext(0x4cff2f?)
    /usr/local/go/src/runtime/map.go:1380 +0x19
reflect.(*MapIter).Next(0xc001a708e8?)
    /usr/local/go/src/reflect/value.go:1924 +0x7e
internal/fmtsort.Sort({0x1c44800?, 0xc01610cbe0?, 0xc0005a42e8?})
    /usr/local/go/src/internal/fmtsort/sort.go:62 +0x1f0
fmt.(*pp).printValue(0xc0073bc340, {0x1c44800?, 0xc01610cbe0?, 0x0?}, 0x76, 0x1)
    /usr/local/go/src/fmt/print.go:816 +0x986
fmt.(*pp).printValue(0xc0073bc340, {0x1db3800?, 0xc01610cbc0?, 0x30?}, 0x76, 0x0)
    /usr/local/go/src/fmt/print.go:853 +0x120a
fmt.(*pp).printArg(0xc0073bc340, {0x1db3800?, 0xc01610cbc0}, 0x76)
    /usr/local/go/src/fmt/print.go:759 +0x756
fmt.(*pp).doPrintf(0xc0073bc340, {0x1facf6b, 0x3d}, {0xc001a71108?, 0x2, 0x2})
    /usr/local/go/src/fmt/print.go:1077 +0x387
fmt.Sprintf({0x1facf6b, 0x3d}, {0xc001a71108, 0x2, 0x2})
    /usr/local/go/src/fmt/print.go:239 +0x59
github.com/sirupsen/logrus.(*Entry).Logf(0xc01660e000, 0x4, {0x1facf6b?, 0x1db3800?}, {0xc001a71108?, 0xc001a71188?, 0xc01610cbc0?})
    /go/pkg/mod/github.com/sirupsen/logrus@v1.9.0/entry.go:349 +0x49
github.com/sirupsen/logrus.(*Logger).Logf(0xc000062080, 0x4, {0x1facf6b, 0x3d}, {0xc001a71108, 0x2, 0x2})
    /go/pkg/mod/github.com/sirupsen/logrus@v1.9.0/logger.go:154 +0x85
github.com/sirupsen/logrus.(*Logger).Infof(...)
    /go/pkg/mod/github.com/sirupsen/logrus@v1.9.0/logger.go:168
github.com/sirupsen/logrus.Infof(...)
    /go/pkg/mod/github.com/sirupsen/logrus@v1.9.0/exported.go:199
github.com/projectcalico/calico/kube-controllers/pkg/controllers/pod.(*podController).syncToCalico(0xc00060e840, {0xc00ec454d0, 0x2f})
    /go/src/github.com/projectcalico/calico/kube-controllers/pkg/controllers/pod/pod_controller.go:324 +0x90f
github.com/projectcalico/calico/kube-controllers/pkg/controllers/pod.(*podController).processNextItem(0xc00060e840)
    /go/src/github.com/projectcalico/calico/kube-controllers/pkg/controllers/pod/pod_controller.go:275 +0x70
github.com/projectcalico/calico/kube-controllers/pkg/controllers/pod.(*podController).runWorker(0x4?)
    /go/src/github.com/projectcalico/calico/kube-controllers/pkg/controllers/pod/pod_controller.go:260 +0x25
created by github.com/projectcalico/calico/kube-controllers/pkg/controllers/pod.(*podController).Run
    /go/src/github.com/projectcalico/calico/kube-controllers/pkg/controllers/pod/pod_controller.go:251 +0x3c6
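
For readers following the trace above: the crash happens inside fmt.Sprintf, called through logrus.Infof at pod_controller.go:324, while the formatter walks a map via reflection (internal/fmtsort.Sort, reflect.(*MapIter).Next). The Go runtime aborts the process when it detects a write to a map that is being iterated, and this is a fatal error rather than a panic that recover() can catch. The sketch below is a generic illustration of one common way to avoid that class of bug: copy the map under a lock and format the copy. The labelStore type and its methods are hypothetical names made up for this example, not taken from the Calico code base, and this is not necessarily how the project fixed or would fix the issue.

```go
package main

import (
	"fmt"
	"sync"
)

// labelStore is a hypothetical stand-in for shared state that one goroutine
// updates while another one logs it. It is not a type from Calico.
type labelStore struct {
	mu     sync.RWMutex
	labels map[string]string
}

func newLabelStore() *labelStore {
	return &labelStore{labels: make(map[string]string)}
}

// Set writes one entry while holding the write lock.
func (s *labelStore) Set(k, v string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.labels[k] = v
}

// Snapshot returns a copy of the map made under the read lock. Formatting
// the copy with %v is safe because no other goroutine can write to it.
func (s *labelStore) Snapshot() map[string]string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	out := make(map[string]string, len(s.labels))
	for k, v := range s.labels {
		out[k] = v
	}
	return out
}

func main() {
	s := newLabelStore()
	var wg sync.WaitGroup
	wg.Add(2)

	// Writer: keeps mutating the shared map.
	go func() {
		defer wg.Done()
		for i := 0; i < 200; i++ {
			s.Set(fmt.Sprintf("key-%d", i), "value")
		}
	}()

	// Logger: formats a snapshot instead of the live map. Passing s.labels
	// directly to Sprintf here could trigger the fatal error from the trace.
	go func() {
		defer wg.Done()
		for i := 0; i < 200; i++ {
			_ = fmt.Sprintf("labels: %v", s.Snapshot())
		}
	}()

	wg.Wait()
	fmt.Println("entries:", len(s.Snapshot()))
}
```

Running a program like this with `go run -race` is a quick way to confirm that the writer and the logger no longer touch the same map without synchronization.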

Your Environment

testwill commented 11 months ago

root@xxxx# kubectl describe pod calico-kube-controllers-949d58b75-rdlfd -n kube-system
Name:                 calico-kube-controllers-949d58b75-rdlfd
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Service Account:      calico-kube-controllers
Node:                 k8smaster/192.168.1.51
Start Time:           Fri, 24 Nov 2023 06:38:24 +0000
Labels:               k8s-app=calico-kube-controllers
                      pod-template-hash=949d58b75
Annotations:          cni.projectcalico.org/containerID: 3182ba0ab737bf526fe3d5e7c39ca1e303c8c1f77f9168c69835f94f988e2fc1
                      cni.projectcalico.org/podIP: 10.244.0.131/32
                      cni.projectcalico.org/podIPs: 10.244.0.131/32
Status:               Running
IP:                   10.244.0.131
IPs:
  IP:           10.244.0.131
Controlled By:  ReplicaSet/calico-kube-controllers-949d58b75
Containers:
  calico-kube-controllers:
    Container ID:   containerd://e05e640572ecf43e7d016bda9b85d48baeaac2760f8ae9cffc4146e3d664d349
    Image:          docker.io/calico/kube-controllers:v3.26.1
    Image ID:       docker.io/calico/kube-controllers@sha256:01ce29ea8f2b34b6cef904f526baed98db4c0581102f194e36f2cd97943f77aa
    Port:
    Host Port:
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       StartError
      Message:      failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "/usr/bin/kube-controllers": stat /usr/bin/kube-controllers: no such file or directory: unknown
      Exit Code:    128
      Started:      Thu, 01 Jan 1970 00:00:00 +0000
      Finished:     Fri, 24 Nov 2023 06:40:24 +0000
    Ready:          False
    Restart Count:  4
    Liveness:       exec [/usr/bin/check-status -l] delay=10s timeout=10s period=10s #success=1 #failure=6
    Readiness:      exec [/usr/bin/check-status -r] delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      ENABLED_CONTROLLERS:  node
      DATASTORE_TYPE:       kubernetes
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-hbpzs (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-api-access-hbpzs:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 CriticalAddonsOnly op=Exists
                             node-role.kubernetes.io/control-plane:NoSchedule
                             node-role.kubernetes.io/master:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                  From               Message
  ----     ------                  ----                 ----               -------
  Warning  FailedScheduling        2m34s                default-scheduler  0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
  Normal   Scheduled               2m26s                default-scheduler  Successfully assigned kube-system/calico-kube-controllers-949d58b75-rdlfd to k8smaster
  Warning  FailedCreatePodSandBox  2m26s                kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "e32625c1313017295f8dcdf68705d5b689e6ba1928f2bddf983f7c6c2a844de7": plugin type="calico" failed (add): error getting ClusterInformation: resource does not exist: ClusterInformation(default) with error: clusterinformations.crd.projectcalico.org "default" not found
  Normal   Pulled                  78s (x4 over 2m12s)  kubelet            Container image "docker.io/calico/kube-controllers:v3.26.1" already present on machine
  Normal   Created                 78s (x4 over 2m11s)  kubelet            Created container calico-kube-controllers
  Warning  Failed                  78s (x4 over 2m11s)  kubelet            Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "/usr/bin/kube-controllers": stat /usr/bin/kube-controllers: no such file or directory: unknown
  Warning  BackOff                 65s (x12 over 2m10s) kubelet            Back-off restarting failed container calico-kube-controllers in pod calico-kube-controllers-949d58b75-rdlfd_kube-system(70215104-bb33-4113-8a3b-40ab1841d0d3)

testwill commented 11 months ago

calico-kube-controllers is stuck in CrashLoopBackOff.

caseydavenport commented 8 months ago

Is anyone still seeing this panic?