projectcalico / calico

Cloud native networking and network security
https://docs.tigera.io/calico/latest/about/
Apache License 2.0

calico-node frequently enters the Completed state and restarts #9524

Open sunminming opened 1 day ago

sunminming commented 1 day ago
root@worker-01:/home/sunminming# kubectl -n kube-system describe pod calico-node-cbd5s
Name:                 calico-node-cbd5s
Namespace:            kube-system
Priority:             2000001000
Priority Class Name:  system-node-critical
Service Account:      calico-node
Node:                 k8s-10-10-40-33/10.10.40.33
Start Time:           Sun, 24 Nov 2024 15:42:12 +0800
Labels:               controller-revision-hash=56f9dcc8f
                      k8s-app=calico-node
                      pod-template-generation=2
Annotations:          <none>
Status:               Running
IP:                   10.10.40.33
IPs:
  IP:           10.10.40.33
Controlled By:  DaemonSet/calico-node
Init Containers:
  install-cni:
    Container ID:  containerd://3ae90d7f54c60c7e647000b001264bc6886fbe27d6fb3085d0e3bbf98574a767
    Image:         easzlab.io.local:5000/calico/cni:v3.26.4
    Image ID:      sha256:17d35f5bad38f1d00ee41111d6655540797ec5011740a733b706b4717d300ede
    Port:          <none>
    Host Port:     <none>
    Command:
      /opt/cni/bin/install
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sun, 24 Nov 2024 16:06:49 +0800
      Finished:     Sun, 24 Nov 2024 16:06:50 +0800
    Ready:          True
    Restart Count:  7
    Environment Variables from:
      kubernetes-services-endpoint  ConfigMap  Optional: true
    Environment:
      CNI_CONF_NAME:       10-calico.conflist
      CNI_NETWORK_CONFIG:  <set to the key 'cni_network_config' of config map 'calico-config'>  Optional: false
      ETCD_ENDPOINTS:      <set to the key 'etcd_endpoints' of config map 'calico-config'>      Optional: false
      CNI_MTU:             <set to the key 'veth_mtu' of config map 'calico-config'>            Optional: false
      SLEEP:               false
    Mounts:
      /calico-secrets from etcd-certs (rw)
      /host/etc/cni/net.d from cni-net-dir (rw)
      /host/opt/cni/bin from cni-bin-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zrfvj (ro)
  mount-bpffs:
    Container ID:  containerd://b322c5f181a565125ce28171c209dbfa9cc11f14b7b4f66853f5d0cc95f01f64
    Image:         easzlab.io.local:5000/calico/node:v3.26.4
    Image ID:      sha256:ded66453eb630bd4d4efddee2ccf290cbca4c67bca07c2d53c35c35dd0251136
    Port:          <none>
    Host Port:     <none>
    Command:
      calico-node
      -init
      -best-effort
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sun, 24 Nov 2024 16:06:51 +0800
      Finished:     Sun, 24 Nov 2024 16:06:51 +0800
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /nodeproc from nodeproc (ro)
      /sys/fs from sys-fs (rw)
      /var/run/calico from var-run-calico (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zrfvj (ro)
Containers:
  calico-node:
    Container ID:   containerd://bac41554456221baba6f963d8c84fccb7bf9cf5fc38d744263761e5f84858d98
    Image:          easzlab.io.local:5000/calico/node:v3.26.4
    Image ID:       sha256:ded66453eb630bd4d4efddee2ccf290cbca4c67bca07c2d53c35c35dd0251136
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sun, 24 Nov 2024 16:03:51 +0800
      Finished:     Sun, 24 Nov 2024 16:06:48 +0800
    Ready:          False
    Restart Count:  6
    Limits:
      cpu:     500m
      memory:  1Gi
    Requests:
      cpu:      500m
      memory:   1Gi
    Liveness:   exec [/bin/calico-node -felix-live -bird-live] delay=10s timeout=10s period=10s #success=1 #failure=6
    Readiness:  exec [/bin/calico-node -felix-ready -bird-ready] delay=0s timeout=10s period=10s #success=1 #failure=3
    Environment Variables from:
      kubernetes-services-endpoint  ConfigMap  Optional: true
    Environment:
      ETCD_ENDPOINTS:                     <set to the key 'etcd_endpoints' of config map 'calico-config'>  Optional: false
      ETCD_CA_CERT_FILE:                  <set to the key 'etcd_ca' of config map 'calico-config'>         Optional: false
      ETCD_KEY_FILE:                      <set to the key 'etcd_key' of config map 'calico-config'>        Optional: false
      ETCD_CERT_FILE:                     <set to the key 'etcd_cert' of config map 'calico-config'>       Optional: false
      CALICO_K8S_NODE_REF:                 (v1:spec.nodeName)
      CALICO_NETWORKING_BACKEND:          <set to the key 'calico_backend' of config map 'calico-config'>  Optional: false
      CLUSTER_TYPE:                       k8s,bgp
      IP:                                 autodetect
      IP_AUTODETECTION_METHOD:            can-reach=10.10.40.34
      CALICO_IPV4POOL_IPIP:               Always
      FELIX_IPINIPMTU:                    <set to the key 'veth_mtu' of config map 'calico-config'>  Optional: false
      FELIX_VXLANMTU:                     <set to the key 'veth_mtu' of config map 'calico-config'>  Optional: false
      FELIX_WIREGUARDMTU:                 <set to the key 'veth_mtu' of config map 'calico-config'>  Optional: false
      CALICO_IPV4POOL_CIDR:               172.20.0.0/16
      CALICO_DISABLE_FILE_LOGGING:        true
      FELIX_DEFAULTENDPOINTTOHOSTACTION:  ACCEPT
      FELIX_IPV6SUPPORT:                  false
      FELIX_HEALTHENABLED:                true
      FELIX_KUBENODEPORTRANGES:           30000:32767
      FELIX_PROMETHEUSMETRICSENABLED:     false
    Mounts:
      /calico-secrets from etcd-certs (rw)
      /host/etc/cni/net.d from cni-net-dir (rw)
      /lib/modules from lib-modules (ro)
      /run/xtables.lock from xtables-lock (rw)
      /sys/fs/bpf from bpffs (rw)
      /var/lib/calico from var-lib-calico (rw)
      /var/log/calico/cni from cni-log-dir (ro)
      /var/run/calico from var-run-calico (rw)
      /var/run/nodeagent from policysync (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zrfvj (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True
  Initialized                 True
  Ready                       False
  ContainersReady             False
  PodScheduled                True
Volumes:
  lib-modules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:
  var-run-calico:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/calico
    HostPathType:
  var-lib-calico:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/calico
    HostPathType:
  xtables-lock:
    Type:          HostPath (bare host directory volume)
    Path:          /run/xtables.lock
    HostPathType:  FileOrCreate
  sys-fs:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/fs/
    HostPathType:  DirectoryOrCreate
  bpffs:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/fs/bpf
    HostPathType:  Directory
  nodeproc:
    Type:          HostPath (bare host directory volume)
    Path:          /proc
    HostPathType:
  cni-bin-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /opt/cni/bin
    HostPathType:
  cni-net-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/cni/net.d
    HostPathType:
  cni-log-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log/calico/cni
    HostPathType:
  etcd-certs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  calico-etcd-secrets
    Optional:    false
  policysync:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/nodeagent
    HostPathType:  DirectoryOrCreate
  kube-api-access-zrfvj:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 :NoSchedule op=Exists
                             :NoExecute op=Exists
                             CriticalAddonsOnly op=Exists
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason     Age   From               Message
  ----     ------     ----  ----               -------
  Normal   Scheduled  24m   default-scheduler  Successfully assigned kube-system/calico-node-cbd5s to k8s-10-10-40-33
  Warning  Unhealthy  24m   kubelet            Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
W1124 07:42:16.221537      48 feature_gate.go:241] Setting GA feature gate ServiceInternalTrafficPolicy=true. It will be removed in a future release.
  Warning  Unhealthy  24m  kubelet  Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
W1124 07:42:18.006339     124 feature_gate.go:241] Setting GA feature gate ServiceInternalTrafficPolicy=true. It will be removed in a future release.
  Warning  Unhealthy  24m  kubelet  Readiness probe failed: 2024-11-24 07:42:22.723 [INFO][253] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 10.10.40.34
W1124 07:42:22.715553     253 feature_gate.go:241] Setting GA feature gate ServiceInternalTrafficPolicy=true. It will be removed in a future release.
  Normal   Killing         23m                kubelet  Stopping container calico-node
  Normal   Pulled          23m (x2 over 24m)  kubelet  Container image "easzlab.io.local:5000/calico/cni:v3.26.4" already present on machine
  Normal   Created         23m (x2 over 24m)  kubelet  Created container install-cni
  Normal   Started         23m (x2 over 24m)  kubelet  Started container install-cni
  Normal   SandboxChanged  23m                kubelet  Pod sandbox changed, it will be killed and re-created.
  Normal   Pulled          23m (x2 over 24m)  kubelet  Container image "easzlab.io.local:5000/calico/node:v3.26.4" already present on machine
  Normal   Started         23m (x2 over 24m)  kubelet  Started container mount-bpffs
  Normal   Created         23m (x2 over 24m)  kubelet  Created container mount-bpffs
  Normal   Started         23m (x2 over 24m)  kubelet  Started container calico-node
  Normal   Created         23m (x2 over 24m)  kubelet  Created container calico-node
  Warning  Unhealthy       23m                kubelet  Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
W1124 07:43:43.579392      99 feature_gate.go:241] Setting GA feature gate ServiceInternalTrafficPolicy=true. It will be removed in a future release.
  Warning  Unhealthy  23m  kubelet  Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
W1124 07:43:44.185882     142 feature_gate.go:241] Setting GA feature gate ServiceInternalTrafficPolicy=true. It will be removed in a future release.
  Normal   Pulled     19m (x4 over 24m)     kubelet  Container image "easzlab.io.local:5000/calico/node:v3.26.4" already present on machine
  Warning  Unhealthy  9m57s (x37 over 19m)  kubelet  (combined from similar events): Readiness probe failed: 2024-11-24 07:57:12.867 [INFO][717] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 10.10.40.34
W1124 07:57:12.779037     717 feature_gate.go:241] Setting GA feature gate ServiceInternalTrafficPolicy=true. It will be removed in a future release.
  Warning  BackOff  4m48s (x24 over 23m)  kubelet  Back-off restarting failed container calico-node in pod calico-node-cbd5s_kube-system(588856b9-3c97-4630-bd0b-39e66b0c24e8)
2024-11-24 08:04:54.984 [INFO][100] monitor-addresses/reachaddr.go 47: Auto-detected address by connecting to remote Destination="10.10.40.34" IP=10.10.40.33
2024-11-24 08:04:54.985 [INFO][100] monitor-addresses/autodetection_methods.go 143: Using autodetected IPv4 address 10.10.40.33/22, detected by connecting to 10.10.40.34
2024-11-24 08:04:59.161 [INFO][104] felix/summary.go 100: Summarising 20 dataplane reconciliation loops over 1m4s: avg=50ms longest=516ms (resync-filter-v4,resync-ipsets-v4,resync-mangle-v4,resync-nat-v4,resync-raw-v4,resync-routes-v4,resync-routes-v4,resync-rules-v4,update-filter-v4,update-ipsets-4,update-mangle-v4,update-nat-v4,update-raw-v4)
2024-11-24 08:05:16.243 [INFO][104] felix/int_dataplane.go 1289: Linux interface state changed. ifIndex=74 ifaceName="nodelocaldns" state=""
2024-11-24 08:05:16.243 [INFO][104] felix/int_dataplane.go 1325: Linux interface addrs changed. addrs=<nil> ifaceName="nodelocaldns"
2024-11-24 08:05:16.243 [INFO][104] felix/iface_monitor.go 235: Netlink address update but interface isn't yet known.  Will handle when interface is signalled. addr="169.254.20.10" exists=false ifIndex=74
2024-11-24 08:05:16.243 [INFO][104] felix/int_dataplane.go 1893: Received interface update msg=&intdataplane.ifaceStateUpdate{Name:"nodelocaldns", State:"", Index:74}
2024-11-24 08:05:16.243 [INFO][104] felix/int_dataplane.go 1913: Received interface addresses update msg=&intdataplane.ifaceAddrsUpdate{Name:"nodelocaldns", Addrs:set.Set[string](nil)}
2024-11-24 08:05:16.243 [INFO][104] felix/hostip_mgr.go 84: Interface addrs changed. update=&intdataplane.ifaceAddrsUpdate{Name:"nodelocaldns", Addrs:set.Set[string](nil)}
2024-11-24 08:05:16.243 [INFO][104] felix/ipsets.go 130: Queueing IP set for creation family="inet" setID="this-host" setType="hash:ip"
2024-11-24 08:05:16.245 [INFO][104] felix/ipsets.go 778: Doing full IP set rewrite family="inet" numMembersInPendingReplace=5 setID="this-host"
2024-11-24 08:05:36.394 [INFO][104] felix/int_dataplane.go 1289: Linux interface state changed. ifIndex=77 ifaceName="nodelocaldns" state="down"
2024-11-24 08:05:36.394 [INFO][104] felix/int_dataplane.go 1325: Linux interface addrs changed. addrs=set.Set{169.254.20.10} ifaceName="nodelocaldns"
2024-11-24 08:05:36.394 [INFO][104] felix/iface_monitor.go 238: Netlink address update for known interface.  addr="169.254.20.10" exists=true ifIndex=77
2024-11-24 08:05:36.394 [INFO][104] felix/int_dataplane.go 1893: Received interface update msg=&intdataplane.ifaceStateUpdate{Name:"nodelocaldns", State:"down", Index:77}
2024-11-24 08:05:36.394 [INFO][104] felix/int_dataplane.go 1913: Received interface addresses update msg=&intdataplane.ifaceAddrsUpdate{Name:"nodelocaldns", Addrs:set.Typed[string]{"169.254.20.10":set.v{}}}
2024-11-24 08:05:36.394 [INFO][104] felix/hostip_mgr.go 84: Interface addrs changed. update=&intdataplane.ifaceAddrsUpdate{Name:"nodelocaldns", Addrs:set.Typed[string]{"169.254.20.10":set.v{}}}
2024-11-24 08:05:36.395 [INFO][104] felix/ipsets.go 130: Queueing IP set for creation family="inet" setID="this-host" setType="hash:ip"
2024-11-24 08:05:36.396 [INFO][104] felix/ipsets.go 778: Doing full IP set rewrite family="inet" numMembersInPendingReplace=6 setID="this-host"
2024-11-24 08:05:54.988 [INFO][100] monitor-addresses/reachaddr.go 47: Auto-detected address by connecting to remote Destination="10.10.40.34" IP=10.10.40.33
2024-11-24 08:05:54.990 [INFO][100] monitor-addresses/autodetection_methods.go 143: Using autodetected IPv4 address 10.10.40.33/22, detected by connecting to 10.10.40.34
bird: Mesh_10_10_40_34: State changed to stop
bird: Mesh_10_10_40_34: State changed to down
bird: Mesh_10_10_40_34: Starting
bird: Mesh_10_10_40_34: State changed to start
2024-11-24 08:06:03.090 [INFO][104] felix/summary.go 100: Summarising 13 dataplane reconciliation loops over 1m3.9s: avg=20ms longest=70ms (resync-filter-v4)

Expected Behavior

The calico-node container should stay in the Running state.

Current Behavior

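The calico-node container repeatedly terminates with reason Completed (exit code 0) and is restarted, leaving the pod in CrashLoopBackOff.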
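
To put numbers on this, the following is a minimal sketch that computes how long the last calico-node container ran, using the Started/Finished timestamps from the `kubectl describe` output above (rewritten by hand into ISO 8601 form). It also estimates how long the liveness probe would need to fail before kubelet restarts the container. That estimate (initial delay plus failure threshold times period) is an assumption, a rough upper bound rather than kubelet's exact behavior, and the variable names are illustrative only.

```python
from datetime import datetime

# Timestamps transcribed from "Last State: Terminated" of the calico-node container.
started = datetime.fromisoformat("2024-11-24T16:03:51+08:00")
finished = datetime.fromisoformat("2024-11-24T16:06:48+08:00")
run_seconds = int((finished - started).total_seconds())

# Liveness probe settings from the describe output: delay=10s period=10s #failure=6.
delay_s, period_s, failure_threshold = 10, 10, 6

# Assumption: rough upper bound on the time a container that never becomes
# live survives before kubelet restarts it.
liveness_window_s = delay_s + period_s * failure_threshold

print(f"last run lasted {run_seconds}s; rough liveness window is {liveness_window_s}s")
```

The last run lasted noticeably longer than this rough window, so these numbers alone do not show whether the stop came from the liveness probe or from the `SandboxChanged` event listed under Events.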

Possible Solution

Steps to Reproduce (for bugs)


Context

Your Environment