Galphaa opened this issue 2 days ago
As discussed in Slack, the problem went away once SELinux was disabled.

Here are the SELinux RPM instructions from the Calico Enterprise docs: https://docs.tigera.io/calico-enterprise/latest/getting-started/install-on-clusters/requirements
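For reference, the workaround applied here can be sketched as below. This is an assumption-laden sketch, not part of the original report: it uses the standard SELinux tooling that ships with Amazon Linux 2023 and must be run as root on each node. Switching to permissive mode (rather than disabling SELinux outright) is usually enough to unblock the pods while still logging the denials for later analysis.

```shell
#!/bin/sh
# Sketch of the SELinux workaround on an AL2023 node (run as root).
# Permissive mode logs denials without enforcing them, which is safer
# than SELINUX=disabled and preserves evidence of what was being blocked.

CONFIG=/etc/selinux/config

# 1. Check the current mode, if the SELinux tooling is installed
if command -v getenforce >/dev/null 2>&1; then
    getenforce
fi

# 2. Switch the running system to permissive (takes effect immediately, no reboot)
if command -v setenforce >/dev/null 2>&1; then
    setenforce 0 || true
fi

# 3. Persist the change across reboots
if [ -w "$CONFIG" ]; then
    sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' "$CONFIG"
fi
```

Note that permissive mode still requires the correct labels for some container runtimes; if the crashes persist, the Calico Enterprise SELinux RPM linked above installs the proper policy instead.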
I was switching my EKS nodes from Amazon Linux 2 to Amazon Linux 2023, and after migrating to the new AMI all my pods started crashing. Environment: EKS Kubernetes 1.28, AL2023 AMI, Calico 3.26.4 installed via manifest.
## Steps to Reproduce (for bugs)
Calico pod logs
Controller describe:

```
Containers:
  calico-kube-controllers:
    Port:
    Host Port:
    State:          Running
      Started:      Wed, 09 Oct 2024 18:09:13 +0400
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Wed, 09 Oct 2024 18:02:34 +0400
      Finished:     Wed, 09 Oct 2024 18:04:04 +0400
    Ready:          True
    Restart Count:  38
    Liveness:       exec [/usr/bin/check-status -l] delay=10s timeout=10s period=10s #success=1 #failure=6
    Readiness:      exec [/usr/bin/check-status -r] delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      ENABLED_CONTROLLERS:  node
      DATASTORE_TYPE:       kubernetes
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-wh7sc (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  kube-api-access-wh7sc:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:
    DownwardAPI:             true
QoS Class:       BestEffort
Node-Selectors:  kubernetes.io/os=linux
Tolerations:     CriticalAddonsOnly op=Exists
                 node-role.kubernetes.io/control-plane:NoSchedule
                 node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason   Age                      From     Message
  Normal   Killing  39m (x32 over 3h14m)     kubelet  Stopping container calico-kube-controllers
  Normal   Pulled   34m (x32 over 3h14m)     kubelet  Container image "ge.ecr.ge-west-1.amazonaws.com/calico-kube-controllers:v3.26.4" already present on machine
  Warning  BackOff  4m47s (x851 over 3h14m)  kubelet  Back-off restarting failed container calico-kube-controllers in pod calico-kube-controllers-8685c56787-4nfrm_kube-system(5260d52e-a763-4d3c-bb40-1a191b0e24d3)
```
Calico pod describe
```
    Host Port:
    Command:
      /opt/cni/bin/calico-ipam
      -upgrade
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 09 Oct 2024 18:04:07 +0400
      Finished:     Wed, 09 Oct 2024 18:04:07 +0400
    Ready:          True
    Restart Count:  7
    Environment Variables from:
      kubernetes-services-endpoint  ConfigMap  Optional: true
    Environment:
      KUBERNETES_NODE_NAME:       (v1:spec.nodeName)
      CALICO_NETWORKING_BACKEND:  <set to the key 'calico_backend' of config map 'calico-config'>  Optional: false
      FELIX_AWSSRCDSTCHECK:       Disable
    Mounts:
      /host/opt/cni/bin from cni-bin-dir (rw)
      /var/lib/cni/networks from host-local-net-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zktff (ro)
  install-cni:
    Container ID:  containerd://01054a69f401c0fcc052477d2ac09b6b22c7c6ec2d9d9d246a6916c82dc6b453
    Port:
    Host Port:
    Command:
      /opt/cni/bin/install
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 09 Oct 2024 18:04:08 +0400
      Finished:     Wed, 09 Oct 2024 18:04:19 +0400
    Ready:          True
    Restart Count:  0
    Environment Variables from:
      kubernetes-services-endpoint  ConfigMap  Optional: true
    Environment:
      CNI_CONF_NAME:         10-calico.conflist
      CNI_NETWORK_CONFIG:    <set to the key 'cni_network_config' of config map 'calico-config'>  Optional: false
      KUBERNETES_NODE_NAME:  (v1:spec.nodeName)
      CNI_MTU:               <set to the key 'veth_mtu' of config map 'calico-config'>  Optional: false
      SLEEP:                 false
      FELIX_AWSSRCDSTCHECK:  Disable
    Mounts:
      /host/etc/cni/net.d from cni-net-dir (rw)
      /host/opt/cni/bin from cni-bin-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zktff (ro)
  mount-bpffs:
    Container ID:  containerd://0b459d688ee89be30210e9cfa3f1d5efafc8cb7d8c0fa02c99cdb4073f9a1f55
    Host Port:
    Command:
      calico-node
      -init
      -best-effort
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 09 Oct 2024 18:04:19 +0400
      Finished:     Wed, 09 Oct 2024 18:04:19 +0400
    Ready:          True
    Restart Count:  0
    Environment:
      FELIX_AWSSRCDSTCHECK:  Disable
    Mounts:
      /nodeproc from nodeproc (ro)
      /sys/fs from sys-fs (rw)
      /var/run/calico from var-run-calico (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zktff (ro)
Containers:
  calico-node:
    Host Port:
    State:          Running
      Started:      Wed, 09 Oct 2024 18:09:10 +0400
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 09 Oct 2024 17:59:10 +0400
      Finished:     Wed, 09 Oct 2024 18:04:06 +0400
    Ready:          True
    Restart Count:  39
    Requests:
      cpu:  250m
    Liveness:   exec [/bin/calico-node -felix-live] delay=10s timeout=10s period=10s #success=1 #failure=6
    Readiness:  exec [/bin/calico-node -felix-ready] delay=0s timeout=10s period=10s #success=1 #failure=3
    Environment Variables from:
      kubernetes-services-endpoint  ConfigMap  Optional: true
    Environment:
      DATASTORE_TYPE:                     kubernetes
      WAIT_FOR_DATASTORE:                 true
      NODENAME:                           (v1:spec.nodeName)
      CALICO_NETWORKING_BACKEND:          <set to the key 'calico_backend' of config map 'calico-config'>  Optional: false
      CLUSTER_TYPE:                       k8s,bgp
      IP:                                 autodetect
      CALICO_IPV4POOL_IPIP:               Never
      CALICO_IPV4POOL_VXLAN:              CrossSubnet
      CALICO_IPV6POOL_VXLAN:              CrossSubnet
      FELIX_IPINIPMTU:                    <set to the key 'veth_mtu' of config map 'calico-config'>  Optional: false
      FELIX_VXLANMTU:                     <set to the key 'veth_mtu' of config map 'calico-config'>  Optional: false
      FELIX_WIREGUARDMTU:                 <set to the key 'veth_mtu' of config map 'calico-config'>  Optional: false
      CALICO_DISABLE_FILE_LOGGING:        true
      FELIX_DEFAULTENDPOINTTOHOSTACTION:  ACCEPT
      FELIX_IPV6SUPPORT:                  false
      FELIX_HEALTHENABLED:                true
      FELIX_AWSSRCDSTCHECK:               Disable
    Mounts:
      /host/etc/cni/net.d from cni-net-dir (rw)
      /lib/modules from lib-modules (ro)
      /run/xtables.lock from xtables-lock (rw)
      /sys/fs/bpf from bpffs (rw)
      /var/lib/calico from var-lib-calico (rw)
      /var/log/calico/cni from cni-log-dir (ro)
      /var/run/calico from var-run-calico (rw)
      /var/run/nodeagent from policysync (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zktff (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  lib-modules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:
  var-run-calico:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/calico
    HostPathType:
  var-lib-calico:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/calico
    HostPathType:
  xtables-lock:
    Type:          HostPath (bare host directory volume)
    Path:          /run/xtables.lock
    HostPathType:  FileOrCreate
  sys-fs:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/fs/
    HostPathType:  DirectoryOrCreate
  bpffs:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/fs/bpf
    HostPathType:  Directory
  nodeproc:
    Type:          HostPath (bare host directory volume)
    Path:          /proc
    HostPathType:
  cni-bin-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /opt/cni/bin
    HostPathType:
  cni-net-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/cni/net.d
    HostPathType:
  cni-log-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log/calico/cni
    HostPathType:
  host-local-net-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/cni/networks
    HostPathType:
  policysync:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/nodeagent
    HostPathType:  DirectoryOrCreate
  kube-api-access-zktff:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:
    DownwardAPI:             true
QoS Class:       Burstable
Node-Selectors:  kubernetes.io/os=linux
Tolerations:     :NoSchedule op=Exists
                 :NoExecute op=Exists
                 CriticalAddonsOnly op=Exists
                 node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists
                 node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                 node.kubernetes.io/unreachable:NoExecute op=Exists
                 node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason          Age                      From     Message
  Normal   SandboxChanged  49m (x29 over 3h15m)     kubelet  Pod sandbox changed, it will be killed and re-created.
  Normal   Killing         6m50s (x39 over 3h15m)   kubelet  Stopping container calico-node
  Warning  BackOff         5m14s (x644 over 3h14m)  kubelet  Back-off restarting failed container calico-node in pod calico-node-4js87_kube-system(2c7fcb86-8d13-48ab-9f65-f78435955023)
```
Pod full log:

```
2024-10-09 10:58:37.181 [INFO][77] felix/int_dataplane.go 1836: Received *proto.WorkloadEndpointUpdate update from calculation graph msg=id:<orchestrator_id:"k8s" workload_id:"kube-system/calico-kube-controllers-8685c56787-4nfrm" endpoint_id:"eth0" > endpoint:<state:"active" name:"calia607cfeb82d" profile_ids:"kns.kube-system" profile_ids:"ksa.kube-system.calico-kube-controllers" ipv4_nets:"192.168.141.160/32" >
2024-10-09 10:58:37.181 [INFO][77] felix/endpoint_mgr.go 602: Updating per-endpoint chains. id=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"kube-system/calico-kube-controllers-8685c56787-4nfrm", EndpointId:"eth0"}
2024-10-09 10:58:37.181 [INFO][77] felix/table.go 508: Queueing update of chain. chainName="cali-tw-calia607cfeb82d" ipVersion=0x4 table="filter"
2024-10-09 10:58:37.181 [INFO][77] felix/table.go 508: Queueing update of chain. chainName="cali-fw-calia607cfeb82d" ipVersion=0x4 table="filter"
2024-10-09 10:58:37.181 [INFO][77] felix/endpoint_mgr.go 648: Updating endpoint routes. id=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"kube-system/calico-kube-controllers-8685c56787-4nfrm", EndpointId:"eth0"}
2024-10-09 10:58:37.182 [INFO][77] felix/endpoint_mgr.go 1215: Applying /proc/sys configuration to interface. ifaceName="calia607cfeb82d"
2024-10-09 10:58:37.182 [INFO][77] felix/endpoint_mgr.go 490: Re-evaluated workload endpoint status adminUp=true failed=false known=true operUp=true status="up" workloadEndpointID=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"kube-system/calico-kube-controllers-8685c56787-4nfrm", EndpointId:"eth0"}
2024-10-09 10:58:37.182 [INFO][77] felix/status_combiner.go 58: Storing endpoint status update ipVersion=0x4 status="up" workload=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"kube-system/calico-kube-controllers-8685c56787-4nfrm", EndpointId:"eth0"}
2024-10-09 10:58:37.193 [INFO][77] felix/status_combiner.go 81: Endpoint up for at least one IP version id=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"kube-system/calico-kube-controllers-8685c56787-4nfrm", EndpointId:"eth0"} ipVersion=0x4 status="up"
2024-10-09 10:58:37.193 [INFO][77] felix/status_combiner.go 98: Reporting combined status. id=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"kube-system/calico-kube-controllers-8685c56787-4nfrm", EndpointId:"eth0"} status="up"
2024-10-09 10:58:37.193 [INFO][77] felix/summary.go 100: Summarising 26 dataplane reconciliation loops over 1m3s: avg=15ms longest=259ms (resync-filter-v4,resync-ipsets-v4,resync-mangle-v4,resync-nat-v4,resync-raw-v4,resync-routes-v4,resync-routes-v4,resync-routes-v4,resync-routes-v4,resync-rules-v4,update-filter-v4,update-ipsets-4,update-mangle-v4,update-nat-v4,update-raw-v4)
2024-10-09 10:58:37.230 [INFO][77] felix/calc_graph.go 467: Local endpoint updated id=WorkloadEndpoint(node=ip-10-161-0-237.eu-west-1.compute.internal, orchestrator=k8s, workload=kube-system/calico-kube-controllers-8685c56787-4nfrm, name=eth0)
2024-10-09 10:58:37.230 [INFO][77] felix/int_dataplane.go 1836: Received *proto.WorkloadEndpointUpdate update from calculation graph msg=id:<orchestrator_id:"k8s" workload_id:"kube-system/calico-kube-controllers-8685c56787-4nfrm" endpoint_id:"eth0" > endpoint:<state:"active" name:"calia607cfeb82d" profile_ids:"kns.kube-system" profile_ids:"ksa.kube-system.calico-kube-controllers" ipv4_nets:"192.168.141.160/32" >
2024-10-09 10:58:37.230 [INFO][77] felix/endpoint_mgr.go 602: Updating per-endpoint chains. id=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"kube-system/calico-kube-controllers-8685c56787-4nfrm", EndpointId:"eth0"}
2024-10-09 10:58:37.230 [INFO][77] felix/table.go 508: Queueing update of chain. chainName="cali-tw-calia607cfeb82d" ipVersion=0x4 table="filter"
2024-10-09 10:58:37.230 [INFO][77] felix/table.go 508: Queueing update of chain. chainName="cali-fw-calia607cfeb82d" ipVersion=0x4 table="filter"
2024-10-09 10:58:37.230 [INFO][77] felix/endpoint_mgr.go 648: Updating endpoint routes. id=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"kube-system/calico-kube-controllers-8685c56787-4nfrm", EndpointId:"eth0"}
2024-10-09 10:58:37.230 [INFO][77] felix/endpoint_mgr.go 1215: Applying /proc/sys configuration to interface. ifaceName="calia607cfeb82d"
2024-10-09 10:58:37.230 [INFO][77] felix/endpoint_mgr.go 490: Re-evaluated workload endpoint status adminUp=true failed=false known=true operUp=true status="up" workloadEndpointID=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"kube-system/calico-kube-controllers-8685c56787-4nfrm", EndpointId:"eth0"}
2024-10-09 10:58:37.230 [INFO][77] felix/status_combiner.go 58: Storing endpoint status update ipVersion=0x4 status="up" workload=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"kube-system/calico-kube-controllers-8685c56787-4nfrm", EndpointId:"eth0"}
2024-10-09 10:58:37.240 [INFO][77] felix/status_combiner.go 81: Endpoint up for at least one IP version id=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"kube-system/calico-kube-controllers-8685c56787-4nfrm", EndpointId:"eth0"} ipVersion=0x4 status="up"
2024-10-09 10:58:37.240 [INFO][77] felix/status_combiner.go 98: Reporting combined status. id=proto.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"kube-system/calico-kube-controllers-8685c56787-4nfrm", EndpointId:"eth0"} status="up"
2024-10-09 10:59:34.180 [INFO][84] monitor-addresses/autodetection_methods.go 103: Using autodetected IPv4 address on interface ens5: 10.161.0.237/26
2024-10-09 10:59:40.306 [INFO][77] felix/summary.go 100: Summarising 12 dataplane reconciliation loops over 1m3.1s: avg=5ms longest=19ms ()
```
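Note that the log above is from the *current* container instance and shows only healthy INFO lines; the error behind exit code 2 lives in the previous (crashed) instance. A sketch of how to pull the relevant evidence (assumptions not in the original report: the pod/container names are taken from this cluster's output above, and `/var/log/audit/audit.log` is the default auditd location on AL2023):

```shell
#!/bin/sh
# Diagnostic sketch: fetch the crashed container's log and check for
# SELinux denials on the node. Guards keep this safe to run anywhere.

# The crash error is in the previous instance's log, not the current one.
if command -v kubectl >/dev/null 2>&1; then
    kubectl -n kube-system logs calico-node-4js87 -c calico-node --previous
fi

# On the node itself: SELinux denials appear as AVC records in the audit log.
if [ -r /var/log/audit/audit.log ]; then
    grep 'avc:  denied' /var/log/audit/audit.log | tail -20
fi
```

AVC `denied` records that mention the Calico binaries or mounted host paths around the crash timestamps would confirm SELinux as the cause.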
## Context

We discussed this in Slack: https://calicousers.slack.com/archives/CPEPQE8CS/p1728467954077839
## Your Environment