telmich opened this issue 2 months ago
For reference, the Tigera installation:
```
% kubectl -n tigera get installations.operator.tigera.io -o yaml
apiVersion: v1
items:
- apiVersion: operator.tigera.io/v1
  kind: Installation
  metadata:
    annotations:
      meta.helm.sh/release-name: calico
      meta.helm.sh/release-namespace: tigera
    creationTimestamp: "2023-11-25T17:36:04Z"
    finalizers:
    - tigera.io/operator-cleanup
    - operator.tigera.io/installation-controller
    generation: 5
    labels:
      app.kubernetes.io/managed-by: Helm
    name: default
    resourceVersion: "365039407"
    uid: e131aee1-1715-4658-a1ad-667068394c13
  spec:
    calicoNetwork:
      bgp: Enabled
      hostPorts: Enabled
      ipPools:
      - allowedUses:
        - Workload
        - Tunnel
        blockSize: 122
        cidr: 2a0a:e5c0:12:1::/64
        disableBGPExport: false
        encapsulation: None
        name: default-ipv6-ippool
        natOutgoing: Disabled
        nodeSelector: all()
      linuxDataplane: Iptables
      linuxPolicySetupTimeoutSeconds: 0
      multiInterfaceMode: None
      nodeAddressAutodetectionV4:
        firstFound: true
      nodeAddressAutodetectionV6:
        firstFound: true
      windowsDataplane: Disabled
    cni:
      ipam:
        type: Calico
      type: Calico
    controlPlaneReplicas: 2
    flexVolumePath: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/
    imagePullSecrets: []
    kubeletVolumePluginPath: /var/lib/kubelet
    kubernetesProvider: ""
    logging:
      cni:
        logFileMaxAgeDays: 30
        logFileMaxCount: 10
        logFileMaxSize: 100Mi
        logSeverity: Info
    nodeUpdateStrategy:
      rollingUpdate:
        maxUnavailable: 1
      type: RollingUpdate
    nonPrivileged: Disabled
    variant: Calico
  status:
    conditions:
    - lastTransitionTime: "2024-06-25T22:20:42Z"
      message: 'Pod calico-system/calico-node-9pj2s has crash looping container: calico-node'
      observedGeneration: 5
      reason: PodFailure
      status: "True"
      type: Degraded
    - lastTransitionTime: "2024-06-25T22:20:42Z"
      message: ""
      observedGeneration: 5
      reason: Unknown
      status: "False"
      type: Ready
    - lastTransitionTime: "2024-06-25T22:20:42Z"
      message: ""
      observedGeneration: 5
      reason: Unknown
      status: "False"
      type: Progressing
kind: List
metadata:
  resourceVersion: ""
```
Interestingly, the result is the same with v3.27.3:
```shell
VERSION=v3.27.3
helm repo add projectcalico https://docs.projectcalico.org/charts
helm repo update
helm upgrade --install --namespace tigera calico projectcalico/tigera-operator --version $VERSION --create-namespace
```
```
% kubectl get pods -n calico-system
NAME                                       READY   STATUS             RESTARTS            AGE
calico-kube-controllers-5d7d79486c-kfnc2   1/1     Running            0                   2m1s
calico-node-2bpv2                          0/1     CrashLoopBackOff   3 (34s ago)         2m2s
calico-node-65cgv                          0/1     CrashLoopBackOff   3 (<invalid> ago)   2m2s
calico-node-gxt2v                          1/1     Running            7                   213d
calico-node-khpx8                          1/1     Running            5 (91d ago)         164d
calico-node-lgjt7                          1/1     Running            0                   9h
calico-node-ms4wr                          1/1     Running            0                   9h
calico-node-vhnfg                          1/1     Running            6                   164d
calico-node-w9rsn                          0/1     CrashLoopBackOff   3 (<invalid> ago)   2m2s
calico-typha-5665d494b5-7sgrw              1/1     Running            0                   2m3s
calico-typha-5665d494b5-hhfnm              1/1     Running            0                   2m3s
calico-typha-5665d494b5-k5fvz              1/1     Running            0                   2m3s
csi-node-driver-29f8f                      2/2     Running            0                   2m2s
csi-node-driver-2br86                      2/2     Running            0                   75s
csi-node-driver-2q6rd                      2/2     Running            0                   69s
csi-node-driver-42gmf                      2/2     Running            0                   83s
csi-node-driver-ftqq2                      2/2     Running            0                   64s
csi-node-driver-n6k9b                      2/2     Running            0                   53s
csi-node-driver-rbzqz                      2/2     Running            0                   59s
csi-node-driver-tkqf7                      2/2     Running            0                   95s
```
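Three of the calico-node pods, all created about two minutes earlier, are in `CrashLoopBackOff`. To see which nodes those pods are scheduled on, a sketch like the following may help. It assumes the operator labels calico-node pods with `k8s-app=calico-node`; the helper names are hypothetical. Because a crash-looping container alternates between waiting and running, a single query can miss some pods.

```shell
#!/bin/sh
# Print "pod<TAB>node<TAB>waiting-reason" for every calico-node pod.
# Assumption: calico-node pods carry the label k8s-app=calico-node.
CALICO_POD_JSONPATH='{range .items[*]}{.metadata.name}{"\t"}{.spec.nodeName}{"\t"}{.status.containerStatuses[?(@.name=="calico-node")].state.waiting.reason}{"\n"}{end}'

# Keep only rows whose last (reason) column is CrashLoopBackOff.
# Exits non-zero when no row matches, like grep itself.
filter_crashloop() {
  grep 'CrashLoopBackOff$'
}

# Hypothetical helper: list crashing calico-node pods and their nodes.
list_crashing_calico_nodes() {
  kubectl -n calico-system get pods -l k8s-app=calico-node \
    -o jsonpath="$CALICO_POD_JSONPATH" | filter_crashloop
}
```

Source the snippet and run `list_crashing_calico_nodes`; the second column is the node whose interfaces and routes are worth inspecting.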
```
% kubectl -n calico-system logs calico-node-2bpv2
Defaulted container "calico-node" out of: calico-node, flexvol-driver (init), install-cni (init)
2024-06-26 07:28:35.394 [INFO][4] startup/startup.go 445: Early log level set to info
2024-06-26 07:28:35.394 [INFO][4] startup/utils.go 126: Using NODENAME environment for node name server69
2024-06-26 07:28:35.394 [INFO][4] startup/utils.go 138: Determined node name: server69
2024-06-26 07:28:35.394 [INFO][4] startup/startup.go 95: Starting node server69 with version v3.27.3
2024-06-26 07:28:35.395 [INFO][4] startup/startup.go 450: Checking datastore connection
2024-06-26 07:28:35.410 [INFO][4] startup/startup.go 474: Datastore connection verified
2024-06-26 07:28:35.410 [INFO][4] startup/startup.go 105: Datastore is ready
2024-06-26 07:28:35.418 [WARNING][4] startup/winutils.go 144: Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2024-06-26 07:28:35.430 [INFO][4] startup/startup.go 503: Initialize BGP data
2024-06-26 07:28:35.430 [WARNING][4] startup/autodetection_methods.go 99: Unable to auto-detect an IPv4 address: no valid IPv4 addresses found on the host interfaces
2024-06-26 07:28:35.430 [WARNING][4] startup/startup.go 525: Couldn't autodetect an IPv4 address. If auto-detecting, choose a different autodetection method. Otherwise provide an explicit address.
2024-06-26 07:28:35.431 [INFO][4] startup/startup.go 409: Clearing out-of-date IPv4 address from this node IP=""
2024-06-26 07:28:35.431 [INFO][4] startup/startup.go 413: Clearing out-of-date IPv6 address from this node IP=""
2024-06-26 07:28:35.446 [WARNING][4] startup/utils.go 48: Terminating
Calico node failed to start
```
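The `Unable to auto-detect an IPv4 address` warning in the log suggests calico-node was asked to find an IPv4 address. One quick check is to read the address-related environment variables the operator rendered into the `calico-node` DaemonSet. This is a sketch: `IP`, `IP6` and their `*_AUTODETECTION_METHOD` counterparts are calico-node settings, but the exact values the operator writes may differ between releases. With IPv4 detection enabled one would expect something like `IP=autodetect` plus a method such as `first-found`, while `IP=none` would indicate that IPv4 detection is off. The helper name is hypothetical.

```shell
#!/bin/sh
# Print NAME=value for every env var on the calico-node container.
# Entries that use valueFrom print as "NAME=" with an empty value.
CALICO_ENV_JSONPATH='{range .spec.template.spec.containers[?(@.name=="calico-node")].env[*]}{.name}={.value}{"\n"}{end}'

# Matches IP=, IP6=, IP_AUTODETECTION_METHOD= and IP6_AUTODETECTION_METHOD=.
IP_ENV_REGEX='^IP6?(_AUTODETECTION_METHOD)?='

# Hypothetical helper: show how calico-node was told to pick node addresses.
show_calico_ip_env() {
  kubectl -n calico-system get daemonset calico-node \
    -o jsonpath="$CALICO_ENV_JSONPATH" | grep -E "$IP_ENV_REGEX"
}
```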
> 2024-06-25 21:58:37.338 [WARNING][4] startup/autodetection_methods.go 99: Unable to auto-detect an IPv4 address: no valid IPv4 addresses found on the host interfaces
It seems Calico is configured to auto-detect an IPv4 address on the host, which fails because this is an IPv6-only cluster.
You might want to adjust your Installation's `spec.calicoNetwork` like this to disable IPv4 autodetection:
```
nodeAddressAutodetectionV4: {}
```
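A minimal sketch of applying that change from the command line, assuming (as suggested above) that the operator treats an empty `nodeAddressAutodetectionV4` as "do not auto-detect IPv4". It uses a JSON patch `replace` rather than a merge patch, because a JSON merge patch of `{}` is merged into the existing object and would leave `firstFound: true` in place. `Installation` is cluster-scoped, so no namespace is needed. The helper name and the rollout timeout are arbitrary choices for illustration.

```shell
#!/bin/sh
# JSON patch (RFC 6902): replace {firstFound: true} with an empty object.
# A merge patch with {} would keep firstFound, so "replace" is used instead.
V4_AUTODETECT_PATCH='[{"op":"replace","path":"/spec/calicoNetwork/nodeAddressAutodetectionV4","value":{}}]'

# Hypothetical helper; assumes kubectl points at the affected cluster.
disable_v4_autodetection() {
  # Installation is cluster-scoped, so no -n flag is needed.
  kubectl patch installations.operator.tigera.io default \
    --type=json -p "$V4_AUTODETECT_PATCH" || return
  # Print the field to confirm the change was stored.
  kubectl get installations.operator.tigera.io default \
    -o jsonpath='{.spec.calicoNetwork.nodeAddressAutodetectionV4}'
  echo
  # Wait for the operator to roll out calico-node with the new settings.
  kubectl -n calico-system rollout status daemonset/calico-node --timeout=10m
}
```

Source the snippet and run `disable_v4_autodetection`. Since this Installation carries Helm ownership annotations, it is worth checking afterwards that a later `helm upgrade` does not put the old value back.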
@telmich did you try the above?
Expected Behavior
calico-node runs
Current Behavior
calico-node crashes
Possible Solution
Unclear
Steps to Reproduce (for bugs)
Run
Result:
Context
This is a Kubernetes 1.30.1 cluster that was running Calico v3.26.4 before the upgrade:
The node on which calico-node crashes has the following IP and routing information:
Your Environment
Logs from v3.26.4:
For reference, logs from a running pod on another node are attached as calico-node-lgjt7-log.txt.