OKE cluster on OCI with Flannel CNI: pods restart a few hours after installing Calico

vfnp commented 2 months ago

I am running an OKE cluster on OCI with the Flannel CNI, and installed Calico version 3.27.2 by following the official Oracle documentation: https://docs.oracle.com/en-us/iaas/Content/ContEng/Tasks/contengsettingupcalico.htm#manualcalicoinstall

Expected Behavior

All pods should be running and no issues should be highlighted.

Current Behavior

After some time, some pods start restarting. Their containers restart multiple times (around 8-10) and the pods then return to the Running state. I am not sure why the pods are being restarted. Please suggest a solution.

Your Environment

Calico version : 3.27.2
Orchestrator version (e.g. kubernetes, mesos, rkt): OKE v1.29.1
Operating System and version: Linux 7.9
Link to your project (optional):

caseydavenport commented 2 months ago

@vfnp this issue is missing a lot of information, which makes it hard to guess what might be going wrong. For starters:

What pods are being restarted?
Can you provide logs from the restarting pods?
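
One way to answer the first question, as a minimal sketch: dump the pod state with `kubectl get pods -A -o json` and list every container whose `restartCount` is non-zero. The helper name `restarting_containers` is hypothetical; the field names (`status.containerStatuses[].restartCount`) come from the Kubernetes Pod API.

```python
def restarting_containers(pod_list, min_restarts=1):
    """Return (namespace, pod, container, restarts) tuples, highest restart count first.

    `pod_list` is the parsed JSON produced by `kubectl get pods -A -o json`.
    """
    rows = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        # containerStatuses is absent until the kubelet has reported on the pod.
        for status in pod.get("status", {}).get("containerStatuses", []):
            restarts = status.get("restartCount", 0)
            if restarts >= min_restarts:
                rows.append((
                    meta.get("namespace", ""),
                    meta.get("name", ""),
                    status.get("name", ""),
                    restarts,
                ))
    # Sort so the worst offenders are listed first.
    return sorted(rows, key=lambda row: row[3], reverse=True)
```

The input can be loaded with `json.load(sys.stdin)` when the `kubectl` output is piped into the script. For the second question, `kubectl logs <pod> -n <namespace> --previous` prints the output of the last terminated container instance, which is usually more informative for a crash-looping pod than the logs of the freshly restarted one.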

vfnp commented 1 month ago

Hi @caseydavenport, below are the logs of the nginx ingress controller pod.

Logs:

naveens_pa@cloudshell:~ (eu-dcc-milan-1)$ kubectl logs nginx-ingress-nginx-controller-xx5mf -n ingress-nginx

NGINX Ingress controller
Release: v1.10.1
Build: 4fb5aac1dd3669daa3a14d9de3e3cdb371b4c518
Repository: https://github.com/kubernetes/ingress-nginx
nginx version: nginx/1.25.3

W0710 08:28:10.783244 7 client_config.go:618] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0710 08:28:10.783344 7 main.go:205] "Creating API client" host="https://11.97.0.1:443"
naveens_pa@cloudshell:~ (eu-dcc-milan-1)$

Pods:

ingress-nginx   nginx-ingress-nginx-controller-wjjdr                 1/1   Running            0               41h
ingress-nginx   nginx-ingress-nginx-controller-xx5mf                 0/1   CrashLoopBackOff   737 (7s ago)    41h
ingress-nginx   nginx-ingress-nginx-defaultbackend-d6b7c55d9-lp82l   1/1   Running            0               41h

Describe pods

Volumes:
  kube-api-access-ks86n:
    Type: Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds: 3607
    ConfigMapName: kube-root-ca.crt
    ConfigMapOptional:
    DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations:
  node.kubernetes.io/disk-pressure:NoSchedule op=Exists
  node.kubernetes.io/memory-pressure:NoSchedule op=Exists
  node.kubernetes.io/not-ready:NoExecute op=Exists
  node.kubernetes.io/pid-pressure:NoSchedule op=Exists
  node.kubernetes.io/unreachable:NoExecute op=Exists
  node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type Reason Age From Message

  Warning Unhealthy 19m (x3976 over 41h) kubelet Readiness probe failed: Get "http://11.96.0.131:10254/healthz": dial tcp 11.96.0.131:10254: connect: connection refused
  Warning BackOff 4m3s (x8965 over 41h) kubelet Back-off restarting failed container controller in pod nginx-ingress-nginx-controller-xx5mf_ingress-nginx(e943b84c-c2cb-47e7-b9df-72acc128209e)

caseydavenport commented 1 month ago

kubelet Readiness probe failed: Get "http://11.96.0.131:10254/healthz": dial tcp 11.96.0.131:10254: connect: connection refused

This seems to suggest that nginx is refusing the request.

This thread seems similar: https://github.com/kubernetes/ingress-nginx/issues/5058

Potentially an issue with nginx being unable to access the k8s API server.

The Calico team doesn't maintain the integration with OKE, so it might be worth bringing this to the attention of the OKE team. It sounds like it could be an issue with node-to-node routing or Service proxying.
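
To narrow down which hop is failing, a plain TCP probe can help, because "connection refused" (the host answered, but nothing is listening on the port) points somewhere different than a timeout (packets dropped along the way, which is more typical of a routing or proxying problem). The following is a minimal sketch under those assumptions, not an official Calico or OKE tool; `probe` is a hypothetical helper, and the addresses in the note below it are simply the ones from the logs above.

```python
import socket


def probe(host, port, timeout=3.0):
    """Classify a TCP connection attempt as 'open', 'refused', 'timeout', or 'error: ...'."""
    try:
        # create_connection performs the TCP handshake and raises on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The target replied with a reset: reachable, but no listener on that port.
        return "refused"
    except socket.timeout:
        # No reply within the timeout: traffic is likely being dropped somewhere.
        return "timeout"
    except OSError as exc:
        # Anything else, for example "no route to host" or a DNS failure.
        return f"error: {exc}"
```

Run from a node or a debug pod, `probe("11.97.0.1", 443)` checks the API server address that the controller logs, and `probe("11.96.0.131", 10254)` checks the health endpoint that the kubelet probes. A `timeout` on the first would be consistent with the routing or Service-proxying theory, while `refused` on the second only says that the controller is not listening on its health port yet.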

sridhartigera commented 3 weeks ago

@vfnp Any update on this?

projectcalico / calico