Closed vknemanavar closed 3 years ago
Hi @vknemanavar
Thanks for logging this issue. We are investigating this issue now.
Based on the logs you provided and our investigation, the resource limits are the cause of the issue. The logs show that the service's QoS class is Burstable. According to this guide, https://docs.openshift.com/container-platform/3.6/dev_guide/compute_resources.html#quality-of-service-tiers, a pod is classed as Burstable when the resource limits are not the same as the requests. According to the guide:
... If there is an out of memory event on the node, `Burstable` containers are killed after `BestEffort` containers when attempting to recover memory.
So if any pod or container causes an OOM event, the Burstable container will be killed.
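For reference, a minimal sketch of how requests and limits determine the QoS class (values here are illustrative, not a recommendation):

```yaml
# Burstable: limits differ from requests (the operator's current shape).
# Killed before Guaranteed pods when the node runs out of memory.
resources:
  requests:
    cpu: 250m
    memory: 64Mi
  limits:
    cpu: 500m
    memory: 128Mi
---
# Guaranteed: requests equal limits for every container in the pod,
# so the pod is among the last to be killed under memory pressure.
resources:
  requests:
    cpu: 500m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 256Mi
```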
While we continue to investigate, you can edit the manifest yaml of the operator and remove those limits.
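The suggested edit would look roughly like this in the operator Deployment, under `spec.template.spec.containers[].resources` (a sketch only; your manifest may differ):

```yaml
# Before: the manager container is capped at 128Mi and gets OOMKilled.
# After: the limits block is removed, so the kernel no longer kills the
# container at 128Mi. The requests are kept so scheduling is unchanged.
resources:
  requests:
    cpu: 250m
    memory: 64Mi
  # limits: removed as a workaround while the memory usage is investigated
```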
I could do that by editing the YAML, but this should be part of the operator itself; as it stands, this step has to be done manually as an extra step.
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been stalled for 7 days with no activity.
Describe the bug Nginx Ingress Operator v0.3.0 pod keeps restarting with OOMKilled
To Reproduce Steps to reproduce the behavior:
Name:           nginx-ingress-operator-controller-manager-7c9d8899f8-dkt6b
Namespace:      openshift-operators
Priority:       0
Node:           10.73.184.244/10.73.184.244
Start Time:     Fri, 16 Jul 2021 15:16:02 +0530
Labels:         control-plane=controller-manager
                pod-template-hash=7c9d8899f8
Annotations:    alm-examples: [ { "apiVersion": "k8s.nginx.org/v1alpha1", "kind": "NginxIngressController", "metadata": { "name": "my-nginx-ingress-controller" }, "spec": { "image": { "pullPolicy": "Always", "repository": "docker.io/nginx/nginx-ingress", "tag": "1.12.0-ubi" }, "ingressClass": "nginx", "nginxPlus": false, "serviceType": "NodePort", "type": "deployment" } } ]
                capabilities: Basic Install
                cni.projectcalico.org/podIP: 172.30.162.209/32
                cni.projectcalico.org/podIPs: 172.30.162.209/32
                k8s.v1.cni.cncf.io/network-status: [{ "name": "", "ips": [ "172.30.162.209" ], "default": true, "dns": {} }]
                k8s.v1.cni.cncf.io/networks-status: [{ "name": "", "ips": [ "172.30.162.209" ], "default": true, "dns": {} }]
                olm.operatorGroup: global-operators
                olm.operatorNamespace: openshift-operators
                olm.targetNamespaces:
                openshift.io/scc: restricted
                operatorframework.io/properties: {"properties":[{"type":"olm.gvk","value":{"group":"k8s.nginx.org","kind":"NginxIngressController","version":"v1alpha1"}},{"type":"olm.pack...
                operators.operatorframework.io/builder: operator-sdk-v1.8.0
                operators.operatorframework.io/project_layout: go.kubebuilder.io/v3
Status:         Running
IP:             172.30.162.209
IPs:
  IP:           172.30.162.209
Controlled By:  ReplicaSet/nginx-ingress-operator-controller-manager-7c9d8899f8
Containers:
  kube-rbac-proxy:
    Container ID:  cri-o://a3b2029f7b667244b6689da14e15cb6ce10ae085579e7912f715197d1727d559
    Image:         registry.redhat.io/openshift4/ose-kube-rbac-proxy@sha256:6d0286b8a8f6f3cd9d6cd8319400acf27b70fbb52df5808ec6fe2d9849be7d8c
    Image ID:      registry.redhat.io/openshift4/ose-kube-rbac-proxy@sha256:6d0286b8a8f6f3cd9d6cd8319400acf27b70fbb52df5808ec6fe2d9849be7d8c
    Port:          8443/TCP
    Host Port:     0/TCP
    Args:
      --secure-listen-address=0.0.0.0:8443
      --upstream=http://127.0.0.1:8080/
      --logtostderr=true
      --v=10
    State:          Running
      Started:      Fri, 16 Jul 2021 15:16:20 +0530
    Ready:          True
    Restart Count:  0
    Environment:
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from nginx-ingress-operator-controller-manager-token-rpmpw (ro)
manager:
Container ID: cri-o://58cb0c4b2705fda2a97f39368c2b48c593c0ad339a35d82d28a5b99decdd4316
Image: registry.connect.redhat.com/nginx/nginx-ingress-operator@sha256:519b5ebc20fa938dab50842a053cedea7dffeec07360ee66c4aac43f1bc63f9f
Image ID: registry.connect.redhat.com/nginx/nginx-ingress-operator@sha256:519b5ebc20fa938dab50842a053cedea7dffeec07360ee66c4aac43f1bc63f9f
Port:
Host Port:
Command:
/manager
Args:
--health-probe-bind-address=:8081
--metrics-bind-address=127.0.0.1:8080
--leader-elect
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Started: Fri, 16 Jul 2021 15:20:01 +0530
Finished: Fri, 16 Jul 2021 15:20:27 +0530
Ready: False
Restart Count: 4
Limits:
cpu: 500m
memory: 128Mi
Requests:
cpu: 250m
memory: 64Mi
Liveness: http-get http://:8081/healthz delay=15s timeout=1s period=20s #success=1 #failure=3
Readiness: http-get http://:8081/readyz delay=5s timeout=1s period=10s #success=1 #failure=3
Environment:
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from nginx-ingress-operator-controller-manager-token-rpmpw (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
nginx-ingress-operator-controller-manager-token-rpmpw:
Type: Secret (a volume populated by a Secret)
SecretName: nginx-ingress-operator-controller-manager-token-rpmpw
Optional: false
QoS Class: Burstable
Node-Selectors:
Tolerations: node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
Normal Scheduled Successfully assigned openshift-operators/nginx-ingress-operator-controller-manager-7c9d8899f8-dkt6b to 10.73.184.244
Normal AddedInterface 4m46s multus Add eth0 [172.30.162.209/32]
Normal Pulled 4m46s kubelet, 10.73.184.244 Container image "registry.redhat.io/openshift4/ose-kube-rbac-proxy@sha256:6d0286b8a8f6f3cd9d6cd8319400acf27b70fbb52df5808ec6fe2d9849be7d8c" already present on machine
Normal Created 4m45s kubelet, 10.73.184.244 Created container kube-rbac-proxy
Normal Started 4m45s kubelet, 10.73.184.244 Started container kube-rbac-proxy
Normal Pulled 2m18s (x4 over 4m45s) kubelet, 10.73.184.244 Container image "registry.connect.redhat.com/nginx/nginx-ingress-operator@sha256:519b5ebc20fa938dab50842a053cedea7dffeec07360ee66c4aac43f1bc63f9f" already present on machine
Normal Created 2m18s (x4 over 4m45s) kubelet, 10.73.184.244 Created container manager
Normal Started 2m18s (x4 over 4m45s) kubelet, 10.73.184.244 Started container manager
Warning Unhealthy 2m13s (x4 over 4m13s) kubelet, 10.73.184.244 Readiness probe failed: Get "http://172.30.162.209:8081/readyz": dial tcp 172.30.162.209:8081: connect: connection refused
Warning BackOff 112s (x6 over 3m36s) kubelet, 10.73.184.244 Back-off restarting failed container
Expected behavior Nginx Ingress Operator pod should run without restarting
Your environment IBM ROKS 4.6
Additional context I even tried increasing the memory limit to 512Mi, but the pod still keeps restarting with OOMKilled.