operator-framework / operator-lifecycle-manager

A management framework for extending Kubernetes with Operators
https://olm.operatorframework.io
Apache License 2.0

OLM pods are down #2995

Open. ShadiAlbatal opened this issue 1 year ago.

ShadiAlbatal commented 1 year ago

[screenshot]

Output of kubectl describe for one of the pods: [two screenshots]

This happened suddenly. Some of the pods are missing: I deleted them so they would be recreated, but they have not come back yet.

kevinrizza commented 1 year ago

It looks like the package server (which is a kube aggregated API service) is panicking on startup because the kube apiserver is returning some errors. Can you get the logs of the package server pod before it shuts down?
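One way to capture output from a container that has already crashed is kubectl logs with its --previous flag, which returns the logs of the previous instance of the container. The minimal Python sketch below only builds and optionally runs that command. The olm namespace comes from this thread; the pod name is a hypothetical placeholder, and running it assumes kubectl is on PATH and pointed at the affected cluster.

```python
import shlex
import subprocess


def previous_logs_cmd(pod: str, namespace: str = "olm") -> list[str]:
    """Build a kubectl command that prints logs from the previous
    (crashed) instance of the pod's container."""
    return ["kubectl", "logs", pod, "--namespace", namespace, "--previous"]


def fetch_previous_logs(pod: str, namespace: str = "olm") -> str:
    """Run the command. Requires kubectl and access to the cluster."""
    result = subprocess.run(
        previous_logs_cmd(pod, namespace),
        capture_output=True,
        text=True,
        check=False,  # kubectl exits non-zero if there is no previous instance
    )
    return result.stdout or result.stderr


if __name__ == "__main__":
    # "packageserver-xxxxx" is a hypothetical pod name; substitute a real one.
    print(shlex.join(previous_logs_cmd("packageserver-xxxxx")))
```

Note that --previous only helps when the container restarted inside the same pod. If the pod object itself was deleted and replaced, its old logs are no longer reachable through kubectl.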

Also, the package server is bootstrapped as an OLM CSV in the olm namespace. I'm wondering if there is some auth issue. You could try to delete the packageserver CSV and then manually reapply it with the install manifests you used to install OLM.
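The delete-and-reapply step could look roughly like the sketch below. It assumes the CSV is named packageserver in the olm namespace (the usual upstream default) and that the install manifest is a local file called olm.yaml; both are assumptions to adjust for your installation. It relies on csv being the short name for the ClusterServiceVersion resource, and by default it only prints the commands instead of running them.

```python
import shlex
import subprocess
import sys

# Assumptions: adjust to match your installation.
CSV_NAME = "packageserver"
NAMESPACE = "olm"
MANIFEST = "olm.yaml"  # hypothetical path to the OLM install manifest


def reapply_commands(csv_name=CSV_NAME, namespace=NAMESPACE, manifest=MANIFEST):
    """Return the kubectl commands: delete the CSV, then reapply the manifest."""
    return [
        ["kubectl", "delete", "csv", csv_name, "--namespace", namespace],
        ["kubectl", "apply", "-f", manifest],
    ]


def main(execute: bool = False) -> None:
    for cmd in reapply_commands():
        print(shlex.join(cmd))
        if execute:
            # Stop at the first failure rather than applying on top of an error.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    main(execute="--execute" in sys.argv)
```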

ShadiAlbatal commented 1 year ago

I tried to get logs of each of the package server pods, but none has any logs.

I installed OLM as instructed by the setup script, but yes, it does look like that. [screenshot] I have some operators installed; will they be affected if I delete the packageserver CSV and redeploy it?

I can see that cert-manager has an issue as well: [screenshot] Description of cert-manager-cainjector-7896b5bb4d-7g5ml: [screenshot] They are all located on the same node; however, the node does not appear to have any issues: [three screenshots]

Worth noting: I have many clusters, but only the ones with OLM get this warning in k9s: [screenshot]

kevinrizza commented 1 year ago

The packageserver CSV exists as a UX API. It lets you run kubectl get packagemanifests so you can see which operators are available on the cluster. It's not necessary to actually lifecycle operators that are already installed, or even to install new ones if you know ahead of time which operators are available in the catalogs on your cluster. I doubt this is node related; most likely there is something about scale or instability with the API server that is making it difficult for the packageserver (which is an aggregated API server) to successfully start.
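Because packageserver is an aggregated API server, its health is surfaced through an APIService object, and when that APIService reports Available=False, clients that run API discovery (such as kubectl or k9s) can report errors for that API group. The sketch below checks the Available condition in the JSON printed by kubectl get apiservice v1.packages.operators.coreos.com -o json. That APIService name is the one OLM normally registers, but treat it as an assumption and confirm it with kubectl get apiservices. The sample input is illustrative only, not output from this cluster.

```python
import json


def apiservice_available(apiservice_json: str) -> tuple[bool, str]:
    """Return (available, message) from an APIService object's status.

    Looks for the condition whose type is "Available"; a missing
    condition is treated as unavailable.
    """
    obj = json.loads(apiservice_json)
    for cond in obj.get("status", {}).get("conditions", []):
        if cond.get("type") == "Available":
            return cond.get("status") == "True", cond.get("message", "")
    return False, "no Available condition reported"


# Illustrative values only, not captured from this cluster.
SAMPLE = json.dumps({
    "kind": "APIService",
    "metadata": {"name": "v1.packages.operators.coreos.com"},
    "status": {"conditions": [{
        "type": "Available",
        "status": "False",
        "reason": "MissingEndpoints",
        "message": "endpoints for the backing service have no addresses",
    }]},
})

if __name__ == "__main__":
    available, message = apiservice_available(SAMPLE)
    print("available" if available else f"unavailable: {message}")
```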

ShadiAlbatal commented 1 year ago

Do you have any idea how to work around it? These pods are in Completed status: [screenshot] However, when I describe one, it reports: "The node was low on resource: memory. Container packageserver was using 143704Ki, which exceeds its request of 50Mi." [screenshot]
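The numbers in that message are worth unpacking. Kubernetes binary suffixes are powers of 1024, so 143704Ki is about 140 MiB, roughly 2.8 times the 50Mi request. The Kubernetes node-pressure eviction documentation describes the kubelet ranking eviction candidates first by whether usage exceeds the request, then by pod priority, then by usage relative to the request, so a container running well above a small request is an early target when the node runs low on memory. The sketch below does the arithmetic and a simplified version of that ordering; it handles only Ki/Mi/Gi suffixes, and the second pod in the example is hypothetical.

```python
from dataclasses import dataclass

# Binary (power-of-1024) suffixes used in Kubernetes quantities.
_SUFFIX = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def to_bytes(quantity: str) -> int:
    """Convert a quantity such as '143704Ki' or '50Mi' to bytes.

    Simplified: only Ki/Mi/Gi suffixes and plain integers are handled.
    """
    for suffix, factor in _SUFFIX.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


@dataclass
class PodMemory:
    name: str
    usage: str
    request: str
    priority: int = 0

    @property
    def over_request(self) -> int:
        """Bytes of usage above the request (negative if under it)."""
        return to_bytes(self.usage) - to_bytes(self.request)


def eviction_order(pods: list[PodMemory]) -> list[str]:
    """Simplified kubelet ranking under memory pressure: pods exceeding
    their request first, then lower priority, then larger overage."""
    ranked = sorted(
        pods,
        key=lambda p: (p.over_request <= 0, p.priority, -p.over_request),
    )
    return [p.name for p in ranked]


if __name__ == "__main__":
    usage, request = to_bytes("143704Ki"), to_bytes("50Mi")
    print(f"usage {usage / 1024**2:.1f} MiB, {usage / request:.2f}x request")
    # prints "usage 140.3 MiB, 2.81x request"
    pods = [
        PodMemory("packageserver", "143704Ki", "50Mi"),
        PodMemory("hypothetical-app", "200Mi", "256Mi"),  # under its request
    ]
    print(eviction_order(pods))  # prints ['packageserver', 'hypothetical-app']
```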