Postgresql clusters restarting every sync interval

Aslan-Liu commented 1 year ago

Which image of the operator are you using? postgres-operator:v1.8.2
Where do you run it - cloud or metal? Kubernetes or OpenShift? K8S
Are you running Postgres Operator in production? [yes | no] yes
Type of issue? [Bug report, question, feature request, etc.] Question

My Postgres cluster is restarted every 30 minutes. I checked the operator logs and see:

time="2023-10-20T13:47:07Z" level=info msg="SYNC event has been queued" cluster-name=srv-netbox/netbox-postgres pkg=controller worker=0
time="2023-10-20T13:47:07Z" level=info msg="there are 1 clusters running" pkg=controller
time="2023-10-20T13:47:07Z" level=info msg="syncing of the cluster started" cluster-name=srv-netbox/netbox-postgres pkg=controller worker=0
time="2023-10-20T13:47:07Z" level=debug msg="team API is disabled" cluster-name=srv-netbox/netbox-postgres pkg=cluster
time="2023-10-20T13:47:07Z" level=debug msg="team API is disabled" cluster-name=srv-netbox/netbox-postgres pkg=cluster
time="2023-10-20T13:47:07Z" level=info msg="syncing secrets" cluster-name=srv-netbox/netbox-postgres pkg=cluster
time="2023-10-20T13:47:07Z" level=debug msg="syncing master service" cluster-name=srv-netbox/netbox-postgres pkg=cluster
time="2023-10-20T13:47:07Z" level=debug msg="syncing replica service" cluster-name=srv-netbox/netbox-postgres pkg=cluster
time="2023-10-20T13:47:07Z" level=debug msg="syncing volumes using \"pvc\" storage resize mode" cluster-name=srv-netbox/netbox-postgres pkg=cluster
time="2023-10-20T13:47:08Z" level=info msg="volume claims do not require changes" cluster-name=srv-netbox/netbox-postgres pkg=cluster
time="2023-10-20T13:47:08Z" level=debug msg="syncing statefulsets" cluster-name=srv-netbox/netbox-postgres pkg=cluster
time="2023-10-20T13:47:08Z" level=debug msg="mark rolling update annotation for netbox-postgres-0: reason pod not yet restarted due to lazy update" cluster-name=srv-netbox/netbox-postgres pkg=cluster
time="2023-10-20T13:47:08Z" level=debug msg="mark rolling update annotation for netbox-postgres-1: reason pod not yet restarted due to lazy update" cluster-name=srv-netbox/netbox-postgres pkg=cluster
time="2023-10-20T13:47:08Z" level=debug msg="mark rolling update annotation for netbox-postgres-2: reason pod not yet restarted due to lazy update" cluster-name=srv-netbox/netbox-postgres pkg=cluster
time="2023-10-20T13:47:09Z" level=debug msg="making GET http request: http://192.168.116.48:8008/config" cluster-name=srv-netbox/netbox-postgres pkg=cluster
time="2023-10-20T13:47:21Z" level=debug msg="making GET http request: http://192.168.117.82:8008/patroni" cluster-name=srv-netbox/netbox-postgres pkg=cluster
time="2023-10-20T13:47:21Z" level=debug msg="making GET http request: http://192.168.95.142:8008/patroni" cluster-name=srv-netbox/netbox-postgres pkg=cluster
time="2023-10-20T13:47:21Z" level=debug msg="making GET http request: http://192.168.116.48:8008/patroni" cluster-name=srv-netbox/netbox-postgres pkg=cluster
time="2023-10-20T13:47:21Z" level=debug msg="performing rolling update" cluster-name=srv-netbox/netbox-postgres pkg=cluster
time="2023-10-20T13:47:21Z" level=info msg="there are 3 pods in the cluster to recreate" cluster-name=srv-netbox/netbox-postgres pkg=cluster
time="2023-10-20T13:47:21Z" level=debug msg="subscribing to pod \"srv-netbox/netbox-postgres-1\"" cluster-name=srv-netbox/netbox-postgres pkg=cluster
time="2023-10-20T13:47:49Z" level=info msg="pod \"srv-netbox/netbox-postgres-1\" has been recreated" cluster-name=srv-netbox/netbox-postgres pkg=cluster
time="2023-10-20T13:47:49Z" level=debug msg="unsubscribing from pod \"srv-netbox/netbox-postgres-1\" events" cluster-name=srv-netbox/netbox-postgres pkg=cluster
time="2023-10-20T13:47:49Z" level=debug msg="subscribing to pod \"srv-netbox/netbox-postgres-2\"" cluster-name=srv-netbox/netbox-postgres pkg=cluster

I don't know why the operator keeps restarting my cluster. How can I find out the cause?
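One way to narrow this down is to pull the restart reason out of the operator's debug output. The sketch below parses key=value log lines shaped like the ones above and lists each pod that was marked for a rolling update together with the stated reason. The function names (`parse_line`, `restart_reasons`) are made up for illustration, and the message pattern is taken only from the log excerpt in this thread, so it may not match other operator versions.

```python
import re
import shlex
import sys

# Message pattern copied from the log excerpt above (an assumption about its stability).
REASON_RE = re.compile(
    r"mark rolling update annotation for (?P<pod>\S+): reason (?P<reason>.+)"
)


def parse_line(line):
    """Split one key=value log line into a dict; return {} if it cannot be parsed."""
    try:
        tokens = shlex.split(line)  # handles quoted values and escaped quotes
    except ValueError:
        return {}
    fields = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def restart_reasons(lines):
    """Return (pod, reason) pairs for every rolling-update marker in the lines."""
    found = []
    for line in lines:
        match = REASON_RE.search(parse_line(line).get("msg", ""))
        if match:
            found.append((match.group("pod"), match.group("reason")))
    return found


if __name__ == "__main__":
    # Read operator logs from stdin and print one line per marked pod.
    for pod, reason in restart_reasons(sys.stdin):
        print(f"{pod}: {reason}")
```

Assuming the operator runs as a deployment named postgres-operator and the script is saved as restart_reasons.py (both names are placeholders), it could be fed with `kubectl logs deployment/postgres-operator | python restart_reasons.py`. For the excerpt above, the reported reason for all three pods is "pod not yet restarted due to lazy update".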

Aslan-Liu commented 1 year ago

I think I found the reason. It was caused by a misconfiguration on my side. Sorry.

a1994sc commented 8 months ago

Do you mind sharing what your error was? My pods restart every half hour or so.

Aslan-Liu commented 8 months ago

In my cluster, the image URLs of all pods are modified or replaced so that they point to a specified repository (due to security concerns). As a result, the operator detects that the image URLs differ between deployment.yaml and pod.yaml, and it restarts the pods every 30 minutes.
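To illustrate the mismatch, here is a minimal sketch that compares the image each container is supposed to run with the image it actually runs. It assumes the check amounts to a plain string comparison of image references; it is not the operator's actual code, and the function name and image URLs are hypothetical.

```python
def find_image_mismatches(desired_images, running_images):
    """Return {container: (desired, running)} for containers whose images differ."""
    mismatches = {}
    for name, desired in desired_images.items():
        running = running_images.get(name)
        # Only report containers that exist in the pod with a different image string.
        if running is not None and running != desired:
            mismatches[name] = (desired, running)
    return mismatches


# Hypothetical image references: the manifest names a public registry,
# while the cluster rewrote the pod's image to a private mirror.
desired = {"postgres": "public.example.com/acid/spilo:tag"}
running = {"postgres": "private.example.com/acid/spilo:tag"}

for container, (want, have) in find_image_mismatches(desired, running).items():
    print(f"{container}: manifest says {want}, pod runs {have}")
```

Because the rewritten URL never equals the one in the manifest, a comparison like this reports a difference on every sync, which would match the restart on every sync interval described here.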

a1994sc commented 8 months ago

Ooooh, gotcha. I guess you ended up having the images reflect the private repo?

Aslan-Liu commented 8 months ago

Ooooh, gotcha. I guess you ended up having the images reflect the private repo?

Yes, you are correct.

zalando / postgres-operator

Postgresql clusters restarting every sync interval #2453