zalando / spilo

Highly available elephant herd: HA PostgreSQL cluster using Docker
Apache License 2.0

resourceVersion changes too often causing status update on kubernetes API #850

Closed matejkostros closed 1 year ago

matejkostros commented 1 year ago

We noticed that our Postgres cluster deployed in Kubernetes emits a large number of status updates when watching for changes only. The following is a snippet covering approximately five minutes:

[root@vm031 ~]# kubectl get pods --watch-only
NAME                      READY   STATUS    RESTARTS   AGE
kubernetes-postgresql-0   1/1     Running   0          68m
kubernetes-postgresql-0   1/1     Running   0          68m
kubernetes-postgresql-0   1/1     Running   0          68m
kubernetes-postgresql-0   1/1     Running   0          69m
kubernetes-postgresql-0   1/1     Running   0          69m
kubernetes-postgresql-0   1/1     Running   0          69m
kubernetes-postgresql-0   1/1     Running   0          69m
kubernetes-postgresql-0   1/1     Running   0          69m
kubernetes-postgresql-0   1/1     Running   0          69m
kubernetes-postgresql-0   1/1     Running   0          70m
kubernetes-postgresql-0   1/1     Running   0          70m
kubernetes-postgresql-0   1/1     Running   0          70m
kubernetes-postgresql-0   1/1     Running   0          71m
kubernetes-postgresql-0   1/1     Running   0          71m

When we looked into the logs for this pod, we found that these status updates correlate precisely with INFO messages from the kubernetes-postgresql-0 pod, specifically with the first INFO message after each LOG message:

2023-02-23 12:10:03.211 26 LOG {ticks: 0, maint: 0, retry: 0}
2023-02-23 12:10:05,032 INFO: no action. I am (kubernetes-postgresql-0) the leader with the lock
2023-02-23 12:10:14,984 INFO: no action. I am (kubernetes-postgresql-0) the leader with the lock
2023-02-23 12:10:25,033 INFO: no action. I am (kubernetes-postgresql-0) the leader with the lock

When we list the manifest with kubectl get po/kubernetes-postgresql-0 -o yaml, we notice that only two fields change when the status refreshes (manifest also attached: kubernetes-postgresql-0.yaml.txt).
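To see exactly what changes between updates, one can diff two snapshots of the manifest. The sketch below simulates that with two reduced, illustrative snapshots: per the maintainer's explanation later in this thread, the moving parts are resourceVersion and the Patroni-managed annotation carrying xlog_location (the annotation key `status` and the JSON payload shape are assumptions, not taken from the attached manifest).

```shell
# Hypothetical snapshots reduced to the fields that change between
# status refreshes; keys and values are illustrative.
cat > /tmp/pod-a.yaml <<'EOF'
metadata:
  resourceVersion: "123456"
  annotations:
    status: '{"role":"master","xlog_location":67108864}'
EOF
cat > /tmp/pod-b.yaml <<'EOF'
metadata:
  resourceVersion: "123501"
  annotations:
    status: '{"role":"master","xlog_location":67112960}'
EOF
# On a live cluster, capture the two snapshots a few seconds apart with:
#   kubectl get po/kubernetes-postgresql-0 -o yaml
diff /tmp/pod-a.yaml /tmp/pod-b.yaml || true
```

The diff shows only the resourceVersion and annotation lines changing, matching the behaviour described above.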

We are using postgres-operator, but that is not showing any logs or status updates which would correlate with this issue, that is why I decided to post here.

hughcapet commented 1 year ago

Everything works as expected. Patroni (which, by the way, is what produces the logs you attached) updates the xlog_location (pg_current_wal_lsn()) value, if it has changed, on every HA loop. This causes the resourceVersion change.
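A quick way to see the value Patroni maintains is to read it out of the pod's annotations. The sketch below parses an illustrative payload; the annotation key `status` and the JSON shape are assumptions based on how Patroni's Kubernetes DCS stores member state.

```shell
# On a live cluster the payload would come from the pod itself, e.g.:
#   kubectl get po/kubernetes-postgresql-0 \
#     -o jsonpath='{.metadata.annotations.status}'
# Here we parse an illustrative payload instead.
STATUS='{"role":"master","state":"running","xlog_location":67108864}'
printf '%s' "$STATUS" \
  | python3 -c 'import json,sys; print(json.load(sys.stdin)["xlog_location"])'
```

Since pg_current_wal_lsn() advances whenever the primary writes WAL, this value (and hence resourceVersion) changes on nearly every HA loop of an active cluster.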

matejkostros commented 1 year ago

Does that mean these status changes have a purpose I am missing?

hughcapet commented 1 year ago

Maybe. Is your question "why should xlog_location be changed"?

matejkostros commented 1 year ago

Yes. Do you suggest I should take this up with Patroni then?

hughcapet commented 1 year ago

The updated value is used for the patronictl list output and for the GET /async?lag=<max-lag> endpoint. But you can of course create an issue in the Patroni repo; I am sure @CyberDem0n will be happy to explain this to you in detail :)
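For context on why the endpoint needs a fresh xlog_location: the lag check boils down to comparing the replica's replayed position against the leader's published one. A minimal sketch of that comparison, with illustrative LSN values (Patroni's actual implementation and thresholds may differ):

```shell
# Illustrative values; on a real cluster these come from the leader's
# published xlog_location and the replica's replayed location.
LEADER_LSN=67112960
REPLICA_LSN=67108864
MAX_LAG=16777216   # e.g. a 16MB threshold from ?lag=16MB
LAG=$((LEADER_LSN - REPLICA_LSN))
if [ "$LAG" -le "$MAX_LAG" ]; then
  echo "within threshold: HTTP 200"
else
  echo "lagging: HTTP 503"
fi
```

A stale xlog_location would make this comparison meaningless, which is why Patroni refreshes it every HA loop even when nothing else changes.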