zalando / spilo

Highly available elephant herd: HA PostgreSQL cluster using Docker
Apache License 2.0

resourceVersion changes too often causing status update on kubernetes API #850

Closed matejkostros closed 1 year ago

matejkostros commented 1 year ago

We noticed that our Postgres cluster deployed in Kubernetes emits a large number of status updates when watching for changes only. The following is a snippet covering approximately five minutes:

[root@vm031 ~]# kubectl get pods --watch-only
NAME                      READY   STATUS    RESTARTS   AGE
kubernetes-postgresql-0   1/1     Running   0          68m
kubernetes-postgresql-0   1/1     Running   0          68m
kubernetes-postgresql-0   1/1     Running   0          68m
kubernetes-postgresql-0   1/1     Running   0          69m
kubernetes-postgresql-0   1/1     Running   0          69m
kubernetes-postgresql-0   1/1     Running   0          69m
kubernetes-postgresql-0   1/1     Running   0          69m
kubernetes-postgresql-0   1/1     Running   0          69m
kubernetes-postgresql-0   1/1     Running   0          69m
kubernetes-postgresql-0   1/1     Running   0          70m
kubernetes-postgresql-0   1/1     Running   0          70m
kubernetes-postgresql-0   1/1     Running   0          70m
kubernetes-postgresql-0   1/1     Running   0          71m
kubernetes-postgresql-0   1/1     Running   0          71m

When we looked into the logs for this pod, we found that these status updates correlate precisely with INFO messages from the kubernetes-postgresql-0 pod, specifically with the first INFO message after each LOG message:

2023-02-23 12:10:03.211 26 LOG {ticks: 0, maint: 0, retry: 0}
2023-02-23 12:10:05,032 INFO: no action. I am (kubernetes-postgresql-0) the leader with the lock
2023-02-23 12:10:14,984 INFO: no action. I am (kubernetes-postgresql-0) the leader with the lock
2023-02-23 12:10:25,033 INFO: no action. I am (kubernetes-postgresql-0) the leader with the lock

When we list the manifest with kubectl get po/kubernetes-postgresql-0 -o yaml, we notice that only two fields change when the status refreshes (manifest also attached: kubernetes-postgresql-0.yaml.txt).
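To see exactly what changes between updates, one can diff two snapshots of the manifest. The sketch below simulates that with two reduced, illustrative snapshots: per the maintainer's explanation later in this thread, the moving parts are resourceVersion and the Patroni-managed annotation carrying xlog_location (the annotation key `status` and the JSON payload shape are assumptions, not taken from the attached manifest).

```shell
# Hypothetical snapshots reduced to the fields that change between
# status refreshes; keys and values are illustrative.
cat > /tmp/pod-a.yaml <<'EOF'
metadata:
  resourceVersion: "123456"
  annotations:
    status: '{"role":"master","xlog_location":67108864}'
EOF
cat > /tmp/pod-b.yaml <<'EOF'
metadata:
  resourceVersion: "123501"
  annotations:
    status: '{"role":"master","xlog_location":67112960}'
EOF
# On a live cluster, capture the two snapshots a few seconds apart with:
#   kubectl get po/kubernetes-postgresql-0 -o yaml
diff /tmp/pod-a.yaml /tmp/pod-b.yaml || true
```

The diff shows only the resourceVersion and annotation lines changing, matching the behaviour described above.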

We are using postgres-operator, but that is not showing any logs or status updates which would correlate with this issue, that is why I decided to post here.

hughcapet commented 1 year ago

Everything works as expected. Patroni (which, by the way, is what produces the logs you attached) updates the xlog_location (pg_current_wal_lsn()) value, if it has changed, on every HA loop. This causes the resourceVersion change.
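A quick way to see the value Patroni maintains is to read it out of the pod's annotations. The sketch below parses an illustrative payload; the annotation key `status` and the JSON shape are assumptions based on how Patroni's Kubernetes DCS stores member state.

```shell
# On a live cluster the payload would come from the pod itself, e.g.:
#   kubectl get po/kubernetes-postgresql-0 \
#     -o jsonpath='{.metadata.annotations.status}'
# Here we parse an illustrative payload instead.
STATUS='{"role":"master","state":"running","xlog_location":67108864}'
printf '%s' "$STATUS" \
  | python3 -c 'import json,sys; print(json.load(sys.stdin)["xlog_location"])'
```

Since pg_current_wal_lsn() advances whenever the primary writes WAL, this value (and hence resourceVersion) changes on nearly every HA loop of an active cluster.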

matejkostros commented 1 year ago

Does that mean these status changes have a purpose I am missing?

hughcapet commented 1 year ago

Maybe. Is your question "why should xlog_location be changed"?

matejkostros commented 1 year ago

Yes. Do you suggest I should take this up with Patroni then?

hughcapet commented 1 year ago

The updated value is used for the patronictl list output and for the GET /async?lag=<max-lag> endpoint. But you can of course create an issue in the Patroni repo; I am sure @CyberDem0n will be happy to explain this to you in detail :)
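For context on why the endpoint needs a fresh xlog_location: the lag check boils down to comparing the replica's replayed position against the leader's published one. A minimal sketch of that comparison, with illustrative LSN values (Patroni's actual implementation and thresholds may differ):

```shell
# Illustrative values; on a real cluster these come from the leader's
# published xlog_location and the replica's replayed location.
LEADER_LSN=67112960
REPLICA_LSN=67108864
MAX_LAG=16777216   # e.g. a 16MB threshold from ?lag=16MB
LAG=$((LEADER_LSN - REPLICA_LSN))
if [ "$LAG" -le "$MAX_LAG" ]; then
  echo "within threshold: HTTP 200"
else
  echo "lagging: HTTP 503"
fi
```

A stale xlog_location would make this comparison meaningless, which is why Patroni refreshes it every HA loop even when nothing else changes.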