j4m3s-s opened 1 year ago
Relevant master logs:
harbor-db-0 postgres 2023-10-03 12:59:40.591 UTC [25] LOG {ticks: 0, maint: 0, retry: 0}
harbor-db-0 postgres 2023-10-03 12:59:44,406 INFO: no action. I am (harbor-db-0), the leader with the lock
harbor-db-0 postgres 2023-10-03 12:59:54,406 INFO: no action. I am (harbor-db-0), the leader with the lock
harbor-db-0 postgres 2023-10-03 13:00:04,465 INFO: no action. I am (harbor-db-0), the leader with the lock
harbor-db-0 postgres 2023-10-03 13:00:10.593 UTC [25] LOG {ticks: 0, maint: 0, retry: 0}
harbor-db-0 postgres 2023-10-03 13:00:12,521 INFO: received failover request with leader=harbor-db-0 candidate=harbor-db-1 scheduled_at=None
harbor-db-0 postgres 2023-10-03 13:00:12,531 INFO: Got response from harbor-db-1 http://10.233.67.152:8008/patroni: {"state": "running", "postmaster_start_time": "2023-10-03 13:00:11.433821+00:00", "role": "replica", "server_version": 130010, "xlog": {"received_location": 20401094656, "replayed_location": 20401094656, "replayed_timestamp": "2023-10-03 13:00:03.787290+00:00", "paused": false}, "timeline": 990, "dcs_last_seen": 1696338012, "database_system_identifier": "7276438021957627967", "patroni": {"version": "3.0.1", "scope": "harbor-db"}}
harbor-db-0 postgres 2023-10-03 13:00:12,630 INFO: Got response from harbor-db-1 http://10.233.67.152:8008/patroni: {"state": "running", "postmaster_start_time": "2023-10-03 13:00:11.433821+00:00", "role": "replica", "server_version": 130010, "xlog": {"received_location": 20401094656, "replayed_location": 20401094656, "replayed_timestamp": "2023-10-03 13:00:03.787290+00:00", "paused": false}, "timeline": 990, "dcs_last_seen": 1696338012, "database_system_identifier": "7276438021957627967", "patroni": {"version": "3.0.1", "scope": "harbor-db"}}
harbor-db-0 postgres 2023-10-03 13:00:12,578 INFO: Lock owner: harbor-db-0; I am harbor-db-0
harbor-db-0 postgres 2023-10-03 13:00:12,684 INFO: manual failover: demoting myself
harbor-db-0 postgres 2023-10-03 13:00:12,685 INFO: Demoting self (graceful)
harbor-db-0 postgres 2023-10-03 13:00:13,948 INFO: Leader key released
harbor-db-0 postgres 2023-10-03 13:00:13,952 INFO: Lock owner: harbor-db-1; I am harbor-db-0
harbor-db-0 postgres 2023-10-03 13:00:13,952 INFO: manual failover: demote in progress
harbor-db-0 postgres 2023-10-03 13:00:13,953 INFO: Lock owner: harbor-db-1; I am harbor-db-0
harbor-db-0 postgres 2023-10-03 13:00:13,953 INFO: manual failover: demote in progress
harbor-db-0 postgres 2023-10-03 13:00:14,096 INFO: Lock owner: harbor-db-1; I am harbor-db-0
harbor-db-0 postgres 2023-10-03 13:00:14,096 INFO: manual failover: demote in progress
harbor-db-0 postgres 2023-10-03 13:00:15,105 INFO: Lock owner: harbor-db-1; I am harbor-db-0
harbor-db-0 postgres 2023-10-03 13:00:15,105 INFO: manual failover: demote in progress
harbor-db-0 postgres 2023-10-03 13:00:16 UTC [740]: [1-1] 651c1060.2e4 0 LOG: Auto detecting pg_stat_kcache.linux_hz parameter...
harbor-db-0 postgres 2023-10-03 13:00:16 UTC [740]: [2-1] 651c1060.2e4 0 LOG: pg_stat_kcache.linux_hz is set to 1000000
harbor-db-0 postgres 2023-10-03 13:00:16 UTC [740]: [3-1] 651c1060.2e4 0 LOG: redirecting log output to logging collector process
harbor-db-0 postgres 2023-10-03 13:00:16 UTC [740]: [4-1] 651c1060.2e4 0 HINT: Future log output will appear in directory "../pg_log".
harbor-db-0 postgres /var/run/postgresql:5432 - no response
harbor-db-0 postgres /var/run/postgresql:5432 - accepting connections
harbor-db-0 postgres /var/run/postgresql:5432 - accepting connections
harbor-db-0 postgres /etc/runit/runsvdir/default/patroni: finished with code=0 signal=0
harbor-db-0 postgres stopping /etc/runit/runsvdir/default/patroni
harbor-db-0 postgres timeout: finish: .: (pid 756) 1815s, want down
harbor-db-0 postgres ok: down: patroni: 0s, normally up
harbor-db-0 postgres ok: down: /etc/service/patroni: 0s, normally up
harbor-db-0 postgres 2023-10-03 13:00:24.881 UTC [25] LOG Got SIGTERM, fast exit
harbor-db-0 postgres ok: down: /etc/service/pgqd: 1s, normally up
Hi there,
Same problem for me. I have a mutating webhook that changes the image of the container in the pod. The operator wants to sync the pod back to its original state, so it keeps rescheduling it every 30 minutes. Is there a way to avoid this situation?
Thanks
This can be solved by changing the default image of your pods either in the operatorconfigurations CRD or in your postgresql CRD. The mutating webhook should then ignore resources that already have the correct image registry configured. Also keep in mind that you have to change the image name of every sidecar container as well.
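For illustration, a minimal sketch of what that could look like in the OperatorConfiguration. The image names and the `mycachingproxy/` prefix are placeholders, and the field names should be checked against your operator version:

```yaml
# Minimal sketch, not a complete configuration.
# Image references are placeholders -- point them at whatever
# registry/proxy prefix your mutating webhook rewrites to.
apiVersion: acid.zalan.do/v1
kind: OperatorConfiguration
metadata:
  name: postgresql-operator-default-configuration
configuration:
  # default Spilo image used for the database pods
  docker_image: mycachingproxy/registry.opensource.zalan.do/acid/spilo-15:3.0-p1
  sidecars:
    # every globally configured sidecar needs its image adjusted too
    - name: my-sidecar                 # hypothetical sidecar
      image: mycachingproxy/docker.io/library/my-sidecar:latest
```

The same can be done per cluster via `spec.dockerImage` (and `spec.sidecars`) in the postgresql CRD.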
Please, answer some short questions which should help us to understand your problem / question better?
Every 10/30 minutes my DB pods are restarted, which makes the DB unavailable for a couple of seconds. I think I've traced the issue to the way the operator compares the pod in the cluster with what it expects to find. I have software running in my cluster that patches pod images to redirect them to a caching proxy (say docker.io/library/toto -> mycachingproxy/docker.io/library/toto), which I think causes this.
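For illustration, using the example image names above (the container name is hypothetical), the divergence the operator would see looks roughly like this:

```yaml
# What the operator expects the pod to contain:
spec:
  containers:
    - name: postgres
      image: docker.io/library/toto
# What the running pod actually contains after the caching-proxy webhook mutated it:
spec:
  containers:
    - name: postgres
      image: mycachingproxy/docker.io/library/toto
```

Since the two image strings differ, a plain comparison of the pod spec against the expected spec would flag the pod as out of sync and trigger the recreate.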
So my question is: is my analysis correct? I'd be glad to post an MR to fix this issue if that's the case.
Thanks in advance.
Logs from the operator: