We have a 3-node PostgreSQL cluster with each node in a different location, and the etcd cluster in one of these locations. The PostgreSQL leader lost its connection to etcd and to the replica in the etcd location:
2024-03-22 14:19:22,085 INFO: Lock owner: compute-1; I am compute-1
2024-03-22 14:19:24,088 ERROR: Request to server http://etcd-1.com:2379 failed: ReadTimeoutError("HTTPConnectionPool(host='etcd-1.com', port=2379): Read timed out. (read timeout=1.9998452216386795)",)
2024-03-22 14:19:24,088 INFO: Reconnection allowed, looking for another server.
2024-03-22 14:19:24,088 INFO: Retrying on http://etcd-2.com:2379
2024-03-22 14:19:31,862 ERROR: Error communicating with DCS
2024-03-22 14:19:31,863 ERROR: watchprefix failed: ProtocolError("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))
2024-03-22 14:19:32,042 INFO: Got response from compute-2 http://compute-2:8006/patroni: Accepted
2024-03-22 14:19:35,874 WARNING: Request failed to compute-3: POST http://compute-3:8006/patroni (HTTPConnectionPool(host='compute-3', port=8006): Max retries exceeded with url: /failsafe (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7ff7b270c6d8>, 'Connection to compute-3 timed out. (connect timeout=2)')))
2024-03-22 14:19:35,973 INFO: demoting self because DCS is not accessible and I was a leader
2024-03-22 14:19:35,973 INFO: Demoting self (offline)
But after the demotion started, the connection to the DCS was re-established, and Patroni kept updating the leader lock until Postgres was shut down (in our case this took 10 minutes):
2024-03-22 14:19:38,757 INFO: Reconnection allowed, looking for another server.
2024-03-22 14:19:38,757 INFO: Retrying on http://etcd-2.com:2379
2024-03-22 14:19:38,992 INFO: Selected new etcd server http://etcd-2.com:2379
2024-03-22 14:19:39,190 INFO: Lock owner: compute-1; I am compute-1
2024-03-22 14:19:39,628 INFO: updated leader lock during demoting self because DCS is not accessible and I was a leader
2024-03-22 14:19:45,974 INFO: Lock owner: compute-1; I am compute-1
2024-03-22 14:19:46,173 INFO: updated leader lock during demoting self because DCS is not accessible and I was a leader
This looks like a bug. Maybe Patroni should not update the leader lock after demoting itself because the DCS is not accessible?
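
To measure how long the lock keeps being refreshed after the demotion decision, the Patroni log can be scanned with a small script. This is only a rough sketch: the function names (`parse_line`, `lock_updates_after_demotion`) are made up for illustration, and it assumes the log format and message texts are exactly the ones shown in the excerpts above.

```python
import re
from datetime import datetime

# Matches lines such as "2024-03-22 14:19:35,973 INFO: demoting self ..."
# (format assumed from the log excerpts in this report).
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3}) (\w+): (.*)$")

DEMOTE_MSG = "demoting self because DCS is not accessible"
LOCK_UPDATE_MSG = "updated leader lock during demoting self"


def parse_line(line):
    """Return (timestamp, level, message), or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
    ts = ts.replace(microsecond=int(match.group(2)) * 1000)
    return ts, match.group(3), match.group(4)


def lock_updates_after_demotion(lines):
    """Return (demotion timestamp, seconds from demotion to each later lock update)."""
    demoted_at = None
    offsets = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ts, _level, message = parsed
        if demoted_at is None and message.startswith(DEMOTE_MSG):
            demoted_at = ts
        elif demoted_at is not None and message.startswith(LOCK_UPDATE_MSG):
            offsets.append((ts - demoted_at).total_seconds())
    return demoted_at, offsets


SAMPLE_LOG = """\
2024-03-22 14:19:35,973 INFO: demoting self because DCS is not accessible and I was a leader
2024-03-22 14:19:35,973 INFO: Demoting self (offline)
2024-03-22 14:19:39,190 INFO: Lock owner: compute-1; I am compute-1
2024-03-22 14:19:39,628 INFO: updated leader lock during demoting self because DCS is not accessible and I was a leader
2024-03-22 14:19:46,173 INFO: updated leader lock during demoting self because DCS is not accessible and I was a leader
""".splitlines()

if __name__ == "__main__":
    demoted_at, offsets = lock_updates_after_demotion(SAMPLE_LOG)
    print("demotion decided at:", demoted_at)
    print("leader lock updates after demotion:", len(offsets))
    if offsets:
        print("last update, seconds after demotion:", offsets[-1])
```

Run against the full log, the last offset should show how long the old leader held the lock while Postgres was shutting down.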