Open wb14123 opened 5 days ago
Hmm, actually I observed that the pod is scheduled to different nodes from time to time. Maybe that's the reason. I'll try to fix it and report back.
Fixed the issue. Now I don't see pods drifting to different nodes anymore, but I still see multiple primaries. In a new test I can even see 3 primaries.
However it's hard to debug what happened, since the pod's stdout is rotated after a restart. If anyone can point me in the right direction for writing the logs to a file, I can re-run and upload the whole log.
Reading the doc again, it seems I need to turn on synchronous_mode? Let me test with it turned on.
I updated the config here and retested.
Confirmed the config is effective:
postgres@patronidemo-0:~$ patronictl show-config
postgresql:
  pg_hba:
  - host all all 0.0.0.0/0 md5
  - host replication standby 10.42.0.26/16 md5
  - host replication standby 127.0.0.1/32 md5
  use_pg_rewind: true
synchronous_mode: 'on'
synchronous_mode_strict: 'on'
postgresql.conf:
postgres@patronidemo-0:~$ cat /home/postgres/pgdata/pgroot/data/postgresql.conf
# Do not edit this file manually!
# It will be overwritten by Patroni!
include 'postgresql.base.conf'
cluster_name = 'patronidemo'
hot_standby = 'on'
listen_addresses = '0.0.0.0'
max_connections = '100'
max_locks_per_transaction = '64'
max_prepared_transactions = '0'
max_replication_slots = '10'
max_wal_senders = '10'
max_worker_processes = '8'
port = '5432'
synchronous_commit = 'on'
synchronous_standby_names = '"patronidemo-1"'
track_commit_timestamp = 'off'
wal_keep_size = '128MB'
wal_level = 'replica'
wal_log_hints = 'on'
hba_file = '/home/postgres/pgdata/pgroot/data/pg_hba.conf'
ident_file = '/home/postgres/pgdata/pgroot/data/pg_ident.conf'
# recovery.conf
recovery_target = ''
recovery_target_lsn = ''
recovery_target_name = ''
recovery_target_time = ''
recovery_target_timeline = 'latest'
recovery_target_xid = ''
It still failed. But it seems to happen less often, though.
The problem is that the partitioned pod can't update its own labels, because Etcd on the losing side is read-only.
Therefore you shouldn't be looking at labels, but rather at the "role" returned by the Patroni REST API or the value returned by SELECT pg_is_in_recovery() on Postgres.
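For example, something along these lines (a rough sketch; the default REST API port 8008 and curl/psql being available inside the demo pods are assumptions on my side):

# Patroni's own view of this member's role
kubectl exec patronidemo-0 -- curl -s http://localhost:8008/ | grep -o '"role"[^,]*'
# Postgres' view: 'f' means primary, 't' means replica
kubectl exec patronidemo-0 -- psql -U postgres -tAc 'SELECT pg_is_in_recovery();'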
Got it, thanks! It's hard to query all nodes to confirm there are no multiple primaries at the same time, since there is a gap between queries. Is there any other info in etcd that Patroni uses to identify which node should be primary? I'm not familiar with Patroni, but I guess there should be some distributed "source of truth" about the primary, otherwise if N1 and N2 are partitioned from N3, then N3 identifies itself as a replica since it's in the minority, but N1 and N2 still need to agree on who is primary.
Or better, since theoretically Patroni can lose data as stated in the doc, maybe it's not very valuable to confirm whether transactions work correctly in Patroni under node/network failures. Is there any other test you think would be valuable to run? Then we wouldn't need to care about the low-level details of who the primary is, and could just confirm whether Patroni keeps its guarantees or not.
To find the leader you need to check the leader endpoint (patronidemo). The current leader is stored in annotations.
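Something like this should print it (assuming the annotation key is leader, which is what Patroni's Kubernetes implementation writes; please double-check on your cluster):

kubectl get endpoints patronidemo -o jsonpath='{.metadata.annotations.leader}'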
Also, it would be nice to check synchronous_mode: quorum.
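e.g. roughly (assuming patronictl edit-config accepts top-level keys via -s/--set):

kubectl exec patronidemo-0 -- patronictl edit-config --force -s 'synchronous_mode=quorum'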
In theory, there should be no client-visible data loss. However, even in the case of synchronous replication the transaction could become visible earlier than it was acknowledged by synchronous replicas.
@wb14123 thanks for working on it! I always wanted to do it but unfortunately I don't speak clojure 😅
I fixed the command to get the primary from the endpoint. It passed the test. So I think this issue can be closed.
I'll do some tests for transactions and see how it goes.
@CyberDem0n I tried to run some tests and it actually passed the serializable tests. But that may be because my test is not thorough enough to trigger the failures. I want to understand more about this before I go further:
In theory, there should be no client-visible data loss. However, even in the case of synchronous replication the transaction could become visible earlier than it was acknowledged by synchronous replicas.
I assume it's the same thing as you talked about in this thread? This means it can have data loss if the standby is promoted to primary before the transaction is replicated, right?
I fixed the command to get the primary from the endpoint.
A bit more detail.
The endpoint in this case will contain the name of the pod that is supposed to be running as the primary.
That is, SELECT pg_is_in_recovery() is supposed to return false only on this pod; however, it is also possible that it will return true, because it can't access the K8s API or because it wasn't yet promoted. On other pods SELECT pg_is_in_recovery() is supposed to return only true.
Of course, it could also be that Postgres isn't accepting connections and the query couldn't be executed.
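So a check along these lines (pod names and the leader annotation key are assumptions based on this demo setup) is what the test should compare:

# Who does Patroni currently consider the leader?
leader=$(kubectl get endpoints patronidemo -o jsonpath='{.metadata.annotations.leader}')
# What does each Postgres instance say about itself?
for pod in patronidemo-0 patronidemo-1 patronidemo-2; do
  state=$(kubectl exec "$pod" -- psql -U postgres -tAc 'SELECT pg_is_in_recovery();' 2>/dev/null || echo unreachable)
  echo "$pod: pg_is_in_recovery=$state (leader endpoint says: $leader)"
done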
I assume it's the same thing as you talked about in this thread? This means it can have data loss if the standby is promoted to primary before the transaction is replicated, right?
Yes, sort of. In PG transactions are always first committed to WAL, but the lock is held until the required number of sync replicas acknowledges it as received/flushed/applied (depending on the configuration). However, it is possible that the lock will be released earlier than necessary:
In both cases the transaction will become visible globally even if it wasn't replicated to a sufficient number of sync nodes.
Besides that, a transaction becomes immediately visible on every standby node where it was already applied, even if it is still locked on the primary. This holds for both synchronous and asynchronous nodes.
Consider a situation:
This will be a client visible data loss, which Patroni can't prevent. Instead of cancellation request there could be the following:
postgres crash -> postgres start. This one could be prevented by setting the Patroni parameter primary_stop_timeout to 0.
postgres crash -> automatic crash recovery. This one could be prevented by setting the GUC restart_after_crash to off and the Patroni parameter primary_stop_timeout to 0.
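A rough sketch of applying those two settings (the -s/--set and --force flags are what patronictl edit-config takes in recent versions; verify against yours):

kubectl exec patronidemo-0 -- patronictl edit-config --force \
  -s 'primary_stop_timeout=0' \
  -s 'postgresql.parameters.restart_after_crash=off'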
Yeah I'm using the following command to find the primary node:
kubectl get endpoints patronidemo -o jsonpath="{.subsets[*].addresses[*].nodeName}"
Client commits a transaction and sends a cancellation request.
By cancellation, does force-killing the client thread count?
Another question about the primary label: how do I define the K8s service to access the primary if the label is not reliable? I currently use a config like this to define a service that can be accessed:
apiVersion: v1
kind: Service
metadata:
  name: patronidemo-public
  labels:
    application: patroni
    cluster-name: &cluster_name patronidemo
spec:
  type: NodePort
  ports:
  - port: 5432
    targetPort: 5432
    nodePort: 30020
  selector:
    application: patroni
    cluster-name: *cluster_name
    role: primary
It uses the role: primary selector to make sure the traffic is routed to the primary node.
It uses the role: primary selector to make sure the traffic is routed to the primary node.
Since you are using endpoints, there is no need to use a label selector. The Patroni primary takes care of managing the leader endpoint subsets, where it puts its own Pod IP. That is, the leader service will be connecting to the correct pod. However, there are details, because every K8s worker node handles services/endpoints subsets independently and asynchronously.
If someone is running Patroni with config maps, they need to rely on a Service with a label selector, and in addition to that readiness probes should be configured on the pods.
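As a sanity check, the pod behind the leader endpoint can be inspected directly (jsonpath fields are from the core Endpoints API):

# Patroni writes the primary's own pod into the leader endpoint subsets,
# so the leader Service needs no label selector:
kubectl get endpoints patronidemo -o jsonpath='{.subsets[*].addresses[*].targetRef.name}'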
By cancellation, does force-killing the client thread count?
No, I mean a special cancellation request: https://www.postgresql.org/docs/current/libpq-cancel.html
Does this mean Patroni can theoretically have no data loss, since cancellation is something the client can control (unlike the client being killed, which the client cannot control)?
Patroni can't detect situations when clients cancel "queries" that are waiting for sync replication.
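For instance, something like this on the primary has the same effect as a client cancel (the wait_event name is from pg_stat_activity; the warning text below is approximate):

# Cancel every backend currently waiting for synchronous replication
kubectl exec patronidemo-0 -- psql -U postgres -c \
  "SELECT pg_cancel_backend(pid) FROM pg_stat_activity WHERE wait_event = 'SyncRep';"
# The cancelled session then reports roughly:
#   WARNING:  canceling wait for synchronous replication due to user request
#   DETAIL:  The transaction has already committed locally, but might not have been replicated to the standby.
# and the locally committed transaction becomes visible to everyone.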
Yeah, understood. But as someone using Patroni, as long as I don't send a cancellation command, which I can control in my program, Patroni can guarantee no data loss. Is that right?
Is that right?
Isn't the Jepsen test supposed to verify it?
Yeah, that's kind of my goal... But I cannot produce any scenario in which Patroni loses data. That can mean Patroni is solid, but it can also mean my test is not thorough enough. If there is any known issue that can cause data loss and my test doesn't reproduce it, that means my test is missing something. So I just want to confirm whether there is any known issue or not.
@wb14123 is it based on https://github.com/jepsen-io/jepsen?
Yes. The test workloads are mostly copied from https://github.com/jepsen-io/jepsen/tree/main/stolon, which caught a Postgres bug in version 12. In my test, I introduced different failures like nodes being killed, network slowdown and network partition (I haven't committed some failures to the repo yet). But there is no guarantee it covers all possible failures. So I'm wondering whether I need to think harder and run more failure scenarios or just stop here.
I'm sure the author of Jepsen @aphyr can find more things than I can. I communicated with him through email. He is busy with other things, so he encouraged me to do my own tests.
Yeah, understood. But as someone using Patroni, as long as I don't send a cancellation command, which I can control in my program, Patroni can guarantee no data loss. Is that right?
The connection might be cancelled due to any number of things, not only a client request: TCP connections get reset, the server crash-restarts. Each one of them will make transactions that are not yet replicated visible to readers.
This is not a theoretical narrow edge case; it can be quite likely, as connectivity issues can easily be triggered by the same root cause as a failover. And a correctly written client, that for example is replaying transactions from a queue, should reconnect on an interrupted commit and check whether it succeeded or not. If that check gives a false positive, the entry will be discarded from the queue before it is actually durable.
"Disallow cancellation of waiting for synchronous replication" thread on pgsql-hackers intended to fix at least a portion of this, but got stalled. I'm not sure the approach in that thread is even enough in the generic case where with quorum commit transactions might become visible on a standby before they are durable and therefore lost on failover.
@ants So my question above:
By cancellation, does force-killing the client thread count?
The answer should be yes? Meaning it can also trigger the scenario where visible transactions are not replicated?
@ants So my question above:
By cancellation, does force-killing the client thread count?
The answer should be yes? Meaning it can also trigger the scenario where visible transactions are not replicated?
If by client thread you mean the process handling the waiting connection and by force kill you mean SIGTERM, then yes.
What happened?
When running a Jepsen test with Patroni (https://github.com/wb14123/jepsen-postgres-ha), it observes multiple primaries throughout the tests when node failure and network partition events happen.
The test sets up a 3-node Kubernetes cluster with k3s and deploys Patroni with this yaml file. Then it randomly takes down the primary node and creates network partitions. At the same time it tries to get the primary nodes using kubectl get pods -L role -o wide. The test itself will report failure at the end. However, just to give an example, here is how the cluster ended up in one of my tests:
How can we reproduce it (as minimally and precisely as possible)?
Download the project here. Install the requirements. Follow here to set up the Vagrant VMs. And run the command:
This will run the tests for 10 minutes. The test will introduce node failures and network partitions for the primary nodes.
What did you expect to happen?
There should be at most 1 primary node at any given time.
Patroni/PostgreSQL/DCS version
Using the docker image built from commit 969d7ec4, using the Dockerfile.
Patroni configuration file
patronictl show-config
Patroni log files
PostgreSQL log files
Have you tried to use GitHub issue search?
Anything else we need to know?
Time in Patroni is UTC. I can upload more logs if you tell me how to persist logs to disk instead of stdout.
The time when the first dual primary happened in the test (times are ET, UTC-4):
The time when the last dual primary happened:
Here are the read-primary results, along with the times of the node kills and network partitions.
Please let me know if you need more logs or need assist to run the Jepsen test.