zalando / postgres-operator

Postgres operator creates and manages PostgreSQL clusters running in Kubernetes
https://postgres-operator.readthedocs.io/
MIT License

replica failed to start postgres #2257

Open IIvyPy opened 1 year ago

IIvyPy commented 1 year ago

When I try to restart my replica node, it keeps logging "failed to start postgres". Beyond that, there is no other useful information.

My log is below:

```
2023-03-09 06:23:10,012 INFO: failed to start postgres
2023-03-09 06:23:20,014 WARNING: Postgresql is not running.
2023-03-09 06:23:20,015 INFO: Lock owner: dns-cluster-postgres-0; I am dns-cluster-postgres-1
2023-03-09 06:23:20,021 INFO: pg_controldata:
  pg_control version number:            1002
  Catalog version number:               201707211
  Database system identifier:           7177581149058183249
  Database cluster state:               shut down in recovery
  pg_control last modified:             Wed Mar 8 16:57:33 2023
  Latest checkpoint location:           D3/6B91A7C0
  Prior checkpoint location:            D3/6B91A7C0
  Latest checkpoint's REDO location:    D3/6B255640
  Latest checkpoint's REDO WAL file:    0000001A000000D30000006B
  Latest checkpoint's TimeLineID:       26
  Latest checkpoint's PrevTimeLineID:   26
  Latest checkpoint's full_page_writes: on
  Latest checkpoint's NextXID:          0:10953564
  Latest checkpoint's NextOID:          3363681
  Latest checkpoint's NextMultiXactId:  1
  Latest checkpoint's NextMultiOffset:  0
  Latest checkpoint's oldestXID:        549
  Latest checkpoint's oldestXID's DB:   1
  Latest checkpoint's oldestActiveXID:  10953564
  Latest checkpoint's oldestMultiXid:   1
  Latest checkpoint's oldestMulti's DB: 1
  Latest checkpoint's oldestCommitTsXid: 0
  Latest checkpoint's newestCommitTsXid: 0
  Time of latest checkpoint:            Wed Mar 8 16:40:04 2023
  Fake LSN counter for unlogged rels:   0/1
  Minimum recovery ending location:     D3/6D473F40
  Min recovery ending loc's timeline:   27
  Backup start location:                0/0
  Backup end location:                  0/0
  End-of-backup record required:        no
  wal_level setting:                    logical
  wal_log_hints setting:                on
  max_connections setting:              300
  max_worker_processes setting:         1024
  max_prepared_xacts setting:           0
  max_locks_per_xact setting:           64
  track_commit_timestamp setting:       off
  Maximum data alignment:               8
  Database block size:                  8192
  Blocks per segment of large relation: 131072
  WAL block size:                       8192
  Bytes per WAL segment:                16777216
  Maximum length of identifiers:        64
  Maximum columns in an index:          32
  Maximum size of a TOAST chunk:        1996
  Size of a large-object chunk:         2048
  Date/time type storage:               64-bit integers
  Float4 argument passing:              by value
  Float8 argument passing:              by value
  Data page checksum version:           1
  Mock authentication nonce:            e6e97c5b716602c3ae7d0537226f1e0c719614605897cbb4580e7d0caedda556
2023-03-09 06:23:20,022 INFO: Lock owner: dns-cluster-postgres-0; I am dns-cluster-postgres-1
2023-03-09 06:23:20,023 INFO: starting as a secondary
2023-03-09 06:23:20,638 INFO: postmaster pid=9388
/var/run/postgresql:5432 - no response
2023-03-09 06:23:20 UTC [9388]: [1-1] 64097b58.24ac 0 LOG: Auto detecting pg_stat_kcache.linux_hz parameter...
2023-03-09 06:23:20 UTC [9388]: [2-1] 64097b58.24ac 0 LOG: pg_stat_kcache.linux_hz is set to 1000000
2023-03-09 06:23:20 UTC [9388]: [3-1] 64097b58.24ac 0 LOG: listening on IPv4 address "0.0.0.0", port 5432
2023-03-09 06:23:20 UTC [9388]: [4-1] 64097b58.24ac 0 LOG: listening on IPv6 address "::", port 5432
2023-03-09 06:23:20 UTC [9388]: [5-1] 64097b58.24ac 0 LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-03-09 06:23:21 UTC [9388]: [6-1] 64097b58.24ac 0 LOG: redirecting log output to logging collector process
2023-03-09 06:23:21 UTC [9388]: [7-1] 64097b58.24ac 0 HINT: Future log output will appear in directory "../pg_log".
/var/run/postgresql:5432 - rejecting connections
/var/run/postgresql:5432 - rejecting connections
/var/run/postgresql:5432 - rejecting connections
/var/run/postgresql:5432 - no response
2023-03-09 06:23:30,000 INFO: Lock owner: dns-cluster-postgres-0; I am dns-cluster-postgres-1
2023-03-09 06:23:30,000 INFO: failed to start postgres
```
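For what it's worth, the pg_controldata dump above already contains a notable detail: the cluster state is `shut down in recovery`, the latest checkpoint is on timeline 26, but the minimum recovery ending location is on timeline 27, so the replica still has to replay WAL across a timeline switch before it can leave recovery and accept connections. A minimal sketch that pulls those fields out of `pg_controldata`-style output (the sample text is copied from the log in this issue; the diagnosis logic is my own, not part of Patroni):

```python
# Sketch: parse the "key: value" lines of pg_controldata output and flag
# a pending timeline switch. Sample lines copied from the log above.
SAMPLE = """\
Database cluster state: shut down in recovery
Latest checkpoint's TimeLineID: 26
Min recovery ending loc's timeline: 27
"""

def parse_controldata(text: str) -> dict:
    """Split pg_controldata output into a {field: value} dict."""
    fields = {}
    for line in text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields

fields = parse_controldata(SAMPLE)
ckpt_tl = int(fields["Latest checkpoint's TimeLineID"])
min_recovery_tl = int(fields["Min recovery ending loc's timeline"])

print("cluster state:", fields["Database cluster state"])
if min_recovery_tl > ckpt_tl:
    # The replica must replay WAL past a timeline switch (26 -> 27 here)
    # before Postgres will finish recovery; while replaying it answers
    # "rejecting connections", which matches the pg_isready output above.
    print(f"timeline switch pending: {ckpt_tl} -> {min_recovery_tl}")
```

This would explain the repeated `rejecting connections` probes: the postmaster does start, but recovery has not reached the minimum recovery ending location yet, so Patroni's start timeout expires first.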

jonathon2nd commented 1 year ago

I am seeing this as well, after updating the operator.

noahge commented 1 year ago

Same problem. I see that the master node restores a different archive than the replica nodes.

FxKu commented 1 year ago

Which Spilo version are you using? Can you check whether there are Postgres logs available? It sounds like it fails earlier and Postgres isn't even started. What's the sequence of events that led to this situation?