zalando / postgres-operator

Postgres operator creates and manages PostgreSQL clusters running in Kubernetes
https://postgres-operator.readthedocs.io/
MIT License

No master selected after PVCs are full #1664

Open nfoerster opened 2 years ago

nfoerster commented 2 years ago

The PVCs went full because the pg_wal data took up almost 99% of the disk space. I don't know why this happened, but the nodes no longer elect a master. Restarting the operator and the pods didn't help, and the cluster no longer accepts connections. The WAL backups have never worked for us, so we relied on the logical backup instead; there is much room for improvement in the documentation here. As a workaround, I deleted the pg_wal folder in the pods.

As a side note, I tried scaling the cluster to 3 pods, but this did not help with electing a master node.
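
For reference, disk usage can be checked from inside the pods roughly like this (pod name taken from the attached logs; the data path assumes the default Spilo layout):

# overall usage of the data volume
kubectl exec -it postgres-prod-postgres-cluster-0 -- df -h /home/postgres/pgdata

# how much of that is WAL
kubectl exec -it postgres-prod-postgres-cluster-0 -- du -sh /home/postgres/pgdata/pgroot/data/pg_wal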

postgres-prod-postgres-cluster-1-1635482831171809989.log
postgres-prod-postgres-cluster-0-1635482814583922985.log
postgres-operator-postgres-operator-6c7dd75d86-nngxd-1635483253533878679.log

FxKu commented 2 years ago

If you don't want to use WAL archiving, turn it off, e.g. under spec.postgresql.parameters with archive_mode: "off". I think it would also be off if your WAL bucket path were not configured, but from the logs I can see that it wants to use wal-postgres-backup (is it loaded via pod_environment_configmap?).
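
For example, roughly like this (cluster name is a placeholder; the same change can be made directly in the manifest under spec.postgresql.parameters, and archive_mode only takes effect after a restart):

# switch off WAL archiving on the cluster resource
kubectl patch postgresql acid-example-cluster --type merge \
  -p '{"spec":{"postgresql":{"parameters":{"archive_mode":"off"}}}}'

# archive_mode needs a restart to take effect; Patroni marks the members as pending restart
kubectl exec -it acid-example-cluster-0 -- patronictl restart acid-example-cluster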

If the volumes are full, either increase their size to get your cluster working again, or exec into the pods and remove the pile of WAL files that have already been streamed to the replicas.
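
A rough sketch of the second option (pod name is a placeholder; the data path assumes the default Spilo image):

# exec into the affected pod
kubectl exec -it acid-example-cluster-0 -- bash

# "Latest checkpoint's REDO WAL file" is the oldest segment that must be kept
pg_controldata -D /home/postgres/pgdata/pgroot/data | grep 'REDO WAL file'

# dry run first: -n only lists the segments pg_archivecleanup would remove
pg_archivecleanup -n /home/postgres/pgdata/pgroot/data/pg_wal <redo-wal-file>

# run again without -n to actually delete them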

I was hoping I had made the documentation a bit clearer with the latest releases. You might want to check again and tell us which parts are still unclear.

linus-ha commented 7 months ago

I encountered the same issue on my two DB clusters. It happened to both of them at roughly the same time, which I find interesting and cannot explain yet. Both had already been running for several months.

I understand the solution here, but as described in the initial issue, once the cluster has reached that state, no leader is elected anymore. This is after cleaning pg_wal with pg_archivecleanup.

+ Cluster: xxxxxxxxx-postgresql-prod -----+---------+--------------+----+-----------+
| Member                      | Host      | Role    | State        | TL | Lag in MB |
+-----------------------------+-----------+---------+--------------+----+-----------+
| xxxxxxxxx-postgresql-prod-0 | 10.2.4.25 | Replica | start failed |    |   unknown |
| xxxxxxxxx-postgresql-prod-1 | 10.2.5.24 | Replica | starting     |    |   unknown |
+-----------------------------+-----------+---------+--------------+----+-----------+

Both instances are constantly restarting.

The logs repeatedly show:

2023-11-29 15:14:58,377 INFO: pg_controldata:
  pg_control version number: 1300
  Catalog version number: 202209061
  Database system identifier: 7279471581888901195
  Database cluster state: in archive recovery
  pg_control last modified: Sun Nov 26 16:18:15 2023
  Latest checkpoint location: D/F8000A00
  Latest checkpoint's REDO location: D/F80009C8
  Latest checkpoint's REDO WAL file: 000000440000000D000000F8
  Latest checkpoint's TimeLineID: 68
  Latest checkpoint's PrevTimeLineID: 68
  Latest checkpoint's full_page_writes: on
  Latest checkpoint's NextXID: 0:5832058
  Latest checkpoint's NextOID: 135443
  Latest checkpoint's NextMultiXactId: 2522
  Latest checkpoint's NextMultiOffset: 6762
  Latest checkpoint's oldestXID: 717
  Latest checkpoint's oldestXID's DB: 1
  Latest checkpoint's oldestActiveXID: 5832058
  Latest checkpoint's oldestMultiXid: 1
  Latest checkpoint's oldestMulti's DB: 1
  Latest checkpoint's oldestCommitTsXid: 0
  Latest checkpoint's newestCommitTsXid: 0
  Time of latest checkpoint: Sun Nov 26 15:50:26 2023
  Fake LSN counter for unlogged rels: 0/3E8
  Minimum recovery ending location: D/F9000000
  Min recovery ending loc's timeline: 68
  Backup start location: 0/0
  Backup end location: 0/0
  End-of-backup record required: no
  wal_level setting: replica
  wal_log_hints setting: on
  max_connections setting: 100
  max_worker_processes setting: 8
  max_wal_senders setting: 10
  max_prepared_xacts setting: 0
  max_locks_per_xact setting: 64
  track_commit_timestamp setting: off
  Maximum data alignment: 8
  Database block size: 8192
  Blocks per segment of large relation: 131072
  WAL block size: 8192
  Bytes per WAL segment: 16777216
  Maximum length of identifiers: 64
  Maximum columns in an index: 32
  Maximum size of a TOAST chunk: 1996
  Size of a large-object chunk: 2048
  Date/time type storage: 64-bit integers
  Float8 argument passing: by value
  Data page checksum version: 0
  Mock authentication nonce: 5b7db095e82feda99eb6ba2556455d110aa09480eb12c9e9e7aaee51cbbd75ee

2023-11-29 15:14:58,385 INFO: Lock owner: None; I am xxxxxxxxx-postgresql-prod-0
2023-11-29 15:14:58,385 INFO: starting as a secondary
2023-11-29 15:14:58 UTC [1357894]: [1-1] 65675572.14b846 0     LOG:  Auto detecting pg_stat_kcache.linux_hz parameter...
2023-11-29 15:14:58 UTC [1357894]: [2-1] 65675572.14b846 0     LOG:  pg_stat_kcache.linux_hz is set to 500000
2023-11-29 15:14:58,609 INFO: postmaster pid=1357894
/var/run/postgresql:5432 - no response
2023-11-29 15:14:58 UTC [1357894]: [3-1] 65675572.14b846 0     LOG:  redirecting log output to logging collector process
2023-11-29 15:14:58 UTC [1357894]: [4-1] 65675572.14b846 0     HINT:  Future log output will appear in directory "../pg_log".
/var/run/postgresql:5432 - rejecting connections
/var/run/postgresql:5432 - rejecting connections
/var/run/postgresql:5432 - rejecting connections
/var/run/postgresql:5432 - rejecting connections
/var/run/postgresql:5432 - rejecting connections
/var/run/postgresql:5432 - no response
2023-11-29 15:15:08,375 WARNING: Postgresql is not running.
2023-11-29 15:15:08,375 INFO: Lock owner: None; I am xxxxxxxxx-postgresql-prod-0

The last log file in pg_log also ends on 2023-11-26.
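
For completeness, the member table and pg_controldata output above can be reproduced from inside a pod roughly like this (pod name and data path as in the default deployment):

# Patroni's member overview (same output as the table above)
kubectl exec -it xxxxxxxxx-postgresql-prod-0 -- patronictl list

# control data of the stuck instance
kubectl exec -it xxxxxxxxx-postgresql-prod-0 -- pg_controldata -D /home/postgres/pgdata/pgroot/data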

tmechsner commented 6 months ago

Any news on this?

Hades32 commented 4 months ago

@FxKu we hit the same issue today due to a misconfiguration on our end. What I still don't understand, though: why didn't the follower become the primary? The disk of the primary (in our case, at least) was mostly empty, so it could have continued for a while.

Does Patroni need to write to disk even to hand over the primary role? Wouldn't it be better, then, if Patroni used a separate volume for that?