tydra-wang opened 2 years ago
Stumbled upon the same problem. Used `patronictl list` to find out that two replicas were lagging:
```
root@my-postgres-0:/home/postgres# patronictl list
+ Cluster: my-postgres (7166512203438432325) -----+----+-----------+
| Member        | Host        | Role    | State   | TL | Lag in MB |
+---------------+-------------+---------+---------+----+-----------+
| my-postgres-0 | 10.0.102.49 | Leader  | running |  3 |           |
| my-postgres-1 | 10.0.103.13 | Replica | running |  2 |       515 |
| my-postgres-2 | 10.0.106.68 | Replica | running |  2 |       515 |
+---------------+-------------+---------+---------+----+-----------+
```
Then reinitialized these replicas using `patronictl reinit` (note that this will delete all data on the replica and pull it from the leader):

```
root@my-postgres-0:/home/postgres# patronictl reinit my-postgres my-postgres-1
root@my-postgres-0:/home/postgres# patronictl reinit my-postgres my-postgres-2
```
After a while, the WAL size decreased.
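To verify that WAL is actually being recycled, the on-disk size of the WAL directory can be checked from SQL on the leader. This is a minimal sketch, assuming PostgreSQL 10 or later (where `pg_ls_waldir()` exists) and a role with sufficient privileges (superuser or `pg_monitor`):

```sql
-- Total size of the pg_wal directory as the server sees it.
-- Run this periodically; the value should drop once lagging
-- replicas catch up and their slots advance.
SELECT pg_size_pretty(sum(size)) AS wal_size
FROM pg_ls_waldir();
```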
Please answer some short questions which should help us to understand your problem / question better.
The first time I found my PostgreSQL unavailable because the PVC in a pod was 100% full, I just expanded the PVC. However, it failed again a few days later.

`archive_mode` is set to `off` in my PostgreSQL configuration, so finally I figured out this may be caused by inactive replication slots. Running `select * from pg_replication_slots` in the master pod, I saw two inactive replication slots. I fixed it by manually recreating the two replicas' pods and PVCs (`kubectl delete pod` and `kubectl delete pvc`), and it went back to normal. The master cleaned up WAL once all replication slots were active again.

I have a few questions about this problem:
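For reference, the inactive slots, and how much WAL each one is holding back, can be inspected with the standard catalog view, and a truly orphaned slot can be dropped by hand. This is a sketch using built-in PostgreSQL functions; the slot name in the last statement is a placeholder, not a name from this cluster:

```sql
-- Show each slot, whether it is in use, and how much WAL it retains
-- behind the current write position.
SELECT slot_name,
       active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn))
         AS retained_wal
FROM pg_replication_slots;

-- Drop an orphaned slot so the server can recycle its WAL.
-- ('orphaned_slot_name' is a placeholder.)
SELECT pg_drop_replication_slot('orphaned_slot_name');
```

Note that Patroni can manage replication slots itself and may recreate a dropped slot, so dropping one by hand is only a stopgap if the corresponding replica is expected to come back.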
Thanks!
related issue: