[question] pg_wal eat disk because inactive replication slot

zalando / postgres-operator

Postgres operator creates and manages PostgreSQL clusters running in Kubernetes

MIT License


Please, answer some short questions which should help us to understand your problem / question better?

Which image of the operator are you using? registry.opensource.zalan.do/acid/postgres-operator:v1.5.0
Where do you run it - cloud or metal? Kubernetes or OpenShift? Bare Metal K8s
Are you running Postgres Operator in production? no
Type of issue? question

The first time I found my PostgreSQL unavailable because the PVC in a pod was 100% full, I just expanded the PVC. However, it failed again a few days later.

archive_mode is set to off in my PostgreSQL.
The cluster has three PostgreSQL instances.

Finally, I found out that this may be caused by inactive replication slots. Running select * from pg_replication_slots in the master pod, I saw two inactive replication slots. I fixed it by manually recreating the two replicas' pods and PVCs (kubectl delete pod and pvc), and it went back to normal. The master cleaned up the WAL once all replication slots were active again.
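To make the diagnosis above easier to repeat, here is a minimal sketch of how the amount of WAL held back by each inactive slot could be estimated from the pg_replication_slots output. It is not something the operator or Patroni provides. The helper names, the row layout and the sample slot names and LSN values are all assumptions for illustration, and the query uses the PostgreSQL 10+ function name pg_current_wal_lsn().

```python
# Sketch only: estimate how much WAL each inactive replication slot retains.
# Helper names, row layout and sample values are hypothetical.

# Query that could be run on the master to produce the rows used below.
SLOT_QUERY = """
SELECT slot_name, active, restart_lsn, pg_current_wal_lsn() AS current_lsn
FROM pg_replication_slots;
"""


def lsn_to_int(lsn: str) -> int:
    """Convert an LSN in 'hi/lo' hexadecimal text form to a byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def retained_wal_by_inactive_slot(rows):
    """Return {slot_name: retained_bytes} for slots where active is false.

    rows: iterable of (slot_name, active, restart_lsn, current_lsn) tuples,
    matching the column order of SLOT_QUERY.
    """
    retained = {}
    for slot_name, active, restart_lsn, current_lsn in rows:
        # Active slots advance on their own; slots without restart_lsn hold no WAL.
        if active or restart_lsn is None:
            continue
        retained[slot_name] = lsn_to_int(current_lsn) - lsn_to_int(restart_lsn)
    return retained


if __name__ == "__main__":
    # Hypothetical sample rows, not taken from the affected cluster.
    sample_rows = [
        ("my_postgres_1", False, "0/10000000", "0/30000000"),
        ("my_postgres_2", True, "0/30000000", "0/30000000"),
    ]
    for name, size in retained_wal_by_inactive_slot(sample_rows).items():
        print(f"{name}: {size / 1024 / 1024:.0f} MiB of WAL retained")
```

A large and growing value for an inactive slot would be consistent with pg_wal filling the volume, since the master keeps every WAL segment from that slot's restart_lsn onward.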

I got a few questions about this problem:

What could cause inactive replication slots?
How can I avoid it? Is the operator responsible for reconciling it when a replication slot becomes inactive?
Can I fix it just by upgrading to the latest version?

Thanks!

related issue:

root@my-postgres-0:/home/postgres# patronictl list
+ Cluster: my-postgres (7166512203438432325) -----+----+-----------+
| Member        | Host        | Role    | State   | TL | Lag in MB |
+---------------+-------------+---------+---------+----+-----------+
| my-postgres-0 | 10.0.102.49 | Leader  | running |  3 |           |
| my-postgres-1 | 10.0.103.13 | Replica | running |  2 |       515 |
| my-postgres-2 | 10.0.106.68 | Replica | running |  2 |       515 |
+---------------+-------------+---------+---------+----+-----------+


[question] pg_wal eat disk because inactive replication slot #2012