Open m0sh1x2 opened 2 years ago
We are also facing similar issues. When I dug into the pods, I found that the /home/postgres/pgdata/pgroot/data/pg_wal directory holds a lot of WAL files.
The WAL files were not cleaned up, so they fill the pod's disk space and PostgreSQL stops responding. Is there any way to clean up the files, given that they are being archived to remote S3?
We are running a Kubernetes cluster on top of a private cloud and are facing similar issues. When I dug into the pods, I found that the /home/postgres/pgdata/pgroot/data/pg_wal directory holds many gigabytes of WAL files, and the pod often crashes because there is no space left on disk.
The WAL files were not cleaned up, so they fill the pod's disk space and PostgreSQL stops responding. Is there any way to clean up the files, given that they are being archived to remote S3?
+1
There are more or less three reasons why pg_wal is growing:
archive_command
You need to investigate, find the reason, and eliminate the problem.
The starting point would be the Postgres logs located in $PGDATA/../pg_log, the output of SELECT * FROM pg_replication_slots, and the output of ps auxwf.
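For the replication slot check, a small sketch may help show what to look for. It assumes you have already copied slot_name, active, and restart_lsn from pg_replication_slots, plus the current WAL position, into Python; the slot names and LSN values below are made up for illustration. An LSN is printed as two hexadecimal numbers separated by a slash (high and low 32 bits), so the distance between two LSNs is the number of WAL bytes a slot is holding back. In SQL the same difference can be obtained with pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn).

```python
# Sketch: estimate how much WAL each replication slot is holding back.
# Rows mimic (slot_name, active, restart_lsn) from pg_replication_slots;
# the sample values are hypothetical.

def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as '16/B374D848' to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def retained_wal(current_lsn: str, slots):
    """Return (slot_name, active, retained_bytes), largest retention first."""
    now = lsn_to_int(current_lsn)
    report = [
        (name, active, now - lsn_to_int(restart_lsn))
        for name, active, restart_lsn in slots
        if restart_lsn is not None  # a slot without restart_lsn reserves no WAL
    ]
    return sorted(report, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    sample_slots = [
        ("replica_1", True, "2/10000000"),
        ("replica_2", False, "0/50000000"),  # inactive and far behind
    ]
    for name, active, nbytes in retained_wal("2/20000000", sample_slots):
        print(f"{name}: active={active}, retains {nbytes / 1024**2:.0f} MiB")
```

An inactive slot with a large retained value is the usual suspect, since Postgres keeps every WAL segment from that slot's restart_lsn onward.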
How can I figure out the max disk size pg_wal would cost? @CyberDem0n
Hi, for anyone landing here trying to figure out why their database cluster is running out of space:
executing du -h -d 4 showed me that the WAL folder ./pgdata/pgroot/data/pg_wal had become really large. The reason was that the replica nodes were no longer healthy and were not catching up.
I solved it by executing patronictl reinit on a node and selecting the unhealthy replicas. You can see the status of the replicas with patronictl list. Once the nodes were healthy again, the WAL folder size went from 144 GB to near zero on the database nodes.
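If you would rather script the du check than run it by hand, here is a minimal sketch that sums the file sizes under the WAL directory. The PGWAL_DIR environment variable is a hypothetical name introduced for this example; the default path is the one mentioned earlier in this thread. Note that du reports allocated disk blocks while this sums apparent file sizes, so the numbers can differ slightly.

```python
import os


def dir_size_bytes(path: str) -> int:
    """Sum the sizes of all regular files below path (0 if path is missing)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except FileNotFoundError:
                # WAL segments can be removed or recycled while we walk
                pass
    return total


if __name__ == "__main__":
    # PGWAL_DIR is a hypothetical override, not something the operator sets.
    wal_dir = os.environ.get(
        "PGWAL_DIR", "/home/postgres/pgdata/pgroot/data/pg_wal"
    )
    size = dir_size_bytes(wal_dir)
    print(f"{wal_dir}: {size / 1024**3:.2f} GiB")
```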
@spreeker: I agree that the procedure works. But since these are containers, unless you monitor the disk usage of the pods/containers and the cluster status, you will not be able to take such actions in time. It may lead to data loss as well.
Yep very inconvenient.
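On the monitoring point, a very rough sketch of a disk usage check that could run inside the pod (for example from a sidecar or a cron job) is below. The 80% threshold and the function names are assumptions made for this example, not something the operator provides; in practice you would point it at the data volume and feed the result into whatever alerting you already have.

```python
import shutil


def usage_percent(total: int, used: int) -> float:
    """Return used space as a percentage of total (0.0 for an empty volume)."""
    if total == 0:
        return 0.0
    return 100.0 * used / total


def check_volume(path: str, warn_at: float = 80.0):
    """Return (percent_used, alarm) for the filesystem that contains path."""
    usage = shutil.disk_usage(path)
    pct = usage_percent(usage.total, usage.used)
    return pct, pct >= warn_at


if __name__ == "__main__":
    # "." is a placeholder; use the mount point of the Postgres data volume.
    pct, alarm = check_volume(".")
    status = "WARNING" if alarm else "ok"
    print(f"{status}: volume is {pct:.1f}% full")
```

Checking the whole volume rather than only pg_wal is a deliberate choice here: the pod fails when the volume is full, whichever directory caused it.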
Hello,
I am using the base minimal cluster with one master and one worker node, running v1.7.1 of the operator with default settings.
Currently I am noticing that the pg_wal directory grows quite a lot in some cases and eats up space really fast, at several GB per day, without any cleanup.
Is there a way to reduce the WAL files on the minimal cluster, or is this expected behavior?
As I understand it, backing up the cluster should automatically clear the WAL files, but sometimes they stay for days, and in other cases they fill up the storage, for example growing by 1-2 GB per day for a database of ~400 MB.
Please let me know if I am missing something, or whether the WAL files are expected to grow over time.
Thanks
I have also answered the required questions for debugging:
Please, answer some short questions which should help us to understand your problem / question better?
Which image of the operator are you using? e.g. registry.opensource.zalan.do/acid/postgres-operator:v1.7.1
Where do you run it - cloud or metal? Kubernetes or OpenShift? [minikube, kind, and on-prem install with kubeadm]
Are you running Postgres Operator in production?
yes
Type of issue? [Bug report, question, feature request, etc.] question
Some general remarks when posting a bug report: