consistent fails of logical backup

Please, answer some short questions which should help us to understand your problem / question better?

Which image of the operator are you using? ghcr.io/zalando/postgres-operator:v1.13.0
Where do you run it - cloud or metal? Kubernetes or OpenShift? Bare Metal K3s
Are you running Postgres Operator in production? yes
Type of issue? question / feature request

Since we moved our production database to this operator, the logical backup hasn't succeeded. We get following error:

pg_dump: error: Dumping the contents of table "courses" failed: PQgetResult() failed.
pg_dump: detail: Error message from server: ERROR:  canceling statement due to conflict with recovery
DETAIL:  User query might have needed to see row versions that must be removed.

The table is not always the same, but it is always a table where there's lots of activity.

We figured this error is due to data getting deleted from the replica (we have 1 master and 2 replicas) and pg_dump trying to still copy it, or something like that; we aren't experts in Postgres replication unfortunately. We are aware there are certain postgres parameters which we can set to try to avoid this, but we aren't sure which one, and we don't want the hassle of finding it out just to fix a cronjob which runs one time a day.

As the cronjob runs at night, when our load is fairly low (but still high enough to let the backup fail), we would like to use the master for this, as such problems do not arise then. Now, this is hardcoded in the backup script to use the replicas when possible.

I therefore propose an extra config parameter to force the usage of the master node. If there are other suggestions which are easier to implement or just a recommendation on what parameter we should use, that would also be great.

zalando / postgres-operator

consistent fails of logical backup #2746