Closed kkarlo closed 1 month ago
Hello @kkarlo
Error communicating with DCS
demoting self because DCS is not accessible and I was a leader
Patroni demoted the leader (primary) to a replica role (restarting Postgres) due to communication problems with the DCS (etcd). This is due to the saturation of resources on the database server (CPU, disks, network). Check the monitoring data to identify the bottleneck.
Also, try reducing the value of the 'process-max' parameter (pgBackRest) to reduce the load during the backup.
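To check whether the demotions really line up with the backup schedule, one option is to pull the timestamps of the two messages quoted above out of the Patroni log. The following is only a minimal sketch: the helper names `find_dcs_events` and `minutes_past_hour` are hypothetical, and the timestamp layout (`YYYY-MM-DD HH:MM:SS,mmm` at the start of each line) is an assumption that may need adjusting to your logging configuration.

```python
import re
from datetime import datetime

# Assumption: each log line starts with "YYYY-MM-DD HH:MM:SS" (milliseconds ignored).
# Adjust the pattern if your Patroni log format differs.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")

# Message fragments taken from the log lines quoted in this thread.
MARKERS = (
    "Error communicating with DCS",
    "demoting self because DCS is not accessible",
)


def find_dcs_events(lines):
    """Return (timestamp, marker) pairs for lines that contain a DCS-related marker."""
    events = []
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # skip lines without a leading timestamp
        for marker in MARKERS:
            if marker in line:
                when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
                events.append((when, marker))
                break
    return events


def minutes_past_hour(events):
    """Minute-of-hour of each event, to see whether they cluster around the backup start."""
    return [when.minute for when, _ in events]
```

Passing an open log file to `find_dcs_events` and then looking at `minutes_past_hour` shows whether the events sit at the minute the backup starts. If they do, that points at load from the backup; if they are spread across the hour, the bottleneck is more likely elsewhere.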
Hi @vitabaks, thank you for the quick response. I now have process-max set to 4. After lowering this parameter, how do I distribute the setting to the pgbackrest hosts? Are there tags for it?
Thanks in advance!
Hi @vitabaks, I moved the backup to 5 minutes past the hour, and it now completes properly. But from time to time I still get this "received fast shutdown" on the database. The Patroni and PostgreSQL logs show the same messages as before. There are no other jobs executed hourly.
Also check your monitoring system to identify the bottleneck during the backup.
Hi, I have an issue with a PostgreSQL cluster. It was created by this role with etcd, patroni, pgbackrest, pgbouncer, haproxy and keepalived. I have a full backup at midnight and diff backups every hour from 1 to 23. At some random full hour (XX:00) the database restarts. Logs below: pgbackrest:
Patroni:
Postgresql:
So I do not have a backup, and the database restarts. The restart lasts just a few seconds (up to 15 s), but that is exactly when the backup is corrupted and ends abnormally. It occurs at a random full hour. I have a workaround: the backup now runs at +5 min, but "received fast shutdown request" still occurs in the cluster. Can you provide me with some information? Maybe the cluster is working fine and I am just overreacting, but this prevents me from getting uncorrupted backups on the full hour.