Closed heysky closed 22 hours ago
The thing is that Postgres has crashed, and we never promised to execute callbacks on crash. Also, it is not clear what happened after the crash (since you didn't provide enough logs). My guess is that Patroni first tried to simply start Postgres again. If you provide logs from the former primary, from right after the crash up to the moment the standby was promoted, it would really help us decide how to deal with it.
I'll upload the full logs shortly.
What happened?
When the primary server's disk is full, Patroni successfully demotes the primary database and promotes the standby database, but the callback script is not executed on the original primary server.
After investigation: in the follow function of patroni/postgresql/__init__.py, Patroni tries to start the database before executing the callback script:
ret = self.start(timeout=timeout, block_callbacks=change_role, role=role) or None
and in the start function, it writes some parameters to the configuration file:
self.config.write_postgresql_conf(configuration)
As the disk is full, writing the configuration file fails with an IO error, and the remaining steps are not executed.
Now that the role has already been changed from master to replica, the callback script should be executed. Can it be fixed please?
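To make the reported control flow concrete, here is a minimal toy model of it. This is not Patroni's code: FakePostgres, follow_as_reported, follow_with_guaranteed_callback, call_on_role_change and run are all hypothetical names, and the try/finally variant is only one possible way the callback could be made to fire, not a proposed patch.

```python
import errno


class FakePostgres:
    """Toy stand-in for the real class; every name here is hypothetical."""

    def __init__(self, disk_full):
        self.disk_full = disk_full
        self.callbacks_fired = []

    def write_postgresql_conf(self):
        # Simulates the configuration write failing on a full disk.
        if self.disk_full:
            raise OSError(errno.ENOSPC, "No space left on device")

    def start(self):
        # The config is written before anything else happens.
        self.write_postgresql_conf()
        return True

    def call_on_role_change(self, role):
        # Records the callback instead of running a real script.
        self.callbacks_fired.append(("on_role_change", role))

    def follow_as_reported(self, role):
        # Flow described in this report: start first, callback afterwards.
        # An exception in start() means the callback is never reached.
        ret = self.start()
        self.call_on_role_change(role)
        return ret

    def follow_with_guaranteed_callback(self, role):
        # Assumed alternative: fire the callback even if start() raises.
        try:
            return self.start()
        finally:
            self.call_on_role_change(role)


def run(method_name, disk_full=True):
    """Call the chosen follow variant and return the callbacks it fired."""
    pg = FakePostgres(disk_full)
    try:
        getattr(pg, method_name)("replica")
    except OSError:
        pass  # the IO error itself is expected in this scenario
    return pg.callbacks_fired
```

With a full disk, run("follow_as_reported") returns an empty list, which matches the behaviour described above, while run("follow_with_guaranteed_callback") still records the role change.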
How can we reproduce it (as minimally and precisely as possible)?
Use the dd command to fill up the PostgreSQL data directory, then run a checkpoint on the primary database to crash it.
What did you expect to happen?
Since the primary database failed to start and its role was demoted to replica, the callback script should be executed as well.
Patroni/PostgreSQL/DCS version
Patroni configuration file
patronictl show-config
Patroni log files
PostgreSQL log files
Have you tried to use GitHub issue search?
Anything else we need to know?
No response