Greetings,
I actually have some code for this, but wanted to open a discussion on it first. The code is here:
https://github.com/soulhunter/resource-agents/commit/64a10f265566396ddf6f7bc6b07a5e6eaa2efe51
As you can see in that patch, I just shut down PostgreSQL, remove recovery.conf (I could also rename it, but we recreate it as needed, so...) and start it again. The idea is to avoid a timeline switch and almost eliminate the need for a shared archive (provided the standbys are "close enough" to the newly promoted one).
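To make the sequence concrete, here is a minimal sketch of those three steps in Python. It is not the code from the patch (the resource agent itself is a shell script); the function name `promote_without_timeline_switch` and the injectable `run` parameter are made up for illustration, and it assumes `pg_ctl` is on the PATH and that `recovery.conf` lives in the data directory.

```python
import subprocess
from pathlib import Path


def promote_without_timeline_switch(pgdata, run=subprocess.run):
    """Stop the standby, drop recovery.conf, start it again as a primary.

    Hypothetical helper for illustration only; `run` is injectable so the
    sequence can be exercised without a real PostgreSQL instance.
    """
    data_dir = Path(pgdata)

    # 1. Stop the standby; -m fast disconnects clients, -w waits for shutdown.
    run(["pg_ctl", "stop", "-D", str(data_dir), "-m", "fast", "-w"], check=True)

    # 2. Remove recovery.conf so the next start does not enter standby mode.
    recovery_conf = data_dir / "recovery.conf"
    if recovery_conf.exists():
        recovery_conf.unlink()

    # 3. Start again; without recovery.conf the server stays on its timeline.
    run(["pg_ctl", "start", "-D", str(data_dir), "-w"], check=True)
```

Because no `pg_ctl promote` (or trigger file) is involved, no new timeline is created, which is the whole point of doing it this way.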
I know the extra WAL shipping will work, but it has the disadvantage of transferring the WAL files twice (once through streaming replication, and again through WAL shipping). We could keep an extra cold standby around as a WAL archive (in addition to a WAL archive on the PRI) and have the HS use that archive, but that is a more complex setup.
This promotion method still lets you keep the archive if you want one, so that really lagged standbys can catch up, but it is no longer that important. Remember that a timeline switch requires the .history file(s) to be available, and these are not streamed; so, if we allow the timeline switch, the WAL archive becomes increasingly important, even if a standby lags by just one WAL segment.
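For reference, this is roughly how the file names relate to timelines. The sketch below is only an illustration (the helper names are made up); it relies on the standard naming, where a WAL segment name is 24 hex digits whose first 8 are the timeline ID, and the history file for a timeline is that same 8-digit ID plus ".history".

```python
def timeline_of_segment(segment_name: str) -> int:
    """Return the timeline ID encoded in a 24-hex-digit WAL segment name."""
    if len(segment_name) != 24:
        raise ValueError("expected a 24-character WAL segment name")
    return int(segment_name[:8], 16)


def history_file_name(timeline_id: int) -> str:
    """Return the history file a standby needs to follow onto a timeline."""
    if timeline_id < 2:
        # The initial timeline (1) has no history file.
        raise ValueError("timeline 1 has no history file")
    return f"{timeline_id:08X}.history"
```

So a standby still on timeline 1 that wants to follow a primary promoted to timeline 2 needs `history_file_name(2)` from somewhere, and with no archive there is nowhere to get it.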
What do you think?
Ildefonso Camargo