Open gsl23 opened 2 months ago
At 2024-05-29 12:44 standby instance have been promoted to primary :
2024-05-29 12:44:17.079 +05 [33061] LOG: replication terminated by primary server
2024-05-29 12:44:17.079 +05 [33061] DETAIL: End of WAL reached on timeline 10 at 0/370000A0.
2024-05-29 12:44:17.079 +05 [33061] FATAL: could not send end-of-streaming message to primary: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
no COPY in progress
cp: cannot stat '/pgsql/data/../wal_archive/0000000B.history': No such file or directory
cp: cannot stat '/pgsql/data/../wal_archive/0000000A0000000000000037': No such file or directory
2024-05-29 12:44:17.087 +05 [33048] LOG: invalid resource manager ID 114 at 0/370000A0
2024-05-29 12:44:17.109 +05 [350162] FATAL: could not connect to the primary server: connection to server at "192.168.130.23", port 5432 failed: FATAL: the
cp: cannot stat '/pgsql/data/../wal_archive/0000000B.history': No such file or directory
2024-05-29 12:44:17.112 +05 [33048] LOG: waiting for WAL to become available at 0/370000B8
2024-05-29 12:44:17 +05 [350158]: [1-1]: INFO: pg_probackup archive-push WAL file: 0000000A0000000000000036, threads: 1/1, batch: 1/1, compression: zlib
2024-05-29 12:44:17 +05 [350158]: [1-1]: INFO: pg_probackup archive-push completed successfully, pushed: 1, skipped: 0, time elapsed: 66ms
cp: cannot stat '/pgsql/data/../wal_archive/0000000A0000000000000037': No such file or directory
2024-05-29 12:44:17.887 +05 [33048] LOG: received promote request
2024-05-29 12:44:17.887 +05 [33048] LOG: redo done at 0/37000028 system usage: CPU: user: 0.75 s, system: 0.83 s, elapsed: 63143.22 s
2024-05-29 12:44:17.887 +05 [33048] LOG: last completed transaction was at log time 2024-05-29 12:44:00.017459+05
cp: cannot stat '/pgsql/data/../wal_archive/0000000A0000000000000037': No such file or directory
cp: cannot stat '/pgsql/data/../wal_archive/0000000B.history': No such file or directory
2024-05-29 12:44:17.898 +05 [33048] LOG: selected new timeline ID: 11
cp: cannot stat '/pgsql/data/../wal_archive/0000000A.history': No such file or directory
2024-05-29 12:44:18.037 +05 [33048] LOG: archive recovery complete
2024-05-29 12:44:18.068 +05 [33046] LOG: checkpoint starting: force
2024-05-29 12:44:18.075 +05 [33042] LOG: database system is ready to accept connections
2024-05-29 12:44:18.082 +05 [350180] LOG: pg_cron scheduler started
2024-05-29 12:44:18 +05 [350176]: [1-1]: INFO: pg_probackup archive-push WAL file: 0000000B.history, threads: 1/1, batch: 1/1, compression: zlib
2024-05-29 12:44:18 +05 [350176]: [1-1]: INFO: pg_probackup archive-push completed successfully, pushed: 1, skipped: 0, time elapsed: 6ms
2024-05-29 12:44:18 +05 [350187]: [1-1]: INFO: pg_probackup archive-push WAL file: 0000000A0000000000000037.partial, threads: 1/1, batch: 1/1, compression: zl
2024-05-29 12:44:18 +05 [350187]: [1-1]: INFO: pg_probackup archive-push completed successfully, pushed: 1, skipped: 0, time elapsed: 78ms
2024-05-29 12:44:19.011 +05 [33046] LOG: checkpoint complete: wrote 34 buffers (0.0%); 0 WAL file(s) added, 0 removed, 1 recycled; write=0.897 s, sync=0.011
2024-05-29 12:44:19.011 +05 [33046] LOG: checkpoint starting: immediate force wait
2024-05-29 12:44:19.060 +05 [33046] LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.009 s, sync=0.001 s
2024-05-29 12:45:00.007 +05 [350180] LOG: cron job 1 starting:
Do we have any chances to apply later wals after this occasion ?
Looks like it is pg_probackup archive-get problem.
I copied all needed wals manually to server local dir and setted up restore by cp :
# restore_command = '"/usr/bin/pg_probackup-15" archive-get -B "/backup/postgres" --instance "pgw-db-p02" --wal-file-path=%p --wal-file-name=%f'
restore_command = 'cp /pgsql/data/../wal_archive2/%f %p'
And now its recovered forward after 0000000B0000000000000037 without any problems:
2024-05-29 17:32:07.935 +05 [439475] LOG: restored log file "0000000A0000000000000032" from archive
2024-05-29 17:32:08.025 +05 [439475] LOG: restored log file "0000000A0000000000000033" from archive
2024-05-29 17:32:08.115 +05 [439475] LOG: restored log file "0000000A0000000000000034" from archive
2024-05-29 17:32:08.193 +05 [439475] LOG: restored log file "0000000A0000000000000035" from archive
2024-05-29 17:32:08.271 +05 [439475] LOG: restored log file "0000000A0000000000000036" from archive
2024-05-29 17:32:08.355 +05 [439475] LOG: restored log file "0000000B0000000000000037" from archive
2024-05-29 17:32:08.436 +05 [439475] LOG: restored log file "0000000B0000000000000038" from archive
2024-05-29 17:32:08.528 +05 [439475] LOG: restored log file "0000000B0000000000000039" from archive
2024-05-29 17:32:08.606 +05 [439475] LOG: restored log file "0000000B000000000000003A" from archive
2024-05-29 17:32:08.693 +05 [439475] LOG: restored log file "0000000B000000000000003B" from archive
2024-05-29 17:32:08.770 +05 [439475] LOG: restored log file "0000000B000000000000003C" from archive
2024-05-29 17:32:08.865 +05 [439475] LOG: restored log file "0000000B000000000000003D" from archive
2024-05-29 17:32:08.946 +05 [439475] LOG: restored log file "0000000B000000000000003E" from archive
2024-05-29 17:32:09.011 +05 [439475] LOG: recovery stopping before commit of transaction 8004, time 2024-05-29 15:46:00.001085+05
2024-05-29 17:32:09.011 +05 [439475] LOG: pausing at the end of recovery
2024-05-29 17:32:09.011 +05 [439475] HINT: Execute pg_wal_replay_resume() to promote.
2024-05-29 17:33:27.613 +05 [439475] LOG: redo done at 0/3E00FB88 system usage: CPU: user: 0.02 s, system: 0.20 s, elapsed: 81.25 s
2024-05-29 17:33:27.613 +05 [439475] LOG: last completed transaction was at log time 2024-05-29 15:45:00.024303+05
So, my test case :
Hi guys ! I'm testing wal archive recovery and cant to understand why archive-get just lose some last wals :
postgresql.auto.conf:
wals in pg_probackup archive :
postgresql.log while recovery :
Why it is ended on 0000000A0000000000000037 last completed transaction was at log time 2024-05-29 12:44:00.017459+05 ?? Where is -rw-------. 1 postgres postgres 79741 May 29 13:18 0000000B0000000000000038.gz -rw-------. 1 postgres postgres 83013 May 29 13:25 0000000B0000000000000039.gz etc.... ???