ossc-db / pg_rman

Backup and restore management tool for PostgreSQL
http://ossc-db.github.io/pg_rman/index.html
Other
476 stars 77 forks source link

ERROR: switched WAL could not be archived in 10 seconds #227

Closed MasonXon closed 2 years ago

MasonXon commented 2 years ago

By viewing historical issues, I cannot found the answer for this problem, I put my configuration:

restore_command = 'cp ./pg_archivelog/%f %p'
archive_mode = 'on'
archive_timeout = '120'
archive_command = 'cp %p /var/lib/pgsql/12/data/pg_archivelog/%f'

I'm sure the mode archive is all right, because I tested pg_basebackup for PITR before.

ARCLOG_PATH='/var/lib/pgsql/12/data/pg_archivelog'
SRVLOG_PATH='/var/lib/pgsql/12/data/log'

operations init worked normal, but backup occurs following:

[postgres@pgsql ~]$ /usr/pgsql-12/bin/pg_rman -B /var/lib/pgsql/backups -b full --hard-copy backup -P
INFO: copying database files
Processed 5012 of 5012 files, skipped 0
ERROR: switched WAL could not be archived in 10 seconds

I have tried restarting, modifying database timeout parameters, etc., but none of them seem to work.

mikecaat commented 2 years ago

Hi, could you share the bellow results?

  1. execute with --debug option
# /usr/pgsql-12/bin/pg_rman -B /var/lib/pgsql/backups -b full --hard-copy backup -P --debug
  1. check the archive status after the error is happened
# ls -lR $PGDATA/pg_wal
# ls -lR /var/lib/pgsql/12/data/pg_archivelog/
MasonXon commented 2 years ago

before 1:

[postgres@pgsql data]$ ls -lR $PGDATA/pg_wal
/var/lib/pgsql/12/data/pg_wal:
total 393232
-rw-------. 1 postgres postgres       35 May 19 12:54 00000002.history
-rw-------. 1 postgres postgres       80 May 19 14:02 00000003.history
-rw-------. 1 postgres postgres      347 May 23 15:25 00000004000000010000003B.00000028.backup
-rw-------. 1 postgres postgres 16777216 May 23 19:15 00000004000000010000003F
-rw-------. 1 postgres postgres 16777216 May 19 15:55 000000040000000100000040
-rw-------. 1 postgres postgres 16777216 May 19 15:55 000000040000000100000041
-rw-------. 1 postgres postgres 16777216 May 19 15:55 000000040000000100000042
-rw-------. 1 postgres postgres 16777216 May 19 15:55 000000040000000100000043
-rw-------. 1 postgres postgres 16777216 May 19 15:55 000000040000000100000044
-rw-------. 1 postgres postgres 16777216 May 19 15:55 000000040000000100000045
-rw-------. 1 postgres postgres 16777216 May 19 15:55 000000040000000100000046
-rw-------. 1 postgres postgres 16777216 May 19 15:55 000000040000000100000047
-rw-------. 1 postgres postgres 16777216 May 19 15:55 000000040000000100000048
-rw-------. 1 postgres postgres 16777216 May 19 15:56 000000040000000100000049
-rw-------. 1 postgres postgres 16777216 May 19 15:55 00000004000000010000004A
-rw-------. 1 postgres postgres 16777216 May 19 15:54 00000004000000010000004B
-rw-------. 1 postgres postgres 16777216 May 19 15:54 00000004000000010000004C
-rw-------. 1 postgres postgres 16777216 May 19 15:54 00000004000000010000004D
-rw-------. 1 postgres postgres 16777216 May 19 15:54 00000004000000010000004E
-rw-------. 1 postgres postgres 16777216 May 19 15:54 00000004000000010000004F
-rw-------. 1 postgres postgres 16777216 May 19 15:54 000000040000000100000050
-rw-------. 1 postgres postgres 16777216 May 19 15:54 000000040000000100000051
-rw-------. 1 postgres postgres 16777216 May 19 15:54 000000040000000100000052
-rw-------. 1 postgres postgres 16777216 May 19 15:54 000000040000000100000053
-rw-------. 1 postgres postgres 16777216 May 19 15:55 000000040000000100000054
-rw-------. 1 postgres postgres 16777216 May 19 15:55 000000040000000100000055
-rw-------. 1 postgres postgres 16777216 May 19 15:55 000000040000000100000056
-rw-------. 1 postgres postgres      125 May 19 14:02 00000004.history
drwx------. 2 postgres postgres      146 May 23 19:14 archive_status

/var/lib/pgsql/12/data/pg_wal/archive_status:
total 0
-rw-------. 1 postgres postgres 0 May 19 10:07 00000002.history.done
-rw-------. 1 postgres postgres 0 May 19 12:54 00000003.history.done
-rw-------. 1 postgres postgres 0 May 23 15:25 00000004000000010000003B.00000028.backup.done
-rw-------. 1 postgres postgres 0 May 19 14:02 00000004.history.done

ls -lR $PGDATA/pg_wal|wc -l
39

ls -lR /var/lib/pgsql/12/data/pg_archivelog/|wc -l
72

backup log follow:

[postgres@pgsql data]$ /usr/pgsql-12/bin/pg_rman -B /var/lib/pgsql/backups -b full --hard-copy backup -P --debug
DEBUG: the initially configured target database : SYSTEM_IDENTIFIER = 7044337785934363742
DEBUG: the system identifier of current target database : 7044337785934363742
DEBUG: the backup target database is the same as initial configured one.
DEBUG: destination directories of backup are initialized
DEBUG: checking PostgreSQL server version
DEBUG: server version is 12.0.11
DEBUG: checking block size setting
DEBUG: (query) SELECT current_setting($1)
DEBUG:  (param:0) = block_size
DEBUG: block size is 8192
DEBUG: checking block size setting
DEBUG: (query) SELECT current_setting($1)
DEBUG:  (param:0) = wal_block_size
DEBUG: wal block size is 8192
DEBUG: data checksum disabled on the initially configured database
INFO: copying database files
DEBUG: executing pg_start_backup()
DEBUG: (query) SELECT * from pg_walfile_name_offset(pg_start_backup($1, $2, $3))
DEBUG:  (param:0) = 2022-05-23 19:20:12 with pg_rman
DEBUG:  (param:1) = true
DEBUG:  (param:2) = false
DEBUG: backup start point is (WAL file: 000000040000000100000040, xrecoff: 40)
DEBUG: taking full backup of database files
Processed 1348 of 1348 files, skipped 00
DEBUG: executing pg_stop_backup()
DEBUG: (query) SET client_min_messages = warning;
DEBUG: (query) SELECT * FROM pg_stop_backup($1)
DEBUG:  (param:0) = false
DEBUG: (query) SELECT * FROM pg_walfile_name_offset($1)
DEBUG:  (param:0) = 1/41000050
DEBUG: backup end point is (WAL file: 000000040000000100000041, xrecoff: 80)
DEBUG: waiting for 000000040000000100000041 is archived
DEBUG: (query) SELECT txid_current();
DEBUG: current XID is 1106
DEBUG: WAL file containing backup end point is archived after waiting for 0 seconds
INFO: copying archived WAL files
DEBUG: turn to take a full backup of archived WAL files
DEBUG: taking backup of archived WAL files
Processed 73 of 73 files, skipped 0
INFO: backup complete
INFO: Please execute 'pg_rman validate' to verify the files are correctly copied.
DEBUG: do not delete old archived WAL files
DEBUG: do not delete old backup files

It seems it's succeed, But why? before I execute the command it still failed

MasonXon commented 2 years ago

Now I'm so confused. but pg_rman is normal now.

mikecaat commented 2 years ago

Thanks for sharing the result.

Now I'm so confused. but pg_rman is normal now.

Though I can't say what happened exactly, #104 said that timing could cause the problem. If the archived status which pg_rman will check is deleted by CHECKPOINT, it can't know archiving is done or not.

Anyway, please let me know if the problem happen.

Thanks,

MasonXon commented 2 years ago

Okay,thanks

mikecaat commented 2 years ago

I'll close this issue. But please feel free to reopen.