Open artemsergienko opened 2 years ago
First backup by cron has been failed.
Second backup started by manually has been successful.
Looks like race condition between writer and reader.
Please check.
2022-08-15 02:20:01 +05 [413459]: INFO: command: pg_probackup set-config --instance snph --log-level-file=INFO 2022-08-15 02:20:01 +05 [413490]: INFO: command: pg_probackup backup --instance snph --stream --threads 5 --compress -b PTRACK --delete-expired --delete-wal 2022-08-15 02:20:01 +05 [413490]: INFO: Backup start, pg_probackup version: 2.5.5, instance: snph, backup ID: RGMJ9D, backup mode: PTRACK, wal mode: STREAM, remote: false, compress-algorithm: zlib, compress-level: 1 2022-08-15 02:20:01 +05 [413490]: WARNING: Current PostgreSQL role is superuser. It is not recommended to run pg_probackup under superuser. 2022-08-15 02:20:01 +05 [413490]: INFO: Backup RGMJ9D is going to be taken from standby 2022-08-15 02:20:01 +05 [413490]: INFO: wait for pg_start_backup() 2022-08-15 02:20:01 +05 [413490]: INFO: Parent backup: RGKOLD 2022-08-15 02:20:03 +05 [413490]: INFO: Wait for WAL segment /backups/snph/RGMJ9D/database/pg_wal/000000010000003E00000021 to be streamed 2022-08-15 02:20:03 +05 [413490]: INFO: PGDATA size: 14GB 2022-08-15 02:20:03 +05 [413490]: INFO: Extracting pagemap of changed blocks 2022-08-15 02:20:03 +05 [413490]: INFO: Pagemap successfully extracted, time elapsed: 0 sec 2022-08-15 02:20:03 +05 [413490]: INFO: Start transferring data files 2022-08-15 02:20:14 +05 [413490]: INFO: Data files are transferred, time elapsed: 11s 2022-08-15 02:20:15 +05 [413490]: INFO: wait for pg_stop_backup() 2022-08-15 02:20:15 +05 [413490]: INFO: pg_stop backup() successfully executed 2022-08-15 02:20:16 +05 [413490]: WARNING: Could not read WAL record at 3E/8547A000: invalid magic number 0000 in log segment 000000000000003E00000021, offset 21471232 2022-08-15 02:20:16 +05 [413490]: ERROR: Thread [0]: Possible WAL corruption. Error has occured during reading WAL segment "/backups/snph/RGMJ9D/database/pg_wal/000000010000003E00000021" 2022-08-15 02:20:16 +05 [413490]: ERROR: Thread [0]: Possible WAL corruption. Error has occured during reading WAL segment "/backups/snph/RGMJ9D/database/pg_wal/000000010000003E00000021" 2022-08-15 02:20:16 +05 [413490]: WARNING: Backup RGMJ9D is running, setting its status to ERROR 2022-08-15 11:18:55 +05 [524925]: INFO: command: pg_probackup set-config --instance snph --log-level-file=INFO 2022-08-15 11:18:55 +05 [524932]: INFO: command: pg_probackup backup --instance snph --stream --threads 5 --compress -b PTRACK --delete-expired --delete-wal 2022-08-15 11:18:55 +05 [524932]: INFO: Backup start, pg_probackup version: 2.5.5, instance: snph, backup ID: RGN87J, backup mode: PTRACK, wal mode: STREAM, remote: false, compress-algorithm: zlib, compress-level: 1 2022-08-15 11:18:55 +05 [524932]: WARNING: Current PostgreSQL role is superuser. It is not recommended to run pg_probackup under superuser. 2022-08-15 11:18:55 +05 [524932]: INFO: Backup RGN87J is going to be taken from standby 2022-08-15 11:18:55 +05 [524932]: INFO: wait for pg_start_backup() 2022-08-15 11:18:55 +05 [524932]: WARNING: Backup RGMJ9D has status: ERROR. Cannot be a parent. 2022-08-15 11:18:55 +05 [524932]: INFO: Parent backup: RGKOLD 2022-08-15 11:18:57 +05 [524932]: INFO: Wait for WAL segment /backups/snph/RGN87J/database/pg_wal/000000010000003E00000026 to be streamed 2022-08-15 11:18:57 +05 [524932]: INFO: PGDATA size: 14GB 2022-08-15 11:18:57 +05 [524932]: INFO: Extracting pagemap of changed blocks 2022-08-15 11:18:57 +05 [524932]: INFO: Pagemap successfully extracted, time elapsed: 0 sec 2022-08-15 11:18:57 +05 [524932]: INFO: Start transferring data files 2022-08-15 11:19:07 +05 [524932]: INFO: Data files are transferred, time elapsed: 10s 2022-08-15 11:19:07 +05 [524932]: INFO: pg_stop backup() successfully executed 2022-08-15 11:19:07 +05 [524932]: WARNING: Could not read WAL record at 3E/9A5B9518: invalid record length at 3E/9A5B9518: wanted 32, got 0 2022-08-15 11:19:08 +05 [524932]: INFO: Wait for LSN 3E/9A5B9518 in streamed WAL segment /backups/snph/RGN87J/database/pg_wal/000000010000003E00000026 2022-08-15 11:19:10 +05 [524932]: INFO: Syncing backup files to disk 2022-08-15 11:19:11 +05 [524932]: INFO: Backup files are synced, time elapsed: 1s 2022-08-15 11:19:11 +05 [524932]: INFO: Validating backup RGN87J 2022-08-15 11:19:18 +05 [524932]: INFO: Backup RGN87J data files are valid 2022-08-15 11:19:18 +05 [524932]: INFO: Backup RGN87J resident size: 2559MB 2022-08-15 11:19:18 +05 [524932]: INFO: Backup RGN87J completed 2022-08-15 11:19:18 +05 [524932]: INFO: Evaluate backups by retention 2022-08-15 11:19:18 +05 [524932]: INFO: Backup RGN87J, mode: PTRACK, status: OK. Redundancy: 1/1, Time Window: 0d/0d. Active 2022-08-15 11:19:18 +05 [524932]: INFO: Backup RGMJ9D, mode: PTRACK, status: ERROR. Redundancy: 1/1, Time Window: 0d/0d. Active 2022-08-15 11:19:18 +05 [524932]: INFO: Backup RGKOLD, mode: FULL, status: OK. Redundancy: 1/1, Time Window: 1d/0d. Active 2022-08-15 11:19:18 +05 [524932]: INFO: On timeline 1 WAL segments between 000000010000003E00000013 and 000000010000003E00000026 will be removed 2022-08-15 11:19:18 +05 [524932]: INFO: Logical WAL size to remove on timeline 1 : 1216MB 2022-08-15 11:19:18 +05 [524932]: INFO: Resident WAL size to free on timeline 1 : 486MB 2022-08-15 11:19:18 +05 [524932]: INFO: There are no backups to merge by retention policy 2022-08-15 11:19:18 +05 [524932]: INFO: There are no backups to delete by retention policy
Артём, эта проблема всё ещё воспроизводится?
First backup by cron has been failed.
Second backup started by manually has been successful.
Looks like race condition between writer and reader.
Please check.