postgrespro / pg_probackup

Backup and recovery manager for PostgreSQL
https://postgrespro.github.io/pg_probackup/
Other
711 stars 86 forks source link

Race condition #521

Open artemsergienko opened 2 years ago

artemsergienko commented 2 years ago

First backup by cron has been failed.

Second backup started by manually has been successful.

Looks like race condition between writer and reader.

Please check.

2022-08-15 02:20:01 +05 [413459]: INFO: command: pg_probackup set-config --instance snph --log-level-file=INFO 
2022-08-15 02:20:01 +05 [413490]: INFO: command: pg_probackup backup --instance snph --stream --threads 5 --compress -b PTRACK --delete-expired --delete-wal 
2022-08-15 02:20:01 +05 [413490]: INFO: Backup start, pg_probackup version: 2.5.5, instance: snph, backup ID: RGMJ9D, backup mode: PTRACK, wal mode: STREAM, remote: false, compress-algorithm: zlib, compress-level: 1
2022-08-15 02:20:01 +05 [413490]: WARNING: Current PostgreSQL role is superuser. It is not recommended to run pg_probackup under superuser.
2022-08-15 02:20:01 +05 [413490]: INFO: Backup RGMJ9D is going to be taken from standby
2022-08-15 02:20:01 +05 [413490]: INFO: wait for pg_start_backup()
2022-08-15 02:20:01 +05 [413490]: INFO: Parent backup: RGKOLD
2022-08-15 02:20:03 +05 [413490]: INFO: Wait for WAL segment /backups/snph/RGMJ9D/database/pg_wal/000000010000003E00000021 to be streamed
2022-08-15 02:20:03 +05 [413490]: INFO: PGDATA size: 14GB
2022-08-15 02:20:03 +05 [413490]: INFO: Extracting pagemap of changed blocks
2022-08-15 02:20:03 +05 [413490]: INFO: Pagemap successfully extracted, time elapsed: 0 sec
2022-08-15 02:20:03 +05 [413490]: INFO: Start transferring data files
2022-08-15 02:20:14 +05 [413490]: INFO: Data files are transferred, time elapsed: 11s
2022-08-15 02:20:15 +05 [413490]: INFO: wait for pg_stop_backup()
2022-08-15 02:20:15 +05 [413490]: INFO: pg_stop backup() successfully executed
2022-08-15 02:20:16 +05 [413490]: WARNING: Could not read WAL record at 3E/8547A000: invalid magic number 0000 in log segment 000000000000003E00000021, offset 21471232
2022-08-15 02:20:16 +05 [413490]: ERROR: Thread [0]: Possible WAL corruption. Error has occured during reading WAL segment "/backups/snph/RGMJ9D/database/pg_wal/000000010000003E00000021"
2022-08-15 02:20:16 +05 [413490]: ERROR: Thread [0]: Possible WAL corruption. Error has occured during reading WAL segment "/backups/snph/RGMJ9D/database/pg_wal/000000010000003E00000021"
2022-08-15 02:20:16 +05 [413490]: WARNING: Backup RGMJ9D is running, setting its status to ERROR
2022-08-15 11:18:55 +05 [524925]: INFO: command: pg_probackup set-config --instance snph --log-level-file=INFO 
2022-08-15 11:18:55 +05 [524932]: INFO: command: pg_probackup backup --instance snph --stream --threads 5 --compress -b PTRACK --delete-expired --delete-wal 
2022-08-15 11:18:55 +05 [524932]: INFO: Backup start, pg_probackup version: 2.5.5, instance: snph, backup ID: RGN87J, backup mode: PTRACK, wal mode: STREAM, remote: false, compress-algorithm: zlib, compress-level: 1
2022-08-15 11:18:55 +05 [524932]: WARNING: Current PostgreSQL role is superuser. It is not recommended to run pg_probackup under superuser.
2022-08-15 11:18:55 +05 [524932]: INFO: Backup RGN87J is going to be taken from standby
2022-08-15 11:18:55 +05 [524932]: INFO: wait for pg_start_backup()
2022-08-15 11:18:55 +05 [524932]: WARNING: Backup RGMJ9D has status: ERROR. Cannot be a parent.
2022-08-15 11:18:55 +05 [524932]: INFO: Parent backup: RGKOLD
2022-08-15 11:18:57 +05 [524932]: INFO: Wait for WAL segment /backups/snph/RGN87J/database/pg_wal/000000010000003E00000026 to be streamed
2022-08-15 11:18:57 +05 [524932]: INFO: PGDATA size: 14GB
2022-08-15 11:18:57 +05 [524932]: INFO: Extracting pagemap of changed blocks
2022-08-15 11:18:57 +05 [524932]: INFO: Pagemap successfully extracted, time elapsed: 0 sec
2022-08-15 11:18:57 +05 [524932]: INFO: Start transferring data files
2022-08-15 11:19:07 +05 [524932]: INFO: Data files are transferred, time elapsed: 10s
2022-08-15 11:19:07 +05 [524932]: INFO: pg_stop backup() successfully executed
2022-08-15 11:19:07 +05 [524932]: WARNING: Could not read WAL record at 3E/9A5B9518: invalid record length at 3E/9A5B9518: wanted 32, got 0
2022-08-15 11:19:08 +05 [524932]: INFO: Wait for LSN 3E/9A5B9518 in streamed WAL segment /backups/snph/RGN87J/database/pg_wal/000000010000003E00000026
2022-08-15 11:19:10 +05 [524932]: INFO: Syncing backup files to disk
2022-08-15 11:19:11 +05 [524932]: INFO: Backup files are synced, time elapsed: 1s
2022-08-15 11:19:11 +05 [524932]: INFO: Validating backup RGN87J
2022-08-15 11:19:18 +05 [524932]: INFO: Backup RGN87J data files are valid
2022-08-15 11:19:18 +05 [524932]: INFO: Backup RGN87J resident size: 2559MB
2022-08-15 11:19:18 +05 [524932]: INFO: Backup RGN87J completed
2022-08-15 11:19:18 +05 [524932]: INFO: Evaluate backups by retention
2022-08-15 11:19:18 +05 [524932]: INFO: Backup RGN87J, mode: PTRACK, status: OK. Redundancy: 1/1, Time Window: 0d/0d. Active
2022-08-15 11:19:18 +05 [524932]: INFO: Backup RGMJ9D, mode: PTRACK, status: ERROR. Redundancy: 1/1, Time Window: 0d/0d. Active
2022-08-15 11:19:18 +05 [524932]: INFO: Backup RGKOLD, mode: FULL, status: OK. Redundancy: 1/1, Time Window: 1d/0d. Active
2022-08-15 11:19:18 +05 [524932]: INFO: On timeline 1 WAL segments between 000000010000003E00000013 and 000000010000003E00000026 will be removed
2022-08-15 11:19:18 +05 [524932]: INFO: Logical WAL size to remove on timeline 1 : 1216MB
2022-08-15 11:19:18 +05 [524932]: INFO: Resident WAL size to free on timeline 1 : 486MB
2022-08-15 11:19:18 +05 [524932]: INFO: There are no backups to merge by retention policy
2022-08-15 11:19:18 +05 [524932]: INFO: There are no backups to delete by retention policy
fukanchik commented 1 year ago

Артём, эта проблема всё ещё воспроизводится?