postgrespro / pg_probackup

Backup and recovery manager for PostgreSQL
https://postgrespro.github.io/pg_probackup/

ERROR: WAL segment ... could not be streamed in 300 seconds #624

Closed vd0v1n closed 1 week ago

vd0v1n commented 2 months ago

Backup server: pg_probackup-16 2.5.15 (PostgreSQL 16.2)
Database server running vanilla PG: postgresql16-server.x86_64 16.3-1PGDG.rhel8 @pgdg16

The backup is taken from a physical replica with the following settings:

hot_standby = 'on'
wal_level = 'logical'
archive_mode = 'on'
archive_command = '/bin/true'

using these commands:

/bin/pg_probackup-16 backup -B /mnt/data/16 -b FULL --compress --instance=db --stream --temp-slot --remote-host=db.lan --remote-user=postgres -U backup -d backupdb -j 4 --delete-expired --delete-wal --retention-redundancy=1 --log-level-console=error

/bin/pg_probackup-16 backup -B /mnt/data/16 -b DELTA --compress --instance=db --stream --temp-slot --remote-host=db.lan --remote-user=postgres -U backup -d backupdb -j 4 --log-level-console=error

Both full and delta backups intermittently either complete successfully or fail with the error: ERROR: WAL segment ... could not be streamed in 300 seconds

pg_probackup-16 show -B /mnt/data/16 --instance=db
 Instance  Version  ID      Recovery Time           Mode   WAL Mode  TLI  Time    Data    WAL   Zratio  Start LSN       Stop LSN        Status
 db        16       SF3EK1  ----                    DELTA  STREAM    1/1  1h:37m  70GB    0     3.05    39441/C54F5950  0/0             ERROR
 db        16       SF1JW1  2024-06-14 03:39:48+03  DELTA  STREAM    1/1  1h:42m  92GB    50GB  3.09    393CE/7FD25080  393DA/C19D8F08  OK
 db        16       SEY9W2  2024-06-12 16:57:52+03  FULL   STREAM    1/0  9h:26m  1980GB  94GB  2.57    39320/C57DC250  39338/50250770  OK
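
As a rough cross-check of the listing, the distance between Start LSN and Stop LSN gives the amount of WAL generated while each backup was running. The sketch below assumes only the standard "HI/LO" hexadecimal LSN notation; lsn_to_bytes and lsn_diff are hypothetical helper names, not pg_probackup commands. For the two OK backups the result comes out close to the WAL column (50GB and 94GB), which suggests the primary produces tens of gigabytes of WAL per backup run.

```shell
# Sketch: bytes of WAL between two LSNs written as "HI/LO" (both halves hexadecimal).
# lsn_to_bytes and lsn_diff are hypothetical helper names, not pg_probackup commands.
lsn_to_bytes() {
    # HI is the upper 32 bits of the 64-bit WAL position, LO the lower 32 bits
    echo $(( (0x${1%/*} << 32) + 0x${1#*/} ))
}

lsn_diff() {
    # $1 = start LSN, $2 = stop LSN
    echo $(( $(lsn_to_bytes "$2") - $(lsn_to_bytes "$1") ))
}

# Start LSN and Stop LSN of the two OK backups from the listing
echo "SF1JW1: $(( $(lsn_diff 393CE/7FD25080 393DA/C19D8F08) >> 30 )) GiB"   # SF1JW1: 49 GiB
echo "SEY9W2: $(( $(lsn_diff 39320/C57DC250 39338/50250770) >> 30 )) GiB"   # SEY9W2: 94 GiB
```

On a running server the built-in pg_wal_lsn_diff() function performs the same subtraction.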

Пример полного лога:

pg_probackup-16 backup \
>     -B /mnt/data/16 \
>     -b DELTA \
>     --compress \
>     --instance=db \
>     --stream \
>     --temp-slot \
>     --remote-host=db.lan \
>     --remote-user=postgres \
>     -U backup \
>     -d backupdb \
>     -j 4 \
>     --no-validate
INFO: Backup start, pg_probackup version: 2.5.15, instance: db, backup ID: SFF2OY, backup mode: DELTA, wal mode: STREAM, remote: true, compress-algorithm: zlib, compress-level: 1
WARNING: This PostgreSQL instance was initialized without data block checksums. pg_probackup have no way to detect data block corruption without them. Reinitialize PGDATA with option '--data-checksums'.
INFO: Backup SFF2OY is going to be taken from standby
INFO: Database backup start
INFO: wait for pg_backup_start()
WARNING: Backup SFEIK1 has status: ERROR. Cannot be a parent.
INFO: Parent backup: SFCNW2
INFO: Wait for WAL segment /mnt/data/16/backups/db/SFF2OY/database/pg_wal/0000000100039777000000F0 to be streamed
INFO: PGDATA size: 4964GB
INFO: Current Start LSN: 39777/F0218260, TLI: 1
INFO: Parent Start LSN: 3966B/F1366E58, TLI: 1
INFO: Start transferring data files
INFO: Data files are transferred, time elapsed: 2h:24m
INFO: wait for pg_stop_backup()
INFO: pg_stop backup() successfully executed
INFO: stop_lsn: 397E7/6E641578
INFO: Wait for LSN 397E7/6E641578 in streamed WAL segment /mnt/data/16/backups/db/SFF2OY/database/pg_wal/00000001000397E70000006E
ERROR: WAL segment 00000001000397E70000006E could not be streamed in 300 seconds
WARNING: Backup SFF2OY is running, setting its status to ERROR
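
For readers matching the LSNs in this log to the segment file names, the mapping can be sketched as follows. This assumes the standard PostgreSQL WAL file naming (timeline, then the segment number split into two 8-digit hexadecimal fields) and the default 16 MiB wal_segment_size; wal_segment_name is a hypothetical helper name. It reproduces both file names that appear in the log for timeline 1.

```shell
# Sketch: derive the WAL segment file name that contains a given LSN.
# wal_segment_name is a hypothetical helper, not part of pg_probackup or PostgreSQL.
# $1 = timeline ID (decimal), $2 = LSN as "HI/LO" in hex,
# $3 = segment size in bytes (assumed default: 16 MiB)
wal_segment_name() (
    tli=$1
    seg_size=${3:-16777216}
    # 64-bit WAL position from the two hexadecimal halves
    lsn=$(( (0x${2%/*} << 32) + 0x${2#*/} ))
    # absolute segment number and the number of segments per 4 GiB "log id"
    segno=$(( lsn / seg_size ))
    segs_per_id=$(( 0x100000000 / seg_size ))
    printf '%08X%08X%08X\n' "$tli" $(( segno / segs_per_id )) $(( segno % segs_per_id ))
)

wal_segment_name 1 39777/F0218260   # 0000000100039777000000F0 (start LSN)
wal_segment_name 1 397E7/6E641578   # 00000001000397E70000006E (stop_lsn)
```

On a primary server the built-in pg_walfile_name() function returns the same name for an LSN.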

vd0v1n commented 1 week ago

https://github.com/postgrespro/pg_probackup/issues/430