sorintlab / stolon

PostgreSQL cloud native High Availability and more.
https://talk.stolon.io
Apache License 2.0
4.66k stars 447 forks source link

stolon-keper doesn't stop recovery after reaching recovery point #798

Open pyhalov opened 4 years ago

pyhalov commented 4 years ago

What happened: I've applied the following config:

 cat ~/stolon_restore.json 
{
         "initMode": "pitr",
         "synchronousReplication": true,
         "minSynchronousStandbys": 0,
         "maxSynchronousStandbys": 1,
         "pgParameters":{
            "log_statement":"all",
            "log_line_prefix":"%m [%r][%p]",
            "log_min_messages":"INFO",
            "wal_level":"logical",
            "max_prepared_transactions":"200",
         },
         "pitrConfig": { 
                "dataRestoreCommand": "cp -r /path/to/backup/* %d" , 
                "archiveRecoverySettings": {
                    "restoreCommand": "cp \"/path/to/backup/pg_wal/%f\" \"%p\""
                },
               "recoveryTargetSettings": { "recoveryTargetName": "_sharded_backup_1600524319"}
         } 
}

PostgreSQL recovered, but after recovery was restarted in recovery mode, tried to fetch new wal files, failed to reach recovery target and exited.


Sep 20 09:30:50 vg1 stolon-keeper[1383]: 2020-09-20 09:30:50.975 MSK [][1938]LOG:  database system is ready to accept connections
Sep 20 09:30:51 vg1 stolon-keeper[1383]: 2020-09-20T09:30:51.619+0300        INFO        cmd/keeper.go:1250        recovery completed
Sep 20 09:30:51 vg1 stolon-keeper[1383]: 2020-09-20 09:30:51.631 MSK [[local]][1959]LOG:  statement: select 1
Sep 20 09:30:51 vg1 stolon-keeper[1383]: 2020-09-20 09:30:51.637 MSK [[local]][1960]LOG:  statement: select 1 from information_schema.tables where table_schema = 'pg_catalog' and table_name = 'pg_file_settings'
Sep 20 09:30:51 vg1 stolon-keeper[1383]: 2020-09-20 09:30:51.645 MSK [[local]][1960]LOG:  statement: select name, setting from pg_file_settings where name IS NOT NULL and setting IS NOT NULL
Sep 20 09:30:51 vg1 stolon-keeper[1383]: 2020-09-20T09:30:51.657+0300        INFO        postgresql/postgresql.go:384        stopping database
Sep 20 09:30:51 vg1 stolon-proxy[1353]: 2020-09-20T09:30:51.658+0300        INFO        cmd/proxy.go:272        master address        {"address": "127.0.0.1:5443"}
Sep 20 09:30:51 vg1 stolon-keeper[1383]: 2020-09-20 09:30:51.659 MSK [][1938]LOG:  received fast shutdown request
Sep 20 09:30:51 vg1 stolon-keeper[1383]: 2020-09-20 09:30:51.659 MSK [][1938]LOG:  aborting any active transactions
Sep 20 09:30:51 vg1 stolon-keeper[1383]: 2020-09-20 09:30:51.661 MSK [][1938]LOG:  background worker "logical replication launcher" (PID 1958) exited with exit code 1
Sep 20 09:30:51 vg1 stolon-keeper[1383]: waiting for server to shut down....2020-09-20 09:30:51.672 MSK [][1947]LOG:  shutting down
Sep 20 09:30:51 vg1 stolon-proxy[1353]: 2020-09-20T09:30:51.689+0300        INFO        cmd/proxy.go:290        proxying to master address        {"address": "127.0.0.1:5443"}
Sep 20 09:30:51 vg1 stolon-keeper[1383]: 2020-09-20 09:30:51.703 MSK [][1938]LOG:  database system is shut down
Sep 20 09:30:51 vg1 stolon-keeper[1383]:  done
Sep 20 09:30:51 vg1 stolon-keeper[1383]: server stopped
Sep 20 09:30:51 vg1 stolon-keeper[1383]: 2020-09-20T09:30:51.774+0300        INFO        cmd/keeper.go:1484        our db requested role is master
Sep 20 09:30:51 vg1 stolon-keeper[1383]: 2020-09-20T09:30:51.785+0300        INFO        postgresql/postgresql.go:858        writing recovery signal file
Sep 20 09:30:51 vg1 stolon-keeper[1383]: 2020-09-20T09:30:51.785+0300        INFO        postgresql/postgresql.go:319        starting database

Look at writing recovery signal file line - keeper shouldn't have written recovery signal file after postgres restart

What you expected to happen: I've expected postgresql to recover and start accepting queries.

Environment: