zalando / spilo

Highly available elephant herd: HA PostgreSQL cluster using Docker
Apache License 2.0
1.53k stars 382 forks source link

postgres operator major upgrade failed on ipv6 #874

Open olivejing opened 1 year ago

olivejing commented 1 year ago

This issue is opened to zalando postgres opetaor before: https://github.com/zalando/postgres-operator/issues/2258

I installed postgres operator and cluster v1.8.2 on pure ipv6 Openstack lab, and then upgraded to v1.9.0. but major upgrade failed. That upgrade is successful on ipv4 lab. Major upgrade log:

$ python3 /scripts/inplace_upgrade.py 3
2023-03-10 02:26:51,811 inplace_upgrade INFO: No PostgreSQL configuration items changed, nothing to reload.
2023-03-10 02:26:51,843 inplace_upgrade INFO: establishing a new patroni connection to the postgres cluster
2023-03-10 02:26:51,940 inplace_upgrade INFO: Cluster char-pg-cluster is ready to be upgraded
2023-03-10 02:26:51,941 inplace_upgrade INFO: initdb config: [{'locale': 'en_US.UTF-8'}, {'encoding': 'UTF8'}, 'data-checksums']
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default text search configuration will be set to "english".

Data page checksums are enabled.

creating directory /home/postgres/pgdata/pgroot/data_new ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Etc/UTC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

initdb: warning: enabling "trust" authentication for local connections
initdb: hint: You can change this by editing pg_hba.conf or using the option -A, or --auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    /usr/lib/postgresql/15/bin/pg_ctl -D /home/postgres/pgdata/pgroot/data_new -l logfile start

2023-03-10 02:26:52,826 inplace_upgrade INFO: Dropping extensions from the cluster which could be incompatible
2023-03-10 02:26:52,830 inplace_upgrade INFO: Executing "DROP EXTENSION IF EXISTS amcheck_next" in the database="postgres"
2023-03-10 02:26:52,830 inplace_upgrade INFO: Executing "DROP EXTENSION IF EXISTS pg_repack" in the database="postgres"
2023-03-10 02:26:52,834 inplace_upgrade INFO: Executing "DROP EXTENSION IF EXISTS amcheck_next" in the database="registry"
2023-03-10 02:26:52,834 inplace_upgrade INFO: Executing "DROP EXTENSION IF EXISTS pg_repack" in the database="registry"
2023-03-10 02:26:52,837 inplace_upgrade INFO: Executing "DROP EXTENSION IF EXISTS amcheck_next" in the database="template1"
2023-03-10 02:26:52,838 inplace_upgrade INFO: Executing "DROP EXTENSION IF EXISTS pg_repack" in the database="template1"
2023-03-10 02:26:52,838 inplace_upgrade INFO: Executing pg_upgrade --check
Performing Consistency Checks on Old Live Server
------------------------------------------------
Checking cluster versions                                   ok
Checking database user is the install user                  ok
Checking database connection settings                       ok
Checking for prepared transactions                          ok
Checking for system-defined composite types in user tables  ok
Checking for reg* data types in user tables                 ok
Checking for contrib/isn with bigint-passing mismatch       ok
Checking for presence of required libraries                 ok
Checking database user is the install user                  ok
Checking for prepared transactions                          ok
Checking for new cluster tablespace directories             ok

*Clusters are compatible*
2023-03-10 02:26:53,536 inplace_upgrade INFO: Dropping objects from the cluster which could be incompatible
2023-03-10 02:26:53,540 inplace_upgrade INFO: Executing "REVOKE EXECUTE ON FUNCTION pg_catalog.pg_switch_wal() FROM admin" in the database="postgres"
2023-03-10 02:26:53,544 inplace_upgrade INFO: Executing "DROP FUNCTION metric_helpers.pg_stat_statements" in the database="postgres"
2023-03-10 02:26:53,548 inplace_upgrade INFO: Executing "DROP EXTENSION IF EXISTS pg_stat_kcache" in the database="postgres"
2023-03-10 02:26:53,582 inplace_upgrade INFO: Executing "DROP EXTENSION IF EXISTS pg_stat_statements" in the database="postgres"
2023-03-10 02:26:53,585 inplace_upgrade INFO: Executing "DROP EXTENSION IF EXISTS amcheck_next" in the database="postgres"
2023-03-10 02:26:53,585 inplace_upgrade INFO: Executing "DROP EXTENSION IF EXISTS pg_repack" in the database="postgres"
2023-03-10 02:26:53,590 inplace_upgrade INFO: Executing "REVOKE EXECUTE ON FUNCTION pg_catalog.pg_switch_wal() FROM admin" in the database="registry"
2023-03-10 02:26:53,593 inplace_upgrade INFO: Executing "DROP FUNCTION metric_helpers.pg_stat_statements" in the database="registry"
2023-03-10 02:26:53,649 inplace_upgrade INFO: Executing "DROP EXTENSION IF EXISTS pg_stat_kcache" in the database="registry"
2023-03-10 02:26:53,661 inplace_upgrade INFO: Executing "DROP EXTENSION IF EXISTS pg_stat_statements" in the database="registry"
2023-03-10 02:26:53,663 inplace_upgrade INFO: Executing "DROP EXTENSION IF EXISTS amcheck_next" in the database="registry"
2023-03-10 02:26:53,664 inplace_upgrade INFO: Executing "DROP EXTENSION IF EXISTS pg_repack" in the database="registry"
2023-03-10 02:26:53,668 inplace_upgrade INFO: Executing "REVOKE EXECUTE ON FUNCTION pg_catalog.pg_switch_wal() FROM admin" in the database="template1"
2023-03-10 02:26:53,671 inplace_upgrade INFO: Executing "DROP FUNCTION metric_helpers.pg_stat_statements" in the database="template1"
2023-03-10 02:26:53,706 inplace_upgrade INFO: Executing "DROP EXTENSION IF EXISTS pg_stat_kcache" in the database="template1"
2023-03-10 02:26:53,726 inplace_upgrade INFO: Executing "DROP EXTENSION IF EXISTS pg_stat_statements" in the database="template1"
2023-03-10 02:26:53,729 inplace_upgrade INFO: Executing "DROP EXTENSION IF EXISTS amcheck_next" in the database="template1"
2023-03-10 02:26:53,729 inplace_upgrade INFO: Executing "DROP EXTENSION IF EXISTS pg_repack" in the database="template1"
2023-03-10 02:26:53,730 inplace_upgrade INFO: Enabling maintenance mode
2023-03-10 02:26:54,749 inplace_upgrade INFO: Maintenance mode enabled
2023-03-10 02:26:54,749 inplace_upgrade INFO: Doing a clean shutdown of the cluster before pg_upgrade
2023-03-10 02:26:55,337 inplace_upgrade INFO: Latest checkpoint location: 184549416
2023-03-10 02:26:55,337 inplace_upgrade INFO: Starting rsyncd
2023-03-10 02:26:55,338 inplace_upgrade INFO: Waiting for replica nodes to catch up with primary
2023-03-10 02:26:55,342 inplace_upgrade INFO: All replicas are ready
2023-03-10 02:26:55,342 inplace_upgrade INFO: Executing CHECKPOINT on replicas char-pg-cluster-1,char-pg-cluster-2
2023-03-10 02:26:55,348 inplace_upgrade INFO: Executing pg_upgrade
Performing Consistency Checks
-----------------------------
Checking cluster versions                                   ok
Checking database user is the install user                  ok
Checking database connection settings                       ok
Checking for prepared transactions                          ok
Checking for system-defined composite types in user tables  ok
Checking for reg* data types in user tables                 ok
Checking for contrib/isn with bigint-passing mismatch       ok
Creating dump of global objects                             ok
Creating dump of database schemas
                                                            ok
Checking for presence of required libraries                 ok
Checking database user is the install user                  ok
Checking for prepared transactions                          ok
Checking for new cluster tablespace directories             ok

If pg_upgrade fails after this point, you must re-initdb the
new cluster before continuing.

Performing Upgrade
------------------
Analyzing all rows in the new cluster                       ok
Freezing all rows in the new cluster                        ok
Deleting files from new pg_xact                             ok
Copying old pg_xact to new server                           ok
Setting oldest XID for new cluster                          ok
Setting next transaction ID and epoch for new cluster       ok
Deleting files from new pg_multixact/offsets                ok
Copying old pg_multixact/offsets to new server              ok
Deleting files from new pg_multixact/members                ok
Copying old pg_multixact/members to new server              ok
Setting next multixact ID and offset for new cluster        ok
Resetting WAL archives                                      ok
Setting frozenxid and minmxid counters in new cluster       ok
Restoring global objects in the new cluster                 ok
Restoring database schemas in the new cluster
                                                            ok
Adding ".old" suffix to old global/pg_control               ok

If you want to start the old cluster, you will need to remove
the ".old" suffix from /home/postgres/pgdata/pgroot/data/global/pg_control.old.
Because "link" mode was used, the old cluster cannot be safely
started once the new cluster has been started.

Linking user relation files
                                                            ok
Setting next OID for new cluster                            ok
Sync data directory to disk                                 ok
Creating script to delete old cluster                       ok
Checking for extension updates                              ok

Upgrade Complete
----------------
Optimizer statistics are not transferred by pg_upgrade.
Once you start the new server, consider running:
    /usr/lib/postgresql/15/bin/vacuumdb -U postgres --all --analyze-in-stages

Running this script will delete the old cluster's data files:
    ./delete_old_cluster.sh
2023-03-10 02:26:58,583 inplace_upgrade INFO: Updating configuration files
2023-03-10 02:26:58,603 inplace_upgrade INFO: Writing to file /run/postgres.yml
2023-03-10 02:26:58,604 inplace_upgrade INFO: Notifying replicas char-pg-cluster-1,char-pg-cluster-2 to start rsync
2023-03-10 02:26:59,464 inplace_upgrade INFO: Waiting for replicas rsync to complete
2023-03-10 02:31:59,834 inplace_upgrade ERROR: Did not received rsync feedback from char-pg-cluster-1 after 300 seconds
2023-03-10 02:31:59,834 inplace_upgrade ERROR: Did not received rsync feedback from char-pg-cluster-2 after 300 seconds
2023-03-10 02:31:59,835 inplace_upgrade INFO: Rsync took 301.23084330558777 seconds
2023-03-10 02:31:59,835 inplace_upgrade INFO: Stopping rsyncd
2023-03-10 02:32:01,840 inplace_upgrade INFO: Removing initialize key
2023-03-10 02:32:01,902 inplace_upgrade INFO: Restarting patroni
2023-03-10 02:32:02,907 inplace_upgrade INFO: Starting the primary postgres up
2023-03-10 02:32:04,165 inplace_upgrade INFO:    200 restarted successfully
2023-03-10 02:32:04,165 inplace_upgrade INFO: Upgrade downtime: 309.4158442020416
2023-03-10 02:32:04,617 inplace_upgrade INFO: establishing a new patroni connection to the postgres cluster
2023-03-10 02:32:04,630 inplace_upgrade INFO: Executing 'ALTER EXTENSION plpgsql UPDATE' in the database=template1
2023-03-10 02:32:04,631 inplace_upgrade INFO: Executing 'ALTER EXTENSION set_user UPDATE' in the database=template1
2023-03-10 02:32:04,640 inplace_upgrade INFO: Executing 'ALTER EXTENSION plpgsql UPDATE' in the database=registry
2023-03-10 02:32:04,640 inplace_upgrade INFO: Executing 'ALTER EXTENSION set_user UPDATE' in the database=registry
2023-03-10 02:32:04,645 inplace_upgrade INFO: Executing 'ALTER EXTENSION plpgsql UPDATE' in the database=postgres
2023-03-10 02:32:04,646 inplace_upgrade INFO: Executing 'ALTER EXTENSION plpython3u UPDATE' in the database=postgres
2023-03-10 02:32:04,646 inplace_upgrade INFO: Executing 'ALTER EXTENSION file_fdw UPDATE' in the database=postgres
2023-03-10 02:32:04,646 inplace_upgrade INFO: Executing 'ALTER EXTENSION pg_auth_mon UPDATE' in the database=postgres
2023-03-10 02:32:04,646 inplace_upgrade INFO: Executing 'ALTER EXTENSION set_user UPDATE' in the database=postgres
2023-03-10 02:32:04,647 inplace_upgrade INFO: Resetting non-default statistics target before analyze
2023-03-10 02:32:04,647 inplace_upgrade INFO: Waiting for restart of patroni on replicas char-pg-cluster-1, char-pg-cluster-2
2023-03-10 02:32:04,668 inplace_upgrade INFO: Rebuilding statistics (vacuumdb --analyze-in-stages)
vacuumdb: processing database "template1": Generating minimal optimizer statistics (1 target)
vacuumdb: processing database "registry": Generating minimal optimizer statistics (1 target)
vacuumdb: processing database "postgres": Generating minimal optimizer statistics (1 target)
2023-03-10 02:32:14,692 inplace_upgrade ERROR: Patroni on replica char-pg-cluster-1 was not restarted in 10 seconds
2023-03-10 02:32:14,692 inplace_upgrade ERROR: Patroni on replica char-pg-cluster-2 was not restarted in 10 seconds
2023-03-10 02:32:14,693 inplace_upgrade INFO:   [None, None] successfully restarted
2023-03-10 02:32:14,693 inplace_upgrade INFO: Disabling maintenance mode
2023-03-10 02:32:15,706 inplace_upgrade INFO: Maintenance mode disabled