zalando / spilo

Highly available elephant herd: HA PostgreSQL cluster using Docker
Apache License 2.0
1.53k stars 382 forks source link

Failed to shutdown cluster during PG15 Upgrade. #883

Closed PGPAWAN closed 1 year ago

PGPAWAN commented 1 year ago

We have identified below issue while doing Spilo upgrade to upgrade PostgreSQL 14 to PostgreSQL 15.

During the upgrade patroni cluster failed to shutdown with below error. upgrade command on pod it is getting stuck on doing clean shutdown phase.

Request you please check and help me fix it.

Note: we are running only 1 Pod as master in DEV env..

Upgrade log

postgres@pg-testdb-0:~$ python3 /scripts/inplace_upgrade.py 1
2023-05-17 10:53:46,707 inplace_upgrade INFO: No PostgreSQL configuration items changed, nothing to reload.
2023-05-17 10:53:46,734 inplace_upgrade INFO: establishing a new patroni connection to the postgres cluster
2023-05-17 10:53:46,761 inplace_upgrade INFO: Cluster pg-testdb is ready to be upgraded
2023-05-17 10:53:46,762 inplace_upgrade INFO: initdb config: [{'locale': 'en_US.UTF-8'}, {'encoding': 'UTF8'}, 'data-checksums']
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default text search configuration will be set to "english".

Data page checksums are enabled.

creating directory /home/postgres/pgdata/pgroot/data_new ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Etc/UTC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

initdb: warning: enabling "trust" authentication for local connections
initdb: hint: You can change this by editing pg_hba.conf or using the option -A, or --auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    /usr/lib/postgresql/15/bin/pg_ctl -D /home/postgres/pgdata/pgroot/data_new -l logfile start

2023-05-17 10:53:47,698 inplace_upgrade INFO: Dropping extensions from the cluster which could be incompatible
2023-05-17 10:53:47,703 inplace_upgrade INFO: Executing "DROP EXTENSION IF EXISTS amcheck_next" in the database="postgres"
2023-05-17 10:53:47,704 inplace_upgrade INFO: Executing "DROP EXTENSION IF EXISTS pg_repack" in the database="postgres"
2023-05-17 10:53:47,708 inplace_upgrade INFO: Executing "DROP EXTENSION IF EXISTS amcheck_next" in the database="testdb"
2023-05-17 10:53:47,709 inplace_upgrade INFO: Executing "DROP EXTENSION IF EXISTS pg_repack" in the database="testdb"
2023-05-17 10:53:47,720 inplace_upgrade INFO: Executing "DROP EXTENSION IF EXISTS amcheck_next" in the database="template1"
2023-05-17 10:53:47,720 inplace_upgrade INFO: Executing "DROP EXTENSION IF EXISTS pg_repack" in the database="template1"
2023-05-17 10:53:47,721 inplace_upgrade INFO: Executing pg_upgrade --check
Performing Consistency Checks on Old Live Server
------------------------------------------------
Checking cluster versions                                   ok
Checking database user is the install user                  ok
Checking database connection settings                       ok
Checking for prepared transactions                          ok
Checking for system-defined composite types in user tables  ok
Checking for reg* data types in user tables                 ok
Checking for contrib/isn with bigint-passing mismatch       ok
Checking for presence of required libraries                 ok
Checking database user is the install user                  ok
Checking for prepared transactions                          ok
Checking for new cluster tablespace directories             ok

*Clusters are compatible*
2023-05-17 10:53:48,365 inplace_upgrade INFO: Dropping objects from the cluster which could be incompatible
2023-05-17 10:53:48,369 inplace_upgrade INFO: Executing "REVOKE EXECUTE ON FUNCTION pg_catalog.pg_switch_wal() FROM admin" in the database="postgres"
2023-05-17 10:53:48,371 inplace_upgrade INFO: Executing "DROP FUNCTION metric_helpers.pg_stat_statements" in the database="postgres"
2023-05-17 10:53:48,375 inplace_upgrade INFO: Executing "DROP EXTENSION IF EXISTS pg_stat_kcache" in the database="postgres"
2023-05-17 10:53:48,379 inplace_upgrade INFO: Executing "DROP EXTENSION IF EXISTS pg_stat_statements" in the database="postgres"
2023-05-17 10:53:48,381 inplace_upgrade INFO: Executing "DROP EXTENSION IF EXISTS amcheck_next" in the database="postgres"
2023-05-17 10:53:48,381 inplace_upgrade INFO: Executing "DROP EXTENSION IF EXISTS pg_repack" in the database="postgres"
2023-05-17 10:53:48,385 inplace_upgrade INFO: Executing "REVOKE EXECUTE ON FUNCTION pg_catalog.pg_switch_wal() FROM admin" in the database="testdb"
2023-05-17 10:53:48,387 inplace_upgrade INFO: Executing "DROP FUNCTION metric_helpers.pg_stat_statements" in the database="testdb"
2023-05-17 10:53:48,391 inplace_upgrade INFO: Executing "DROP EXTENSION IF EXISTS pg_stat_kcache" in the database="testdb"
2023-05-17 10:53:48,395 inplace_upgrade INFO: Executing "DROP EXTENSION IF EXISTS pg_stat_statements" in the database="testdb"
2023-05-17 10:53:48,398 inplace_upgrade INFO: Executing "DROP EXTENSION IF EXISTS amcheck_next" in the database="testdb"
2023-05-17 10:53:48,399 inplace_upgrade INFO: Executing "DROP EXTENSION IF EXISTS pg_repack" in the database="testdb"
2023-05-17 10:53:48,404 inplace_upgrade INFO: Executing "REVOKE EXECUTE ON FUNCTION pg_catalog.pg_switch_wal() FROM admin" in the database="template1"
2023-05-17 10:53:48,405 inplace_upgrade INFO: Executing "DROP FUNCTION metric_helpers.pg_stat_statements" in the database="template1"
2023-05-17 10:53:48,410 inplace_upgrade INFO: Executing "DROP EXTENSION IF EXISTS pg_stat_kcache" in the database="template1"
2023-05-17 10:53:48,413 inplace_upgrade INFO: Executing "DROP EXTENSION IF EXISTS pg_stat_statements" in the database="template1"
2023-05-17 10:53:48,415 inplace_upgrade INFO: Executing "DROP EXTENSION IF EXISTS amcheck_next" in the database="template1"
2023-05-17 10:53:48,415 inplace_upgrade INFO: Executing "DROP EXTENSION IF EXISTS pg_repack" in the database="template1"
2023-05-17 10:53:48,416 inplace_upgrade INFO: Enabling maintenance mode
2023-05-17 10:53:49,430 inplace_upgrade INFO: Maintenance mode enabled
2023-05-17 10:53:49,430 inplace_upgrade INFO: Doing a clean shutdown of the cluster before pg_upgrade
^C2023-05-17 10:54:47,118 inplace_upgrade INFO: Disabling maintenance mode

Patroni log

2023-05-17 10:53:39.137 UTC [37] LOG {ticks: 0, maint: 0, retry: 0}
2023-05-17 10:53:43,267 INFO: no action. I am (pg-testdb-0), the leader with the lock
2023-05-17 10:53:48,446 INFO: PAUSE: no action. I am (pg-testdb-0), the leader with the lock
2023-05-17 10:53:48,452 INFO: Changed tcp_keepalives_idle from 0 to 900
2023-05-17 10:53:48,452 INFO: Changed tcp_keepalives_interval from 0 to 100
2023-05-17 10:53:48,454 INFO: Reloading PostgreSQL configuration.
server signaled
2023-05-17 10:53:49,590 INFO: establishing a new patroni connection to the postgres cluster
2023-05-17 10:53:50,413 INFO: establishing a new patroni connection to the postgres cluster
2023-05-17 10:53:50,996 INFO: establishing a new patroni connection to the postgres cluster
2023-05-17 10:53:51,939 INFO: establishing a new patroni connection to the postgres cluster
2023-05-17 10:53:53,283 INFO: establishing a new patroni connection to the postgres cluster
2023-05-17 10:53:53,286 WARNING: Retry got exception: 'connection problems'
2023-05-17 10:53:53,286 WARNING: Exception PostgresConnectionException("'Exceeded retry deadline'") when running query
2023-05-17 10:53:58,929 INFO: Lock owner: pg-testdb-0; I am pg-testdb-0
2023-05-17 10:53:58,929 INFO: establishing a new patroni connection to the postgres cluster
2023-05-17 10:53:59,072 INFO: establishing a new patroni connection to the postgres cluster
2023-05-17 10:53:59,075 WARNING: Retry got exception: 'connection problems'
/var/run/postgresql:5432 - rejecting connections
2023-05-17 10:53:59,081 WARNING: Failed to determine PostgreSQL state from the connection, falling back to cached role
2023-05-17 10:53:59,082 WARNING: Failed to determine PostgreSQL state from the connection, falling back to cached role
2023-05-17 10:53:59,082 ERROR: Exception when called state_handler.last_operation()
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/patroni/ha.py", line 244, in update_lock
    last_lsn = self.state_handler.last_operation()
  File "/usr/local/lib/python3.10/dist-packages/patroni/postgresql/__init__.py", line 1053, in last_operation
    return self._wal_position(self.is_leader(), self._cluster_info_state_get('wal_position'),
  File "/usr/local/lib/python3.10/dist-packages/patroni/postgresql/__init__.py", line 389, in _cluster_info_state_get
    raise PostgresConnectionException(self._cluster_info_state['error'])
patroni.exceptions.PostgresConnectionException: "'Too many retry attempts'"
2023-05-17 10:53:59,129 WARNING: Failed to determine PostgreSQL state from the connection, falling back to cached role
2023-05-17 10:53:59,144 INFO: PAUSE: Error communicating with PostgreSQL. Will try again later
/var/run/postgresql:5432 - rejecting connections
2023-05-17 10:54:08,936 INFO: Lock owner: pg-testdb-0; I am pg-testdb-0
2023-05-17 10:54:08,936 INFO: establishing a new patroni connection to the postgres cluster
2023-05-17 10:54:09,048 INFO: establishing a new patroni connection to the postgres cluster
2023-05-17 10:54:09,051 WARNING: Retry got exception: 'connection problems'
/var/run/postgresql:5432 - rejecting connections
2023-05-17 10:54:09,058 WARNING: Failed to determine PostgreSQL state from the connection, falling back to cached role
2023-05-17 10:54:09,059 WARNING: Failed to determine PostgreSQL state from the connection, falling back to cached role

Operator log

time="2023-05-17T10:57:09Z" level=info msg="UPDATE event has been queued" cluster-name=pg-testdb/pg-testdb pkg=controller worker=0
time="2023-05-17T10:57:09Z" level=info msg="update of the cluster started" cluster-name=pg-testdb/pg-testdb pkg=controller worker=0
time="2023-05-17T10:57:09Z" level=debug msg="-    resourceVersion: 4047," cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:09Z" level=debug msg="-    generation: 2," cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:09Z" level=debug msg="+    resourceVersion: 4869," cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:09Z" level=debug msg="+    generation: 3," cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:09Z" level=debug msg="-      kubectl.kubernetes.io/last-applied-configuration: {\\apiVersion\\:\\acid.zalan.do/v1\\,\\kind\\:\\postgresql\\,\\metadata\\:{\\annotations\\:{},\\name\\:\\pg-testdb\\,\\namespace\\:\\pg-testdb\\},\\spec\\:{\\databases\\:{\\testdb\\:\\testdb\\},\\dockerImage\\:\\registry.opensource.zalan.do/acid/spilo-14:2.1-p6\\,\\numberOfInstances\\:1,\\patroni\\:{\\initdb\\:{\\data-checksums\\:\\true\\,\\encoding\\:\\UTF8\\,\\locale\\:\\en_US.UTF-8\\},\\pg_hba\\:[\\hostssl replication     standby     all             md5\\,\\hostssl all             postgres    localhost       md5\\,\\hostssl all             postgres    all             md5\\,\\local   all             postgres                    peer\\,\\local   all             all                         md5\\,\\local   replication     standby                     md5\\,\\hostssl all             all         all             md5 clientcert=verify-ca\\]},\\postgresql\\:{\\parameters\\:{\\effective_cache_size\\:\\768MB\\,\\log_rotation_size\\:\\100000\\,\\maintenance_work_mem\\:\\64MB\\,\\max_connections\\:\\400\\,\\shared_buffers\\:\\256MB\\,\\shared_preload_libraries\\:\\bg_mon,pg_stat_statements,pgextwlist,pg_auth_mon,set_user,pg_cron,pg_stat_kcache,pgaudit\\,\\work_mem\\:\\1MB\\},\\version\\:\\15\\},\\resources\\:{\\limits\\:{\\cpu\\:\\1000m\\,\\memory\\:\\900Mi\\},\\requests\\:{\\cpu\\:\\600m\\,\\memory\\:\\100Mi\\}},\\teamId\\:\\pg\\,\\users\\:{\\testdb\\:[\\createdb\\]},\\volume\\:{\\size\\:\\8Gi\\}}}\\n" cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:09Z" level=debug msg="+      kubectl.kubernetes.io/last-applied-configuration: {\\apiVersion\\:\\acid.zalan.do/v1\\,\\kind\\:\\postgresql\\,\\metadata\\:{\\annotations\\:{},\\name\\:\\pg-testdb\\,\\namespace\\:\\pg-testdb\\},\\spec\\:{\\databases\\:{\\testdb\\:\\testdb\\},\\dockerImage\\:\\ghcr.io/zalando/spilo-15:2.1-p9\\,\\numberOfInstances\\:1,\\patroni\\:{\\initdb\\:{\\data-checksums\\:\\true\\,\\encoding\\:\\UTF8\\,\\locale\\:\\en_US.UTF-8\\},\\pg_hba\\:[\\hostssl replication     standby     all             md5\\,\\hostssl all             postgres    localhost       md5\\,\\hostssl all             postgres    all             md5\\,\\local   all             postgres                    peer\\,\\local   all             all                         md5\\,\\local   replication     standby                     md5\\,\\hostssl all             all         all             md5 clientcert=verify-ca\\]},\\postgresql\\:{\\parameters\\:{\\effective_cache_size\\:\\768MB\\,\\log_rotation_size\\:\\100000\\,\\maintenance_work_mem\\:\\64MB\\,\\max_connections\\:\\400\\,\\shared_buffers\\:\\256MB\\,\\shared_preload_libraries\\:\\bg_mon,pg_stat_statements,pgextwlist,pg_auth_mon,set_user,pg_cron,pg_stat_kcache,pgaudit\\,\\work_mem\\:\\1MB\\},\\version\\:\\15\\},\\resources\\:{\\limits\\:{\\cpu\\:\\1000m\\,\\memory\\:\\900Mi\\},\\requests\\:{\\cpu\\:\\600m\\,\\memory\\:\\100Mi\\}},\\teamId\\:\\pg\\,\\users\\:{\\testdb\\:[\\createdb\\]},\\volume\\:{\\size\\:\\8Gi\\}}}\\n" cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:09Z" level=debug msg="-    dockerImage: registry.opensource.zalan.do/acid/spilo-14:2.1-p6," cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:09Z" level=debug msg="+    dockerImage: ghcr.io/zalando/spilo-15:2.1-p9," cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:09Z" level=info msg="postgresql major version unchanged or smaller, no changes needed" cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:09Z" level=debug msg="final load balancer source ranges as seen in a service spec (not necessarily applied): [\"127.0.0.1/32\"]" cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:09Z" level=debug msg="final load balancer source ranges as seen in a service spec (not necessarily applied): [\"127.0.0.1/32\"]" cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:09Z" level=debug msg="final load balancer source ranges as seen in a service spec (not necessarily applied): [\"127.0.0.1/32\"]" cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:09Z" level=debug msg="final load balancer source ranges as seen in a service spec (not necessarily applied): [\"127.0.0.1/32\"]" cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:09Z" level=debug msg="syncing volumes using \"pvc\" storage resize mode" cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:09Z" level=info msg="volume claims do not require changes" cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:09Z" level=debug msg="syncing statefulsets" cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:09Z" level=debug msg="mark rolling update annotation for pg-testdb-0: reason pod changes" cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:09Z" level=info msg="statefulset pg-testdb/pg-testdb is not in the desired state and needs to be updated" cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:09Z" level=debug msg="-          image: registry.opensource.zalan.do/acid/spilo-14:2.1-p6," cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:09Z" level=debug msg="+          image: ghcr.io/zalando/spilo-15:2.1-p9," cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:09Z" level=debug msg="-          terminationMessagePath: /dev/termination-log," cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:09Z" level=debug msg="-          terminationMessagePolicy: File," cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:09Z" level=debug msg="-          ]," cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:09Z" level=debug msg="-          terminationMessagePath: /dev/termination-log," cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:09Z" level=debug msg="-          terminationMessagePolicy: File," cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:09Z" level=debug msg="-          imagePullPolicy: Always" cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:09Z" level=debug msg="+          ]" cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:09Z" level=debug msg="-      restartPolicy: Always," cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:09Z" level=debug msg="-      dnsPolicy: ClusterFirst," cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:09Z" level=debug msg="-      serviceAccount: postgres-pod," cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:09Z" level=debug msg="-      }," cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:09Z" level=debug msg="-      schedulerName: default-scheduler" cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:09Z" level=debug msg="+      }" cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:09Z" level=debug msg="-      kind: PersistentVolumeClaim," cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:09Z" level=debug msg="-      apiVersion: v1," cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:09Z" level=debug msg="-      status: {" cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:09Z" level=debug msg="-        phase: Pending" cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:09Z" level=debug msg="-      }" cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:09Z" level=debug msg="+      status: {}" cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:09Z" level=debug msg="-  }," cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:09Z" level=debug msg="-  revisionHistoryLimit: 10" cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:09Z" level=debug msg="+  }" cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:09Z" level=info msg="reason: new statefulset containers's postgres (index 0) image does not match the current one" cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:09Z" level=debug msg="updating statefulset" cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:09Z" level=debug msg="making GET http request: http://172.17.0.4:8008/config" cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:09Z" level=debug msg="making GET http request: http://172.17.0.4:8008/patroni" cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:11Z" level=debug msg="performing rolling update" cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:11Z" level=info msg="there are 1 pods in the cluster to recreate" cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:11Z" level=warning msg="cannot perform switch over before re-creating the pod: no replicas" cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:11Z" level=info msg="recreating old master pod \"pg-testdb/pg-testdb-0\"" cluster-name=pg-testdb/pg-testdb pkg=cluster
time="2023-05-17T10:57:11Z" level=debug msg="subscribing to pod \"pg-testdb/pg-testdb-0\"" cluster-name=pg-testdb/pg-testdb pkg=cluster

Images Used

registry.opensource.zalan.do/acid/postgres-operator:v1.9.0 registry.opensource.zalan.do/acid/spilo-14:2.1-p6 ghcr.io/zalando/spilo-15:2.1-p9

Where do you run it - cloud or metal? Kubernetes or OpenShift?

VMware Virtual Machine

Kubernetes version

v.21.

PGPAWAN commented 1 year ago

I have followed all the steps manually to test the upgrade and current running postgres service failed to stop.

Checked cluster compatible

postgres@pg-testdb-0:/tmp$ time /usr/lib/postgresql/15/bin/pg_upgrade --old-bindir /usr/lib/postgresql/14/bin --new-bindir /usr/lib/postgresql/15/bin -p 5432 -P 5433 --old-datadir /home/postgres/pgdata/pgroot/data --new-datadir /home/postgres/pgdata/pgroot/data1 --link --check
Performing Consistency Checks on Old Live Server
------------------------------------------------
Checking cluster versions                                   ok
Checking database user is the install user                  ok
Checking database connection settings                       ok
Checking for prepared transactions                          ok
Checking for system-defined composite types in user tables  ok
Checking for reg* data types in user tables                 ok
Checking for contrib/isn with bigint-passing mismatch       ok
Checking for presence of required libraries                 ok
Checking database user is the install user                  ok
Checking for prepared transactions                          ok
Checking for new cluster tablespace directories             ok

*Clusters are compatible*

real    0m0.609s
user    0m0.060s
sys 0m0.067s
postgres@pg-testdb-0:/tmp$

Failed to stop postgresql service.

postgres@pg-testdb-0:/tmp$ patronictl list
+ Cluster: pg-testdb ------+--------+---------+----+-----------+
| Member          | Host       | Role   | State   | TL | Lag in MB |
+-----------------+------------+--------+---------+----+-----------+
| pg-testdb-0 | 172.17.0.5 | Leader | running |  7 |           |
+-----------------+------------+--------+---------+----+-----------+
 Maintenance mode: on
postgres@pg-testdb-0:/tmp$ pg_ctl -D /home/postgres/pgdata/pgroot/data stop
waiting for server to shut down............................................................... failed
pg_ctl: server does not shut down

postgres log


2023-05-18 07:04:53.427 UTC,"postgres","postgres",415,"[local]",6465ce15.19f,4,"idle",2023-05-18 07:04:53 UTC,,0,LOG,00000,"disconnection: session time: 0:00:00.002 user=postgres database=postgres host=[local]",,,,,,,,,"pgq ticker","client backend",,0
2023-05-18 07:04:53.429 UTC,"postgres","testdb",416,"[local]",6465ce15.1a0,4,"idle",2023-05-18 07:04:53 UTC,,0,LOG,00000,"disconnection: session time: 0:00:00.003 user=postgres database=testdb host=[local]",,,,,,,,,"pgq ticker","client backend",,0
2023-05-18 07:04:57.418 UTC,,,63,,6465ccea.3f,10,,2023-05-18 06:59:54 UTC,,0,LOG,00000,"received SIGHUP, reloading configuration files",,,,,,,,,"","postmaster",,0
2023-05-18 07:04:58.423 UTC,,,119,,6465cced.77,3,,2023-05-18 06:59:57 UTC,,0,LOG,00000,"checkpoint starting: time",,,,,,,,,"","checkpointer",,0
2023-05-18 07:05:01.163 UTC,,,63,,6465ccea.3f,11,,2023-05-18 06:59:54 UTC,,0,LOG,00000,"received fast shutdown request",,,,,,,,,"","postmaster",,0
2023-05-18 07:05:01.166 UTC,,,63,,6465ccea.3f,12,,2023-05-18 06:59:54 UTC,,0,LOG,00000,"aborting any active transactions",,,,,,,,,"","postmaster",,0
2023-05-18 07:05:01.166 UTC,,,210,,6465ccee.d2,2,,2023-05-18 06:59:58 UTC,4/0,0,LOG,00000,"pg_cron scheduler shutting down",,,,,,,,,"","pg_cron launcher",,-5280970146734079043
2023-05-18 07:05:01.169 UTC,"postgres","postgres",138,"[local]",6465cced.8a,4,"idle",2023-05-18 06:59:57 UTC,2/0,0,FATAL,57P01,"terminating connection due to administrator command",,,,,,,,,"Patroni","client backend",,-1474624185022020383
2023-05-18 07:05:01.169 UTC,"postgres","postgres",138,"[local]",6465cced.8a,5,"idle",2023-05-18 06:59:57 UTC,,0,LOG,00000,"disconnection: session time: 0:05:04.006 user=postgres database=postgres host=[local]",,,,,,,,,"Patroni","client backend",,0
2023-05-18 07:05:01.173 UTC,,,63,,6465ccea.3f,13,,2023-05-18 06:59:54 UTC,,0,LOG,00000,"background worker ""logical replication launcher"" (PID 212) exited with exit code 1",,,,,,,,,"","postmaster",,0
2023-05-18 07:05:01.180 UTC,,,63,,6465ccea.3f,14,,2023-05-18 06:59:54 UTC,,0,LOG,00000,"background worker ""bg_mon"" (PID 67) exited with exit code 1",,,,,,,,,"","postmaster",,0
2023-05-18 07:05:01.238 UTC,,,119,,6465cced.77,4,,2023-05-18 06:59:57 UTC,,0,LOG,00000,"checkpoint complete: wrote 491 buffers (1.5%); 0 WAL file(s) added, 0 removed, 0 recycled; write=2.774 s, sync=0.037 s, total=2.816 s; sync files=165, longest=0.014 s, average=0.001 s; distance=1879 kB, estimate=1879 kB",,,,,,,,,"","checkpointer",,0
2023-05-18 07:05:01.238 UTC,,,119,,6465cced.77,5,,2023-05-18 06:59:57 UTC,,0,LOG,00000,"shutting down",,,,,,,,,"","checkpointer",,0
2023-05-18 07:05:01.255 UTC,,,119,,6465cced.77,6,,2023-05-18 06:59:57 UTC,,0,LOG,00000,"checkpoint starting: shutdown immediate",,,,,,,,,"","checkpointer",,0
2023-05-18 07:05:01.285 UTC,,,119,,6465cced.77,7,,2023-05-18 06:59:57 UTC,,0,LOG,00000,"checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0
.001 s, total=0.031 s; sync files=0, longest=0.000 s, average=0.000 s; distance=14504 kB, estimate=14504 kB",,,,,,,,,"","checkpointer",,0
2023-05-18 07:05:08.547 UTC,,,426,"[local]",6465ce24.1aa,1,"",2023-05-18 07:05:08 UTC,,0,LOG,00000,"connection received: host=[local]",,,,,,,,,"","not initialized",,0
2023-05-18 07:05:08.547 UTC,"postgres","postgres",426,"[local]",6465ce24.1aa,2,"",2023-05-18 07:05:08 UTC,,0,FATAL,57P03,"the database system is shutting down",,,,,,,,,"","client backend",,0
2023-05-18 07:05:08.557 UTC,,,428,"[local]",6465ce24.1ac,1,"",2023-05-18 07:05:08 UTC,,0,LOG,00000,"connection received: host=[local]",,,,,,,,,"","not initialized",,0
2023-05-18 07:05:08.557 UTC,"postgres","postgres",428,"[local]",6465ce24.1ac,2,"",2023-05-18 07:05:08 UTC,,0,FATAL,57P03,"the database system is shutting down",,,,,,,,,"","client backend",,0
2023-05-18 07:05:17.857 UTC,,,438,"[local]",6465ce2d.1b6,1,"",2023-05-18 07:05:17 UTC,,0,LOG,00000,"connection received: host=[local]",,,,,,,,,"","not initialized",,0
2023-05-18 07:05:17.857 UTC,"postgres","postgres",438,"[local]",6465ce2d.1b6,2,"",2023-05-18 07:05:17 UTC,,0,FATAL,57P03,"the database system is shutting down",,,,,,,,,"","client backend",,0
2023-05-18 07:05:17.865 UTC,,,439,"[local]",6465ce2d.1b7,1,"",2023-05-18 07:05:17 UTC,,0,LOG,00000,"connection received: host=[local]",,,,,,,,,"","not initialized",,0
2023-05-18 07:05:17.866 UTC,"postgres","postgres",439,"[local]",6465ce2d.1b7,2,"",2023-05-18 07:05:17 UTC,,0,FATAL,57P03,"the database system is shutting down",,,,,,,,,"","client backend",,0
2023-05-18 07:05:18.630 UTC,,,440,"[local]",6465ce2e.1b8,1,"",2023-05-18 07:05:18 UTC,,0,LOG,00000,"connection received: host=[local]",,,,,,,,,"","not initialized",,0
hughcapet commented 1 year ago

There are two things that possibly prevents the instance from a clear shut down

PGPAWAN commented 1 year ago

Thanks its working. it was because of the acrhive_command. Since we have disable the archiving in lower environment.