vitabaks / postgresql_cluster

PostgreSQL High-Availability Cluster (based on "Patroni" and DCS "etcd" or "consul"). Automating with Ansible.
MIT License

Deploy issue on Debian 11 #164

Closed blackyzero closed 2 years ago

blackyzero commented 2 years ago

Hello,

I am getting the following error during deployment:

changed: [192.168.169.151] => (item=absent)
changed: [192.168.169.152] => (item=absent)
changed: [192.168.169.153] => (item=absent)
changed: [192.168.169.152] => (item=directory)
changed: [192.168.169.151] => (item=directory)
changed: [192.168.169.153] => (item=directory)

TASK [patroni : Start patroni service on the Master server] *******************************************************************************************
changed: [192.168.169.151]

TASK [patroni : Wait for port 8008 to become open on the host] ****************************************************************************************
fatal: [192.168.169.151]: FAILED! => {"changed": false, "elapsed": 120, "msg": "Timeout when waiting for 192.168.169.151:8008"}

NO MORE HOSTS LEFT ************************************************************************************************************************************

PLAY RECAP ********************************************************************************************************************************************
192.168.169.151            : ok=123  changed=6    unreachable=0    failed=1    skipped=262  rescued=0    ignored=0
192.168.169.152            : ok=120  changed=4    unreachable=0    failed=0    skipped=258  rescued=0    ignored=0
192.168.169.153            : ok=120  changed=4    unreachable=0    failed=0    skipped=258  rescued=0    ignored=0
localhost                  : ok=0    changed=0    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0

root@ansible:/opt/services/ansible/postgresql_cluster#
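
The task that timed out above waits for TCP port 8008, which is where Patroni normally serves its REST API. As a rough way to repeat that check by hand, here is a minimal sketch using only the Python standard library. The helper name `port_is_open` is made up for illustration and is not part of the playbook.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out)
        # when nothing is accepting connections on the target port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For the failing host in this report, the call would be `port_is_open("192.168.169.151", 8008)`. A `False` result only says that nothing is listening; the reason still has to come from the Patroni service log.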

On the failed host, I tried to start the patroni service manually, but I still get the same error. Please see the daemon log below:

Apr 22 00:00:53 pgnode01 systemd[1]: Started Runners to orchestrate a high-availability PostgreSQL - patroni.
░░ Subject: A start job for unit patroni.service has finished successfully
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A start job for unit patroni.service has finished successfully.
░░
░░ The job identifier is 3926.
Apr 22 00:00:53 pgnode01 patroni[41452]: 2022-04-22 00:00:53,610 INFO: Selected new etcd server http://192.168.169.152:2379
Apr 22 00:00:53 pgnode01 patroni[41452]: 2022-04-22 00:00:53,622 INFO: No PostgreSQL configuration items changed, nothing to reload.
Apr 22 00:00:53 pgnode01 patroni[41452]: 2022-04-22 00:00:53,648 INFO: Lock owner: None; I am pgnode01
Apr 22 00:00:53 pgnode01 patroni[41452]: 2022-04-22 00:00:53,671 INFO: trying to bootstrap a new cluster
Apr 22 00:00:53 pgnode01 patroni[41492]: The files belonging to this database system will be owned by user "postgres".
Apr 22 00:00:53 pgnode01 patroni[41492]: This user must also own the server process.
Apr 22 00:00:53 pgnode01 patroni[41492]: The database cluster will be initialized with locale "en_US.UTF-8".
Apr 22 00:00:53 pgnode01 patroni[41492]: The default text search configuration will be set to "english".
Apr 22 00:00:53 pgnode01 patroni[41492]: Data page checksums are enabled.
Apr 22 00:00:53 pgnode01 patroni[41492]: fixing permissions on existing directory /var/lib/postgresql/14/main ... ok
Apr 22 00:00:53 pgnode01 patroni[41492]: creating subdirectories ... ok
Apr 22 00:00:53 pgnode01 patroni[41492]: selecting dynamic shared memory implementation ... posix
Apr 22 00:00:53 pgnode01 patroni[41492]: selecting default max_connections ... 100
Apr 22 00:00:54 pgnode01 patroni[41492]: selecting default shared_buffers ... 128MB
Apr 22 00:00:54 pgnode01 patroni[41492]: selecting default time zone ... America/New_York
Apr 22 00:00:54 pgnode01 patroni[41492]: creating configuration files ... ok
Apr 22 00:00:54 pgnode01 patroni[41492]: running bootstrap script ... ok
Apr 22 00:00:55 pgnode01 patroni[41492]: performing post-bootstrap initialization ... ok
Apr 22 00:00:55 pgnode01 patroni[41492]: syncing data to disk ... ok
Apr 22 00:00:55 pgnode01 patroni[41492]: initdb: warning: enabling "trust" authentication for local connections
Apr 22 00:00:55 pgnode01 patroni[41492]: You can change this by editing pg_hba.conf or using the option -A, or
Apr 22 00:00:55 pgnode01 patroni[41492]: --auth-local and --auth-host, the next time you run initdb.
Apr 22 00:00:55 pgnode01 patroni[41492]: Success. You can now start the database server using:
Apr 22 00:00:55 pgnode01 patroni[41492]:     /usr/lib/postgresql/14/bin/pg_ctl -D /var/lib/postgresql/14/main -l logfile start
Apr 22 00:00:56 pgnode01 patroni[41452]: 2022-04-22 00:00:56,303 INFO: postmaster pid=41531
Apr 22 00:00:56 pgnode01 patroni[41532]: /var/run/postgresql:5432 - no response
Apr 22 00:00:56 pgnode01 patroni[41531]: 2022-04-22 00:00:56 EDT [41531-1]  LOG:  redirecting log output to logging collector process
Apr 22 00:00:56 pgnode01 patroni[41531]: 2022-04-22 00:00:56 EDT [41531-2]  HINT:  Future log output will appear in directory "/var/log/postgresql".
Apr 22 00:00:57 pgnode01 patroni[41558]: /var/run/postgresql:5432 - accepting connections
Apr 22 00:00:57 pgnode01 patroni[41562]: /var/run/postgresql:5432 - accepting connections
Apr 22 00:00:57 pgnode01 patroni[41452]: 2022-04-22 00:00:57,447 INFO: establishing a new patroni connection to the postgres cluster
Apr 22 00:00:57 pgnode01 patroni[41452]: 2022-04-22 00:00:57,529 ERROR: get_postgresql_status
Apr 22 00:00:57 pgnode01 patroni[41452]: Traceback (most recent call last):
Apr 22 00:00:57 pgnode01 patroni[41452]:   File "/usr/local/lib/python3.9/dist-packages/patroni/api.py", line 675, in query
Apr 22 00:00:57 pgnode01 patroni[41452]:     with self.patroni.postgresql.connection().cursor() as cursor:
Apr 22 00:00:57 pgnode01 patroni[41452]:   File "/usr/local/lib/python3.9/dist-packages/patroni/postgresql/__init__.py", line 255, in connection
Apr 22 00:00:57 pgnode01 patroni[41452]:     return self._connection.get()
Apr 22 00:00:57 pgnode01 patroni[41452]:   File "/usr/local/lib/python3.9/dist-packages/patroni/postgresql/connection.py", line 24, in get
Apr 22 00:00:57 pgnode01 patroni[41452]:     self._connection = psycopg.connect(**self._conn_kwargs)
Apr 22 00:00:57 pgnode01 patroni[41452]:   File "/usr/lib/python3/dist-packages/psycopg2/__init__.py", line 127, in connect
Apr 22 00:00:57 pgnode01 patroni[41452]:     conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
Apr 22 00:00:57 pgnode01 patroni[41452]: psycopg2.OperationalError: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: FATAL:  Peer authentication failed for user "patroni_usr"
Apr 22 00:00:57 pgnode01 patroni[41452]: During handling of the above exception, another exception occurred:
Apr 22 00:00:57 pgnode01 patroni[41452]: Traceback (most recent call last):
Apr 22 00:00:57 pgnode01 patroni[41452]:   File "/usr/local/lib/python3.9/dist-packages/patroni/api.py", line 610, in get_postgresql_status
Apr 22 00:00:57 pgnode01 patroni[41452]:     row = self.query(stmt.format(postgresql.wal_name, postgresql.lsn_name), retry=retry)[0]
Apr 22 00:00:57 pgnode01 patroni[41452]:   File "/usr/local/lib/python3.9/dist-packages/patroni/api.py", line 592, in query
Apr 22 00:00:57 pgnode01 patroni[41452]:     return self.server.query(sql, *params)
Apr 22 00:00:57 pgnode01 patroni[41452]:   File "/usr/local/lib/python3.9/dist-packages/patroni/api.py", line 681, in query
Apr 22 00:00:57 pgnode01 patroni[41452]:     raise PostgresConnectionException('connection problems')
Apr 22 00:00:57 pgnode01 patroni[41452]: patroni.exceptions.PostgresConnectionException: 'connection problems'
Apr 22 00:00:57 pgnode01 patroni[41452]: 2022-04-22 00:00:57,587 ERROR: get_postgresql_status
Apr 22 00:00:57 pgnode01 patroni[41452]: Traceback (most recent call last):
Apr 22 00:00:57 pgnode01 patroni[41452]:   File "/usr/local/lib/python3.9/dist-packages/patroni/api.py", line 675, in query
Apr 22 00:00:57 pgnode01 patroni[41452]:     with self.patroni.postgresql.connection().cursor() as cursor:
Apr 22 00:00:57 pgnode01 patroni[41452]:   File "/usr/local/lib/python3.9/dist-packages/patroni/postgresql/__init__.py", line 255, in connection
Apr 22 00:00:57 pgnode01 patroni[41452]:     return self._connection.get()
Apr 22 00:00:57 pgnode01 patroni[41452]:   File "/usr/local/lib/python3.9/dist-packages/patroni/postgresql/connection.py", line 24, in get
Apr 22 00:00:57 pgnode01 patroni[41452]:     self._connection = psycopg.connect(**self._conn_kwargs)
Apr 22 00:00:57 pgnode01 patroni[41452]:   File "/usr/lib/python3/dist-packages/psycopg2/__init__.py", line 127, in connect
Apr 22 00:00:57 pgnode01 patroni[41452]:     conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
Apr 22 00:00:57 pgnode01 patroni[41452]: psycopg2.OperationalError: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: FATAL:  Peer authentication failed for user "patroni_usr"
Apr 22 00:00:57 pgnode01 patroni[41452]: During handling of the above exception, another exception occurred:
Apr 22 00:00:57 pgnode01 patroni[41452]: Traceback (most recent call last):
Apr 22 00:00:57 pgnode01 patroni[41452]:   File "/usr/local/lib/python3.9/dist-packages/patroni/api.py", line 610, in get_postgresql_status
Apr 22 00:00:57 pgnode01 patroni[41452]:     row = self.query(stmt.format(postgresql.wal_name, postgresql.lsn_name), retry=retry)[0]
Apr 22 00:00:57 pgnode01 patroni[41452]:   File "/usr/local/lib/python3.9/dist-packages/patroni/api.py", line 592, in query
Apr 22 00:00:57 pgnode01 patroni[41452]:     return self.server.query(sql, *params)
Apr 22 00:00:57 pgnode01 patroni[41452]:   File "/usr/local/lib/python3.9/dist-packages/patroni/api.py", line 681, in query
Apr 22 00:00:57 pgnode01 patroni[41452]:     raise PostgresConnectionException('connection problems')
Apr 22 00:00:57 pgnode01 patroni[41452]: patroni.exceptions.PostgresConnectionException: 'connection problems'
Apr 22 00:00:57 pgnode01 patroni[41452]: 2022-04-22 00:00:57,605 ERROR: get_postgresql_status
Apr 22 00:00:57 pgnode01 patroni[41452]: Traceback (most recent call last):
Apr 22 00:00:57 pgnode01 patroni[41452]:   File "/usr/local/lib/python3.9/dist-packages/patroni/api.py", line 675, in query
Apr 22 00:00:57 pgnode01 patroni[41452]:     with self.patroni.postgresql.connection().cursor() as cursor:
Apr 22 00:00:57 pgnode01 patroni[41452]:   File "/usr/local/lib/python3.9/dist-packages/patroni/postgresql/__init__.py", line 255, in connection
Apr 22 00:00:57 pgnode01 patroni[41452]:     return self._connection.get()
Apr 22 00:00:57 pgnode01 patroni[41452]:   File "/usr/local/lib/python3.9/dist-packages/patroni/postgresql/connection.py", line 24, in get
Apr 22 00:00:57 pgnode01 patroni[41452]:     self._connection = psycopg.connect(**self._conn_kwargs)
Apr 22 00:00:57 pgnode01 patroni[41452]:   File "/usr/lib/python3/dist-packages/psycopg2/__init__.py", line 127, in connect
Apr 22 00:00:57 pgnode01 patroni[41452]:     conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
Apr 22 00:00:57 pgnode01 patroni[41452]: psycopg2.OperationalError: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: FATAL:  Peer authentication failed for user "patroni_usr"
Apr 22 00:00:57 pgnode01 patroni[41452]: During handling of the above exception, another exception occurred:
Apr 22 00:00:57 pgnode01 patroni[41452]: Traceback (most recent call last):
Apr 22 00:00:57 pgnode01 patroni[41452]:   File "/usr/local/lib/python3.9/dist-packages/patroni/api.py", line 610, in get_postgresql_status
Apr 22 00:00:57 pgnode01 patroni[41452]:     row = self.query(stmt.format(postgresql.wal_name, postgresql.lsn_name), retry=retry)[0]
Apr 22 00:00:57 pgnode01 patroni[41452]:   File "/usr/local/lib/python3.9/dist-packages/patroni/api.py", line 592, in query
Apr 22 00:00:57 pgnode01 patroni[41452]:     return self.server.query(sql, *params)
Apr 22 00:00:57 pgnode01 patroni[41452]:   File "/usr/local/lib/python3.9/dist-packages/patroni/api.py", line 681, in query
Apr 22 00:00:57 pgnode01 patroni[41452]:     raise PostgresConnectionException('connection problems')
Apr 22 00:00:57 pgnode01 patroni[41452]: patroni.exceptions.PostgresConnectionException: 'connection problems'
Apr 22 00:00:57 pgnode01 patroni[41452]: 2022-04-22 00:00:57,706 INFO: establishing a new patroni connection to the postgres cluster
Apr 22 00:00:57 pgnode01 patroni[41452]: 2022-04-22 00:00:57,712 WARNING: Retry got exception: 'connection problems'
Apr 22 00:00:57 pgnode01 patroni[41572]: /var/run/postgresql:5432 - accepting connections
Apr 22 00:00:57 pgnode01 patroni[41452]: 2022-04-22 00:00:57,754 WARNING: Failed to determine PostgreSQL state from the connection, falling back to cached role
Apr 22 00:00:57 pgnode01 patroni[41452]: 2022-04-22 00:00:57,789 INFO: running post_bootstrap
Apr 22 00:00:57 pgnode01 patroni[41452]: 2022-04-22 00:00:57,798 ERROR: post_bootstrap
Apr 22 00:00:57 pgnode01 patroni[41452]: Traceback (most recent call last):
Apr 22 00:00:57 pgnode01 patroni[41452]:   File "/usr/local/lib/python3.9/dist-packages/patroni/postgresql/bootstrap.py", line 330, in post_bootstrap
Apr 22 00:00:57 pgnode01 patroni[41452]:     self.create_or_update_role(superuser['username'], superuser['password'], ['SUPERUSER'])
Apr 22 00:00:57 pgnode01 patroni[41452]:   File "/usr/local/lib/python3.9/dist-packages/patroni/postgresql/bootstrap.py", line 314, in create_or_update_role
Apr 22 00:00:57 pgnode01 patroni[41452]:     END;$$""".format(quote_literal(name), quote_ident(name, self._postgresql.connection()), ' '.join(options))
Apr 22 00:00:57 pgnode01 patroni[41452]:   File "/usr/local/lib/python3.9/dist-packages/patroni/postgresql/__init__.py", line 255, in connection
Apr 22 00:00:57 pgnode01 patroni[41452]:     return self._connection.get()
Apr 22 00:00:57 pgnode01 patroni[41452]:   File "/usr/local/lib/python3.9/dist-packages/patroni/postgresql/connection.py", line 24, in get
Apr 22 00:00:57 pgnode01 patroni[41452]:     self._connection = psycopg.connect(**self._conn_kwargs)
Apr 22 00:00:57 pgnode01 patroni[41452]:   File "/usr/lib/python3/dist-packages/psycopg2/__init__.py", line 127, in connect
Apr 22 00:00:57 pgnode01 patroni[41452]:     conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
Apr 22 00:00:57 pgnode01 patroni[41452]: psycopg2.OperationalError: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: FATAL:  Peer authentication failed for user "patroni_usr"
Apr 22 00:00:57 pgnode01 patroni[41452]: 2022-04-22 00:00:57,807 INFO: removing initialize key after failed attempt to bootstrap the cluster
Apr 22 00:00:57 pgnode01 patroni[41452]: 2022-04-22 00:00:57,889 INFO: renaming data directory to /var/lib/postgresql/14/main_2022-04-22-00-00-57
Apr 22 00:00:58 pgnode01 patroni[41452]: Traceback (most recent call last):
Apr 22 00:00:58 pgnode01 patroni[41452]:   File "/usr/local/bin/patroni", line 8, in <module>
Apr 22 00:00:58 pgnode01 patroni[41452]:     sys.exit(main())
Apr 22 00:00:58 pgnode01 patroni[41452]:   File "/usr/local/lib/python3.9/dist-packages/patroni/__main__.py", line 143, in main
Apr 22 00:00:58 pgnode01 patroni[41452]:     return patroni_main()
Apr 22 00:00:58 pgnode01 patroni[41452]:   File "/usr/local/lib/python3.9/dist-packages/patroni/__main__.py", line 135, in patroni_main
Apr 22 00:00:58 pgnode01 patroni[41452]:     abstract_main(Patroni, schema)
Apr 22 00:00:58 pgnode01 patroni[41452]:   File "/usr/local/lib/python3.9/dist-packages/patroni/daemon.py", line 100, in abstract_main
Apr 22 00:00:58 pgnode01 patroni[41452]:     controller.run()
Apr 22 00:00:58 pgnode01 patroni[41452]:   File "/usr/local/lib/python3.9/dist-packages/patroni/__main__.py", line 105, in run
Apr 22 00:00:58 pgnode01 patroni[41452]:     super(Patroni, self).run()
Apr 22 00:00:58 pgnode01 patroni[41452]:   File "/usr/local/lib/python3.9/dist-packages/patroni/daemon.py", line 59, in run
Apr 22 00:00:58 pgnode01 patroni[41452]:     self._run_cycle()
Apr 22 00:00:58 pgnode01 patroni[41452]:   File "/usr/local/lib/python3.9/dist-packages/patroni/__main__.py", line 108, in _run_cycle
Apr 22 00:00:58 pgnode01 patroni[41452]:     logger.info(self.ha.run_cycle())
Apr 22 00:00:58 pgnode01 patroni[41452]:   File "/usr/local/lib/python3.9/dist-packages/patroni/ha.py", line 1503, in run_cycle
Apr 22 00:00:58 pgnode01 patroni[41452]:     info = self._run_cycle()
Apr 22 00:00:58 pgnode01 patroni[41452]:   File "/usr/local/lib/python3.9/dist-packages/patroni/ha.py", line 1377, in _run_cycle
Apr 22 00:00:58 pgnode01 patroni[41452]:     return self.post_bootstrap()
Apr 22 00:00:58 pgnode01 patroni[41452]:   File "/usr/local/lib/python3.9/dist-packages/patroni/ha.py", line 1269, in post_bootstrap
Apr 22 00:00:58 pgnode01 patroni[41452]:     self.cancel_initialization()
Apr 22 00:00:58 pgnode01 patroni[41452]:   File "/usr/local/lib/python3.9/dist-packages/patroni/ha.py", line 1262, in cancel_initialization
Apr 22 00:00:58 pgnode01 patroni[41452]:     raise PatroniFatalException('Failed to bootstrap cluster')
Apr 22 00:00:58 pgnode01 patroni[41452]: patroni.exceptions.PatroniFatalException: 'Failed to bootstrap cluster'
Apr 22 00:00:58 pgnode01 systemd[1]: patroni.service: Main process exited, code=exited, status=1/FAILURE
░░ Subject: Unit process exited
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ An ExecStart= process belonging to unit patroni.service has exited.
░░
░░ The process' exit code is 'exited' and its exit status is 1.
Apr 22 00:00:58 pgnode01 systemd[1]: patroni.service: Failed with result 'exit-code'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ The unit patroni.service has entered the 'failed' state with result 'exit-code'.
Apr 22 00:00:58 pgnode01 systemd[1]: patroni.service: Consumed 3.053s CPU time.
░░ Subject: Resources consumed by unit runtime
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ The unit patroni.service completed and consumed the indicated resources.

How can I fix it? Thank you.

vitabaks commented 2 years ago

psycopg2.OperationalError: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: FATAL: Peer authentication failed for user "patroni_usr"

Please try adding "patroni_usr" to the postgresql_pg_hba variable:

- {type: "local", database: "all", user: "patroni_usr", address: "", method: "trust"}

or

- {type: "local", database: "all", user: "{{ patroni_superuser_username }}", address: "", method: "trust"}
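
Why a `trust` rule for this user helps: PostgreSQL reads pg_hba.conf from top to bottom and applies the first record that matches the connection, and `peer` authentication only succeeds when the operating-system user name equals the requested database user name (unless a user map is configured). The log suggests Patroni runs as the OS user `postgres` while connecting as `patroni_usr`, so a `local all all peer` rule rejects it. The sketch below is a deliberately simplified model of that matching logic, not PostgreSQL's implementation; `HbaRule`, `first_match`, and `local_login_ok` are hypothetical names, and details such as the `replication` keyword, groups, and maps are ignored.

```python
from dataclasses import dataclass


@dataclass
class HbaRule:
    conn_type: str  # "local" or "host"
    database: str   # database name or "all"
    user: str       # role name or "all"
    method: str     # e.g. "peer" or "trust"


def first_match(rules, conn_type, database, user):
    """Return the first rule that matches; later rules are never consulted."""
    for rule in rules:
        if rule.conn_type != conn_type:
            continue
        if rule.database not in ("all", database):
            continue
        if rule.user not in ("all", user):
            continue
        return rule
    return None


def local_login_ok(rules, os_user, db_user, database="postgres"):
    """Simplified outcome of a Unix-socket login attempt."""
    rule = first_match(rules, "local", database, db_user)
    if rule is None:
        return False  # no matching record means the connection is rejected
    if rule.method == "trust":
        return True
    if rule.method == "peer":
        return os_user == db_user  # OS user must equal the database user
    raise ValueError("method not modelled in this sketch: " + rule.method)
```

In this model, the default rules reject OS user `postgres` logging in as `patroni_usr`, while putting a `local all patroni_usr trust` rule ahead of them lets the same login through.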

blackyzero commented 2 years ago

Thanks for the help, @vitabaks. Indeed, I used a different user, "patroni_usr", instead of the default one.

I also tried adding each of those entries, and even both of them, to the postgresql_pg_hba section, but they do not seem to have been applied to the nodes. Please see the file content on node pgsql01 below.

...
...
# DO NOT DISABLE!
# If you change this first entry you will need to make sure that the
# database superuser can access the database using some other method.
# Noninteractive access to all databases is required during automatic
# maintenance (custom daily cronjobs, replication, and similar tasks).
#
# Database administrative login by Unix domain socket
local   all             postgres                                peer

# TYPE  DATABASE        USER            ADDRESS                 METHOD

# "local" is for Unix domain socket connections only
local   all             all                                     peer
# IPv4 local connections:
host    all             all             127.0.0.1/32            scram-sha-256
# IPv6 local connections:
host    all             all             ::1/128                 scram-sha-256
# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     all                                     peer
host    replication     all             127.0.0.1/32            scram-sha-256
host    replication     all             ::1/128                 scram-sha-256

host replication patroni_replicator_usr 127.0.0.1/32 md5
host all all 0.0.0.0/0 md5

host replication patroni_replicator_usr 127.0.0.1/32 md5
host all all 0.0.0.0/0 md5

host replication patroni_replicator_usr 127.0.0.1/32 md5
host all all 0.0.0.0/0 md5

host replication patroni_replicator_usr 127.0.0.1/32 md5
host all all 0.0.0.0/0 md5

host replication patroni_replicator_usr 127.0.0.1/32 md5
host all all 0.0.0.0/0 md5

host replication patroni_replicator_usr 127.0.0.1/32 md5
host all all 0.0.0.0/0 md5

host replication patroni_replicator_usr 127.0.0.1/32 md5
host all all 0.0.0.0/0 md5

host replication patroni_replicator_usr 127.0.0.1/32 md5
host all all 0.0.0.0/0 md5

host replication patroni_replicator_usr 127.0.0.1/32 md5
host all all 0.0.0.0/0 md5

host replication patroni_replicator_usr 127.0.0.1/32 md5
host all all 0.0.0.0/0 md5
blackyzero commented 2 years ago

In addition, three more lines are appended to the file "pg_hba.conf" each time the playbook finishes running.
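
One quick way to quantify this kind of duplication is to count repeated rule lines in the file. The following is a minimal sketch that works on the file's text; `duplicated_rules` is a hypothetical helper name, and comparing rules after collapsing whitespace is an assumption of the sketch.

```python
from collections import Counter


def duplicated_rules(pg_hba_text: str) -> dict:
    """Map each repeated rule line to the number of times it appears."""
    rules = []
    for line in pg_hba_text.splitlines():
        normalised = " ".join(line.split())  # collapse runs of spaces and tabs
        if normalised and not normalised.startswith("#"):
            rules.append(normalised)
    return {rule: count for rule, count in Counter(rules).items() if count > 1}
```

Calling it on the contents of pg_hba.conf after each playbook run would show whether the counts keep growing.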

vitabaks commented 2 years ago

Your pg_hba.conf differs somewhat from the format the playbook should have generated: https://github.com/vitabaks/postgresql_cluster/blob/master/roles/patroni/templates/pg_hba.conf.j2

Please attach your version of the playbook for analysis.

blackyzero commented 2 years ago

I don't know how to get the version of the playbook, so I used the following command:

root@ansible:/opt/services/ansible/postgresql_cluster# git log -1
commit 9fa797cfbeff2cd9de3c279731ebad178e1335a5 (HEAD -> master, origin/master, origin/HEAD)
Merge: d17a520 afb2e6b
Author: Vitaliy Kukharik <37010174+vitabaks@users.noreply.github.com>
Date:   Thu Apr 21 22:17:11 2022 +0300

    Merge pull request #163 from jimnydev/master

    pgbouncer: ignore_startup_parameters variable

Is this the information you need? Thank you.

vitabaks commented 2 years ago

Make an archive of the postgresql_cluster directory and attach it. It is important for me to see what changes have been made, and I also plan to try to reproduce your problem.

blackyzero commented 2 years ago

Please get the file here pass:

Thank you.

vitabaks commented 2 years ago

Invalid file password

You can attach the file here

blackyzero commented 2 years ago

My bad, sorry. Please use this one: ""

Thank you.

blackyzero commented 2 years ago

Update: the deployment runs successfully after I reverted the Patroni superuser to the default:

patroni_superuser_username: "postgres"

Maybe the playbook could be improved to support a username other than the default one.

Thank you.

vitabaks commented 2 years ago

Thanks. I will check it.

blackyzero commented 2 years ago

Another thing: is it possible to add a custom name or label for the VIP interface? For example:

vip_interface: "{{ ansible_default_ipv4.interface }}:PGSQL"  # interface name (ex. "ens32")

I added it like this, but Ansible renders the keepalived config as below:

vrrp_instance VI_1 {
   interface ens18:PGSQL
   virtual_router_id 150
   priority  100
   advert_int 2
   state  BACKUP
   virtual_ipaddress {
       192.168.169.150
   }

However, the network interface on the system looks like this:

root@pgnode01:/etc/postgresql/14/main# ip a s |grep 192.168.169.
    inet 192.168.169.151/24 brd 192.168.169.255 scope global dynamic ens18

As a result, the keepalived service is unable to start.

Apr 23 09:50:34 pgnode01 Keepalived[154749]: Starting Keepalived v2.1.5 (07/13,2020)
Apr 23 09:50:34 pgnode01 Keepalived[154749]: WARNING - keepalived was build for newer Linux 5.10.70, running on Linux 5.10.0-13-amd64 #1 SMP Debian 5.10.106-1 (2022-03-17)
Apr 23 09:50:34 pgnode01 Keepalived[154749]: Command line: '/usr/sbin/keepalived' '--dont-fork'
Apr 23 09:50:34 pgnode01 Keepalived[154749]: Opening file '/etc/keepalived/keepalived.conf'.
Apr 23 09:50:34 pgnode01 Keepalived[154749]: NOTICE: setting config option max_auto_priority should result in better keepalived performance
Apr 23 09:50:34 pgnode01 Keepalived[154749]: Starting VRRP child process, pid=154750
Apr 23 09:50:34 pgnode01 Keepalived_vrrp[154750]: Registering Kernel netlink reflector
Apr 23 09:50:34 pgnode01 Keepalived_vrrp[154750]: Registering Kernel netlink command channel
Apr 23 09:50:34 pgnode01 Keepalived_vrrp[154750]: Opening file '/etc/keepalived/keepalived.conf'.
Apr 23 09:50:34 pgnode01 Keepalived_vrrp[154750]: (/etc/keepalived/keepalived.conf: Line 14) WARNING - interface ens18:PGSQL for vrrp_instance VI_1 doesn't exist
Apr 23 09:50:34 pgnode01 Keepalived_vrrp[154750]: Non-existent interface specified in configuration
Apr 23 09:50:34 pgnode01 Keepalived_vrrp[154750]: Stopped
Apr 23 09:50:34 pgnode01 Keepalived[154749]: pid 154750 exited with permanent error CONFIG. Terminating
Apr 23 09:50:34 pgnode01 Keepalived[154749]: Stopped Keepalived v2.1.5 (07/13,2020)
Apr 23 09:50:34 pgnode01 systemd[1]: keepalived.service: Succeeded.

Thank you.

vitabaks commented 2 years ago

vip_interface: "ens18"

blackyzero commented 2 years ago

Yes, I got it. Is it possible to add a variable for the VIP interface label?

Thank you.

vitabaks commented 2 years ago

What label? Please describe what it is needed for.

blackyzero commented 2 years ago

I mean creating a separate keepalived VIP interface, like the one below, instead of placing the VIP on the server's existing network interface.

ens18:PGSQL

vitabaks commented 2 years ago

As far as I know, the keepalived configuration must specify an already existing interface, on top of which the virtual IP will be brought up.

Judging by this error "interface ens18:PGSQL for vrrp_instance VI_1 doesn't exist", there is no such interface in the system.
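
To illustrate the point, a name such as `ens18:PGSQL` follows the `device:label` alias notation, so the part before the colon is the real device. The sketch below checks a configured name against the interfaces the kernel reports, using `socket.if_nameindex()` from the Python standard library. `base_interface` and `interface_exists` are hypothetical helper names, and the optional `known` argument exists only so the check can be tried without touching the real system.

```python
import socket


def base_interface(name: str) -> str:
    """Strip an alias-style label: 'ens18:PGSQL' -> 'ens18'."""
    return name.split(":", 1)[0]


def interface_exists(name: str, known=None) -> bool:
    """Return True if `name` is one of the system's network interfaces."""
    if known is None:
        # if_nameindex() returns (index, name) pairs for each interface.
        known = [ifname for _, ifname in socket.if_nameindex()]
    return name in known
```

With the interfaces shown earlier in this thread, `interface_exists("ens18:PGSQL", ["lo", "ens18"])` is false while `interface_exists(base_interface("ens18:PGSQL"), ["lo", "ens18"])` is true, which matches the keepalived warning.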

vitabaks commented 2 years ago

Maybe the playbook could be improved to support a username other than the default one.

Fixed - 271a357317b1b2f1d147bea76ced751b5dc9aea2. Please test it.

blackyzero commented 2 years ago

Thank you for the fix, @vitabaks.

I switched to a different username, "patroni_usr", followed the section "How to start from scratch", and got the following error:

TASK [patroni : Add PATRONICTL_CONFIG_FILE environment variable into /etc/environment] ****************************************************************
ok: [192.168.169.152]
ok: [192.168.169.151]
ok: [192.168.169.153]

TASK [pgbouncer/userlist : Get users and password md5 from pg_shadow] *********************************************************************************
fatal: [192.168.169.151]: FAILED! => {"changed": false, "cmd": ["/usr/lib/postgresql/14/bin/psql", "-p", "5432", "-U", "postgres", "-Atq", "-c", "SELECT concat('\"', usename, '\" \"', passwd, '\"') FROM pg_shadow where usename != 'patroni_replicator_usr'"], "delta": "0:00:00.022013", "end": "2022-04-23 22:32:55.653760", "msg": "non-zero return code", "rc": 2, "start": "2022-04-23 22:32:55.631747", "stderr": "psql: error: connection to server on socket \"/var/run/postgresql/.s.PGSQL.5432\" failed: FATAL:  role \"postgres\" does not exist", "stderr_lines": ["psql: error: connection to server on socket \"/var/run/postgresql/.s.PGSQL.5432\" failed: FATAL:  role \"postgres\" does not exist"], "stdout": "", "stdout_lines": []}

NO MORE HOSTS LEFT ************************************************************************************************************************************

PLAY RECAP ********************************************************************************************************************************************
192.168.169.151            : ok=131  changed=10   unreachable=0    failed=1    skipped=300  rescued=0    ignored=0
192.168.169.152            : ok=128  changed=10   unreachable=0    failed=0    skipped=293  rescued=0    ignored=0
192.168.169.153            : ok=128  changed=10   unreachable=0    failed=0    skipped=293  rescued=0    ignored=0
localhost                  : ok=0    changed=0    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0

root@ansible:/opt/services/ansible/postgresql_cluster#

vitabaks commented 2 years ago

I suspect that you did not re-download the playbook; this has already been fixed.
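
For context, the failing command in the log passes `-U postgres`, a role that does not exist once a custom superuser name is configured. The sketch below shows the general idea of building that command from the configured name instead of a literal. It is an illustration only and not the playbook's actual fix; `userlist_query_cmd` and its parameters are hypothetical, and the psql path and query text are copied from the log above.

```python
def userlist_query_cmd(superuser, replication_user, port=5432,
                       psql="/usr/lib/postgresql/14/bin/psql"):
    """Build the psql argument list, connecting as the configured superuser."""
    # The role names are assumed to be trusted configuration values,
    # so plain string formatting is used for the WHERE clause.
    query = (
        "SELECT concat('\"', usename, '\" \"', passwd, '\"') "
        "FROM pg_shadow where usename != '{}'".format(replication_user)
    )
    return [psql, "-p", str(port), "-U", superuser, "-Atq", "-c", query]
```

With `superuser="patroni_usr"`, the resulting command connects as the role that actually exists in the cluster.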

blackyzero commented 2 years ago

Right. I only did a git pull; let me download it again and retry.

blackyzero commented 2 years ago

Confirmed. The deployment works without issues. Thanks a lot for your great work!