vitabaks / postgresql_cluster

PostgreSQL High-Availability Cluster (based on "Patroni" and DCS "etcd" or "consul"). Automating with Ansible.
MIT License

solved #364

Closed by emanfeah 11 months ago

vitabaks commented 11 months ago

Hi @emanfeah

The log you provided shows that the Patroni service is unable to connect to your etcd nodes at 10.0.30.53, 10.0.30.8, and 10.0.30.223 (port 2379).

To troubleshoot, start by checking the availability and health of the etcd nodes, then make sure there are no network connectivity problems between the nodes.

From a DB node:

ping 10.0.30.53
ping 10.0.30.8
ping 10.0.30.223
telnet 10.0.30.53 2379
telnet 10.0.30.8 2379
telnet 10.0.30.223 2379
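If telnet is not installed on the DB node, the same reachability check can be scripted with bash's built-in /dev/tcp (a sketch; the addresses are the etcd nodes from this thread):

```shell
#!/usr/bin/env bash
# Check TCP reachability of the etcd client port (2379) from a DB node.
# Uses bash's /dev/tcp so it works even where telnet/nc are not installed.
check_port() {
  local host=$1 port=$2
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "OK   ${host}:${port}"
  else
    echo "FAIL ${host}:${port}"
  fi
}

# etcd nodes from this thread
for host in 10.0.30.53 10.0.30.8 10.0.30.223; do
  check_port "$host" 2379
done
```

A FAIL here points to a firewall or routing problem between the DB node and that etcd node.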

On an etcd node:

etcdctl version
etcdctl member list
etcdctl endpoint health --cluster

sudo journalctl -u etcd -n 1000
vitabaks commented 11 months ago

It looks like the etcd cluster is healthy.

Are you using the latest version of the playbook from the master branch?

etcdctl version: 3.5.7
API version: 3.5

We have switched to the etcd API v3, but in your logs I see that Patroni is still trying to use the etcd API v2:

Failed to get list of machines from http://10.0.30.223:2379/v2

Please show the configuration file /etc/patroni/patroni.yml

The etcd3 section should be declared there, not etcd:

https://github.com/vitabaks/postgresql_cluster/blob/master/roles/patroni/templates/patroni.yml.j2#L33
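For reference, a minimal sketch of the DCS section as it should appear when the v3 API is used (the host list is taken from the addresses in this thread; your real file will contain more settings):

```yaml
# /etc/patroni/patroni.yml (fragment)
# An "etcd3:" section makes Patroni use the v3 API; a plain "etcd:" section
# would make it query the /v2 endpoints, as seen in the error above.
etcd3:
  hosts: 10.0.30.53:2379,10.0.30.8:2379,10.0.30.223:2379
```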

vitabaks commented 11 months ago

PostgreSQL is not running because Patroni is not running due to etcd connection problems.

> i use the master branch -- etcd3

This is an interesting situation; I don't yet have any ideas on how to reproduce it.

Please attach the archive of your playbook, I'll take a look at it.

vitabaks commented 11 months ago

@emanfeah I have not been able to reproduce the problem using your variable files (from mail).

TASK [patroni : Prepare PostgreSQL | make sure the postgresql log directory "/data1/var/log/postgresql" exists] ***
changed: [10.172.0.21]
changed: [10.172.0.20]
changed: [10.172.0.22]

TASK [patroni : Prepare PostgreSQL | make sure PostgreSQL data directory "/data1/var/lib/postgresql/14/patroni" exists] ***
changed: [10.172.0.22]
changed: [10.172.0.21]
changed: [10.172.0.20]

TASK [patroni : Prepare PostgreSQL | check that data directory "/data1/var/lib/postgresql/14/patroni" is not initialized] ***
ok: [10.172.0.21]
ok: [10.172.0.20]
ok: [10.172.0.22]

TASK [patroni : Prepare PostgreSQL | make sure the postgresql config files exists] ***
ok: [10.172.0.20]
ok: [10.172.0.21]
ok: [10.172.0.22]

TASK [patroni : Prepare PostgreSQL | generate default postgresql config files] ***
changed: [10.172.0.20]
changed: [10.172.0.21]
changed: [10.172.0.22]

TASK [patroni : Prepare PostgreSQL | make sure the data directory "/data1/var/lib/postgresql/14/patroni" is empty] ***
changed: [10.172.0.20] => (item=absent)
changed: [10.172.0.22] => (item=absent)
changed: [10.172.0.21] => (item=absent)
changed: [10.172.0.21] => (item=directory)
changed: [10.172.0.20] => (item=directory)
changed: [10.172.0.22] => (item=directory)

TASK [patroni : Start patroni service on the Master server] ********************
changed: [10.172.0.20]

TASK [patroni : Wait for port 8008 to become open on the host] *****************
ok: [10.172.0.20]

TASK [patroni : Check PostgreSQL is started and accepting connections on Master] ***
ok: [10.172.0.20]

TASK [patroni : Wait for the cluster to initialize (master is the leader with the lock)] ***
ok: [10.172.0.20]
vitabaks commented 11 months ago

I don't think so, since my tests passed successfully. There is some other problem in your case with the interaction between Patroni and the etcd cluster that I may not be aware of.

Apparently your environment is somewhat different from what I used for testing. Try to deploy the cluster without changing any variables other than the addresses in the inventory.
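For a clean run, only the inventory needs real addresses. A rough sketch of the relevant groups (group names as used by the playbook's example inventory; the addresses here are the ones from this thread and are purely illustrative):

```ini
# inventory (fragment): only the addresses change between environments
[etcd_cluster]
10.0.30.53
10.0.30.8
10.0.30.223

[master]
10.172.0.20

[replica]
10.172.0.21
10.172.0.22

[postgres_cluster:children]
master
replica
```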

emanfeah commented 11 months ago

Do you think the problem is in:

 pg_hba:  # Add following lines to pg_hba.conf after running 'initdb'
    - host replication replicator 127.0.0.1/32 md5
    - host all all 0.0.0.0/0 md5

I think it must be the master host.

vitabaks commented 11 months ago

These are the default values for initdb, and we redefine them later in the deployment.

You can check the contents of the pg_hba.conf file after deployment:

https://github.com/vitabaks/postgresql_cluster/blob/master/roles/patroni/tasks/main.yml#L858
https://github.com/vitabaks/postgresql_cluster/blob/master/roles/patroni/templates/pg_hba.conf.j2
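For illustration only (not the template's exact output), the rendered pg_hba.conf replaces the initdb defaults with rules scoped to the cluster networks, along these lines:

```
# pg_hba.conf (illustrative fragment; subnet and auth method are assumptions)
host    replication    replicator    10.172.0.0/24    md5
host    all            all           10.172.0.0/24    md5
```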

vitabaks commented 11 months ago

Try to deploy the cluster without changing any variables other than addresses in inventory.

Let me know if you managed to deploy the cluster with the default values.

vitabaks commented 11 months ago

Well, now the problem is at an earlier stage: the etcd cluster does not start.

Please attach the logs from all servers: sudo journalctl -u etcd
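One way to collect those logs in one pass (a sketch; it assumes SSH access to each node and uses the etcd addresses from earlier in this thread):

```shell
#!/usr/bin/env bash
# Pull the last 1000 etcd journal lines from every etcd node into local files.
for host in 10.0.30.53 10.0.30.8 10.0.30.223; do
  ssh -o BatchMode=yes -o ConnectTimeout=3 "$host" \
    'sudo journalctl -u etcd --no-pager -n 1000' \
    > "etcd-${host}.log" 2>/dev/null || true
done
ls -l etcd-*.log
```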

emanfeah commented 11 months ago

It works, thank you.

vitabaks commented 11 months ago

> its work thank you

Ok.

Can you provide more details about your deployment? I would really like to find a way to reproduce your problem.