vitabaks / postgresql_cluster

PostgreSQL High-Availability Cluster (based on "Patroni" and DCS "etcd" or "consul"). Automating with Ansible.
MIT License

Patroni/etcd 404 error #627

Closed: dtdionne closed this issue 1 month ago

dtdionne commented 1 month ago

Greetings, just getting started, and indeed /v3alpha throws a 404. All nodes are pristine Alma 9 installs.

- Apr 15 09:13:59 master patroni[53177]: 2024-04-15 09:13:59,479 WARNING: Detected Etcd version 3.0.0 is lower than 3.1.0, watches are not supported
- Apr 15 09:13:59 master patroni[53177]: 2024-04-15 09:13:59,521 ERROR: Failed to get list of machines from http://192.168.1.239:2379/v3alpha: <Unknown error: '404 page not found', code: 2>
- Apr 15 09:13:59 master patroni[53177]: 2024-04-15 09:13:59,564 ERROR: Failed to get list of machines from http://192.168.1.238:2379/v3alpha: <Unknown error: '404 page not found', code: 2>
- Apr 15 09:14:01 master patroni[53177]: 2024-04-15 09:14:01,233 ERROR: Failed to get list of machines from http://192.168.1.237:2379/v3alpha: MaxRetryError("HTTPConnectionPool(host='192.168.1.237', port=2379):>
- Apr 15 09:14:01 master patroni[53177]: 2024-04-15 09:14:01,233 INFO: waiting on etcd
- Apr 15 09:14:06 master patroni[53177]: 2024-04-15 09:14:06,235 WARNING: Detected Etcd version 3.0.0 is lower than 3.1.0, watches are not supported
- Apr 15 09:14:06 master patroni[53177]: 2024-04-15 09:14:06,276 ERROR: Failed to get list of machines from http://192.168.1.239:2379/v3alpha: <Unknown error: '404 page not found', code: 2>
- Apr 15 09:14:06 master patroni[53177]: 2024-04-15 09:14:06,320 ERROR: Failed to get list of machines from http://192.168.1.238:2379/v3alpha: <Unknown error: '404 page not found', code: 2>
- Apr 15 09:14:06 master patroni[53177]: 2024-04-15 09:14:06,323 ERROR: Failed to get list of machines from http://192.168.1.237:2379/v3alpha: MaxRetryError("HTTPConnectionPool(host='192.168.1.237', port=2379):>
- Apr 15 09:14:06 master patroni[53177]: 2024-04-15 09:14:06,323 INFO: waiting on etcd
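(A quick sanity check here, assuming the etcd members are reachable from the Patroni host: ask each endpoint for its version directly. Patroni falling back to the /v3alpha prefix suggests it is not getting a sensible answer from this endpoint.)

```bash
# Ask each etcd member (IPs taken from the log above) what it is actually serving.
# A healthy 3.5.x member answers with its server and cluster versions as JSON.
curl -s http://192.168.1.239:2379/version
curl -s http://192.168.1.238:2379/version
curl -s http://192.168.1.237:2379/version
# Expected output per member, roughly:
# {"etcdserver":"3.5.12","etcdcluster":"3.5.0"}
```
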
vitabaks commented 1 month ago

Why are you using such an old version of etcd? Use 3.5.11 or higher.

dtdionne commented 1 month ago

That's a great question, and I'm embarrassed to say I have no idea! Does this not install etcd on all hosts when the playbook is run?

dtdionne commented 1 month ago

I'm brand new at this too so forgive me...

```
[root@node1 postgresql_cluster]# etcd --version
etcd Version: 3.5.12
Git SHA: e7b3bb6cc
Go Version: go1.20.13
Go OS/Arch: linux/amd64
```

vitabaks commented 1 month ago

Please check the version on all nodes.
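A sketch of doing that from the control node with ad-hoc commands (the inventory path and the etcd_cluster group name are the repo defaults; adjust to your setup):

```bash
# Binary version installed on every etcd node
ansible etcd_cluster -i inventory -m command -a "etcd --version"
# Version the running member actually reports (see the VERSION column)
ansible etcd_cluster -i inventory -m command -a "etcdctl endpoint status -w table"
```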

dtdionne commented 1 month ago

Did I goof something in main? There weren't many options; this is on a test lab LAN, so no proxies. The playbook ran great until the end.

There's a slight chance someone goofed around on one of the hosts before I cloned this repo. I'll snapshot them all back to yum'd pristine, re-clone, and give it another go.

I checked all 3 before snapping back and they were all running the same version of etcd, but those machine states are gone now... I figured I screwed something up.

dtdionne commented 1 month ago

It's installing now; some observations...

A bone-stock Alma 9 server needs to have the firewall disabled or configured; the first run failed because of this, I think. I disabled the firewall on all hosts and it got further this time, but it was stuck checking etcd health. I stepped outside on retry 5 of the countdown from 10, and I remember something like this happening last night. I'm an old cagey iptables guy and I'm too lazy and grumpy to even read about this newfangled firewall.
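For what it's worth, the firewalld equivalent of the old iptables rules is only a handful of commands. A minimal sketch, assuming the stack's usual ports (2379/2380 for etcd, 8008 for the Patroni REST API, 5432 for PostgreSQL, 6432 for PgBouncer if enabled):

```bash
# Run on every node instead of disabling firewalld outright
sudo firewall-cmd --permanent --add-port=2379-2380/tcp   # etcd client and peer traffic
sudo firewall-cmd --permanent --add-port=8008/tcp        # Patroni REST API / health checks
sudo firewall-cmd --permanent --add-port=5432/tcp        # PostgreSQL
sudo firewall-cmd --permanent --add-port=6432/tcp        # PgBouncer, if enabled
sudo firewall-cmd --reload
```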

I’m pretty sure this one will fail but I’ll run the playbook again.

Once the playbook completes, it throws an error when run again, also with the cluster-clear option: something about line 5, main, no such thing as ansible.

But this appears to be great work, thank you.

vitabaks commented 1 month ago

> I'm an old cagey iptables guy

Please see the automation for iptables: https://github.com/vitabaks/postgresql_cluster/blob/master/vars/system.yml#L128

I think it will be useful for you.
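If editing vars/system.yml feels like too much for a first pass, the same switch can be flipped at run time. A sketch, assuming the toggle is the firewall_enabled_at_boot variable defined in that file and that the main playbook is deploy_pgcluster.yml (double-check both names in your checkout):

```bash
# Let the playbook manage the firewall rules itself instead of disabling the firewall by hand
ansible-playbook deploy_pgcluster.yml -e "firewall_enabled_at_boot=true"
```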

dtdionne commented 1 month ago

I just got a VIP error. Do all the hosts have to have the same interface name? Because what I have set is correct for the host I want for client connections: 192.168.77.77 on ens160.

vitabaks commented 1 month ago

Usually the `vip_interface: "{{ ansible_default_ipv4.interface }}"` value is enough for Ansible to determine the interface name for each host. But if you explicitly specify the interface name in a variable, then yes, in that case it should be the same on all servers.

In any case, try to keep the servers identical.
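One quick way to see what ansible_default_ipv4.interface resolves to on each host before relying on it (a sketch, assuming the repo's default inventory path):

```bash
# Show the default IPv4 interface and address Ansible detects on every host
ansible all -i inventory -m setup -a "filter=ansible_default_ipv4"
```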

dtdionne commented 1 month ago

What's causing this?

After the playbook finishes, I get this error...

```
[root@bibble postgresql_cluster]# ansible-playbook remove_cluster.yml
Traceback (most recent call last):
  File "/usr/local/bin/ansible-playbook", line 5, in <module>
    from ansible.cli.playbook import main
ModuleNotFoundError: No module named 'ansible'
```
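Not something that can be pinned down from the traceback alone, but it usually means the ansible-playbook entrypoint in /usr/local/bin is a pip-installed script whose Python environment no longer contains the ansible package (for example, after a second Ansible install or an interpreter change). A few commands to narrow it down, as a sketch; /usr/bin/python3 is an assumption, use whatever interpreter the script's first line points at:

```bash
# Which copies of ansible-playbook are on PATH, and which Python does the failing one use?
which -a ansible-playbook
head -1 /usr/local/bin/ansible-playbook
# Does that interpreter actually have the ansible package?
/usr/bin/python3 -c "import ansible; print(ansible.__version__)"
# If it does not, reinstalling for that interpreter is usually enough:
/usr/bin/python3 -m pip install --upgrade ansible-core
```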

dtdionne commented 1 month ago

So I think this is resolved, and actually I don't know what caused it, so my guess is someone goofed around with one of the systems. Or I guess it could have been the firewall, but I don't know. I saw where the config says it disables firewalld for RHEL, but my guess is my Alma 9 installs aren't being detected as RHEL. But again, I don't know... I'm just beginning to get my feet wet here.
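On the RHEL-detection question: AlmaLinux reports its ansible_os_family as "RedHat", so detection is usually not the culprit, and it is easy to confirm (a sketch, assuming the repo's default inventory path):

```bash
# AlmaLinux 9 hosts should report os_family "RedHat" and distribution "AlmaLinux"
ansible all -i inventory -m setup -a "filter=ansible_os_family"
ansible all -i inventory -m setup -a "filter=ansible_distribution"
```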

Thanks for the patience and hard work.