vitabaks / postgresql_cluster

PostgreSQL High-Availability Cluster (based on "Patroni" and DCS "etcd" or "consul"). Automating with Ansible.
MIT License

Works on Ubuntu and CentOS but fails on OracleLinux #1

Closed orisvscs closed 4 years ago

orisvscs commented 4 years ago

Hi,

You have created a great asset to help people deploy Postgres. Your project is the only one that deploys everything (postgres, patroni, haproxy, keepalived, vip), and I am thrilled to check it out. It deployed on Ubuntu 18.04, but failed on CentOS 7 and OracleLinux 7. I sent you an email with the logs; it appears that etcd is failing to start because of firewall ports or some other reason. Happy to chat with you via zoom/webex if needed.

vitabaks commented 4 years ago

TASK [etcd cluster | check etcd endpoints health] *****
fatal: [10.0.60.38]: FAILED! => {"changed": true, "cmd": "ETCDCTL_API=3 etcdctl endpoint health", "delta": "0:00:00.006746", "end": "2019-08-23 06:06:07.669512", "msg": "non-zero return code", "rc": 127, "start": "2019-08-23 06:06:07.662766", "stderr": "/bin/sh: etcdctl: command not found", "stderr_lines": ["/bin/sh: etcdctl: command not found"], "stdout": "", "stdout_lines": []}

Hi @orisvscs Thank you for your comment!

"stderr": "/bin/sh: etcdctl: command not found"

Please check if the etcdctl command is working on your server.

Check the etcd cluster status: etcdctl cluster-health

Also check the etcd service logs: sudo journalctl -u etcd.service | tail -n 100 or sudo tail -n 100 -f /var/log/syslog | grep etcd
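
An exit code of 127 from /bin/sh means the command itself was not found, so the health check never reached etcd. A minimal sketch of a pre-check, written as Ansible tasks; the task names and the registered variable etcdctl_check are hypothetical and not taken from the playbook:

```yaml
- name: Check whether etcdctl is on PATH for the connecting user
  shell: command -v etcdctl
  register: etcdctl_check      # hypothetical variable name
  changed_when: false          # read-only probe, never reports a change
  failed_when: false           # do not abort the play if the binary is missing

- name: Report a missing etcdctl binary
  debug:
    msg: "etcdctl was not found on PATH; the endpoint health check would fail with rc 127"
  when: etcdctl_check.rc != 0
```

If the second task prints its message, the problem is the binary or the user's PATH rather than the etcd service.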

I think I should run additional tests for Oracle Linux.

vitabaks commented 4 years ago

The task [etcd cluster | check etcd endpoints health] is optional.

I performed a small update: https://github.com/vitabaks/postgresql_cluster/commit/ef2b5519a9eaa849e65a8c58c933faed65ad65f0

orisvscs commented 4 years ago

OK, tested on both Oracle Linux and CentOS and sent the error logs via email. Got a "not supported" error on OracleLinux (earlier I didn't see that at all), and a pip3-related error on CentOS.

vitabaks commented 4 years ago

Got not supported error on OracleLinux

First I must do additional testing for Oracle Linux before I put it on the list of supported distributions.

TASK [Checking distribution] ******************************************************************************************************************
 FAILED! => {"changed": false, "msg": "OracleLinux is not supported"}

But you can try disabling the task "Checking distribution" in the deploy_pgcluster.yml file to run your tests.

Comment out this line: # - import_tasks: tasks/check_system.yml

vitabaks commented 4 years ago

and pip3 related error on CentOS.

TASK [Patroni | install setuptools] ***********************************************************************************************************
FAILED! => {"changed": false, "msg": "Unable to find any of pip3 to use.  pip needs to be installed."}

since all the VM's are being accessed as opc user (not root)

This is a problem with the PATH environment variable for your user "opc". I have only used the root user for deployment (ansible_user = 'root' in the inventory file).

I will fix this problem soon. Thanks!
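
When the login user is not root, Ansible can escalate privileges through its become settings. A minimal sketch of a YAML inventory, assuming a login user named opc that is allowed to use sudo; the host address is the one from the failure log earlier in this thread, and the thread does not confirm how the playbook itself handles non-root users:

```yaml
all:
  hosts:
    10.0.60.38:
  vars:
    ansible_user: opc            # non-root login user
    ansible_become: true         # run tasks with privilege escalation
    ansible_become_method: sudo  # assumes opc is allowed to use sudo
```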

vitabaks commented 4 years ago

Done. https://github.com/vitabaks/postgresql_cluster/commit/29de6a9827243b219b4b5e455fe3b5a0ddf8bdfa

Please download the playbook again and check on your CentOS 7.

orisvscs commented 4 years ago

Works with CentOS. thanks.

Now we need to ensure that it runs on Redhat or OracleLinux/OL (part of Redhat family). epel was an issue with OL, so hope you can look into it.

Looking good so far. Thanks. I need to learn how to use VIP from client machines too. Would be awesome if we can talk via zoom/webex. Possible for you?

vitabaks commented 4 years ago

Now we need to ensure that it runs on Redhat

This should work. I will be grateful if you take the time to test this playbook on RedHat.

OracleLinux/OL (part of Redhat family). epel was an issue with OL, so hope you can look into it.

Yes, I will do this soon. I am not closing your issue yet.

I need to learn how to use VIP from client machines too. Would be awesome if we can talk via zoom/webex. Possible for you?

On this issue I will contact you by email.

orisvscs commented 4 years ago

To make it work on OracleLinux, I had to do two things 1) comment this line in deploy_pgcluster.yml

After that, the Ansible tasks did their job and no failures were reported.

Any possibility to avoid these manual steps?

vitabaks commented 4 years ago

Any possibility to avoid these manual steps?

Yes, I will do this soon. I am not closing your issue yet.

orisvscs commented 4 years ago

On your readme page, can you pls provide instructions on how one would use VIP when the VMs are deployed in a cloud? VIP is virtual ip, so how do clients access the postgres master and replica via the VIP? I think, it would help people unfamiliar with using VIP in the cloud.

vitabaks commented 4 years ago

VIP is virtual ip, so how do clients access the postgres master and replica via the VIP?

Yes. The VIP provides a single entry point for database access.

how one would use VIP when the VMs are deployed in a cloud?

The original design goal of this playbook was the initial deployment of a PostgreSQL cluster on physical servers (bare metal) or virtual machines in your own data center. I have not yet had experience with VMs in the cloud, so I'm not sure I can answer that question.

orisvscs commented 4 years ago

Can you share which VM's interface is assigned to the VIP in these files?

templates/vip-manager.service.j2: -iface=\"${VIP_IFACE}\
templates/vip.conf.j2: VIP_IFACE="{{ vip_interface }}"

How do you get the values for VIP_IFACE and vip_interface?

If it's coming from the master node, what would happen when the master node goes down?

vitabaks commented 4 years ago

Can you share which VM's interface is assigned to the VIP in these files?

templates/vip-manager.service.j2: -iface="${VIP_IFACE}
templates/vip.conf.j2: VIP_IFACE="{{ vip_interface }}"

How do you get the values for VIP_IFACE and vip_interface?

https://github.com/vitabaks/postgresql_cluster/blob/master/vars/main.yml It is the "vip_interface" variable.

By default, Ansible fills in this value automatically with the system interface that the default route points to. You can also specify the name of the desired interface yourself: replace "{{ ansible_default_ipv4.interface }}" with the interface name. Example: vip_interface: "ens32"
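
The two options described above can be sketched as a fragment of vars/main.yml. Only the vip_interface line is taken from this thread; the comments are added for illustration, and "ens32" is just the example name used above:

```yaml
# Default: use the interface of the default route, as gathered by Ansible facts.
vip_interface: "{{ ansible_default_ipv4.interface }}"

# Manual override: name the interface explicitly instead.
# vip_interface: "ens32"
```

Because the default is evaluated from each host's own facts, every node resolves its own interface name rather than copying the master's.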

what would happen when the master node goes down?

  1. If with_haproxy_load_balancing: 'false', the Type B scheme will be used and vip-manager will be installed. If the master node goes down, the VIP will move to a new master.

  2. If with_haproxy_load_balancing: 'true', the Type A scheme will be used and Keepalived will be installed. If the master node goes down, the VIP will move to one of the available servers with a working HAProxy process.
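
The choice between the two schemes comes down to a single variable. A sketch of the setting, assuming it lives alongside vip_interface in vars/main.yml (the thread does not say which file holds it):

```yaml
# Type B: no HAProxy; vip-manager moves the VIP to the new master.
with_haproxy_load_balancing: 'false'

# Type A: HAProxy with Keepalived; the VIP moves to a server with a working HAProxy.
# with_haproxy_load_balancing: 'true'
```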

orisvscs commented 4 years ago

This was very helpful.

I will wait for the fix for OL, and then we can close this issue.

vitabaks commented 4 years ago

I will wait for the fix for OL, and then we can close this issue.

Done. https://github.com/vitabaks/postgresql_cluster/commit/2e258918ed14cfd6ce3a52bbd7a7d113a472fc70 https://github.com/vitabaks/postgresql_cluster/commit/9648be6d9fe46ffc63831323a373e7cb24fe6ec9

Please download the playbook again and check on your Oracle Linux.

orisvscs commented 4 years ago

Worked, thank-you. You can close the issue now.