vitabaks / postgresql_cluster

PostgreSQL High-Availability Cluster (based on "Patroni" and DCS "etcd" or "consul"). Automating with Ansible.
MIT License

Using private network/ips & external load balancer #65

Closed: valentin-fischer closed this issue 3 years ago

valentin-fischer commented 3 years ago

Hello everyone,

Thank you for creating such a cool deployment playbook! I have a question related to using private network/IPs and a load balancer in front of the nodes.

More precisely, I'm using Terraform to create the following infrastructure on hcloud (Hetzner Cloud) for a Postgres cluster:

3 servers with public & private IPs, and 1 load balancer with a public & private IP.

Is there a way to use the private IPs of these servers and the load balancer's private IP to direct traffic towards the Postgres cluster?

I want to avoid the cluster having any kind of public interfaces/traffic.

If there is, can someone point me in the right direction?

Thank you!

br

vitabaks commented 3 years ago

Perhaps I did not understand your question ... but

You need to add your balancer host to the [balancers] group in the inventory, and the Postgres hosts to the [master] and [replica] groups, specifying their private IP addresses.

The cluster components will listen on the specified IP addresses, while the HAProxy balancer listens only on the cluster VIP (if the cluster_vip variable is specified). As far as I understand, it is cluster_vip that should be a public address. You must also specify the name of the interface with your public network (the vip_interface variable).
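
For illustration, a minimal inventory sketch along those lines (the group names and the cluster_vip/vip_interface variables are the ones named above; the IP addresses, interface name, and variable placement are placeholders):

    [master]
    10.10.10.6

    [replica]
    10.10.10.7
    10.10.10.9

    [balancers]
    10.10.10.5

    # playbook variables (e.g. in your vars file):
    # cluster_vip: "10.10.10.100"   # optional VIP for the balancer(s)
    # vip_interface: "ens10"        # interface that should carry the VIP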

P.S. I have not tested this playbook on hcloud (hetzner cloud).

valentin-fischer commented 3 years ago

Hi @vitabaks,

Thank you for the fast reply. I will give it a try and let you know how it goes :)

br

valentin-fischer commented 3 years ago

Hi @vitabaks,

So I tried a first deployment using private IPs and it mostly worked. The remaining issue is with the load balancer IP / cluster_vip.

RUNNING HANDLER [vip-manager : Wait for the cluster ip address (VIP) "10.10.10.100" is running] ***
ok: [10.10.10.7]
fatal: [10.10.10.9]: FAILED! => {"changed": false, "elapsed": 61, "msg": "Timeout when waiting for 10.10.10.100:22"}
fatal: [10.10.10.6]: FAILED! => {"changed": false, "elapsed": 61, "msg": "Timeout when waiting for 10.10.10.100:22"}

NO MORE HOSTS LEFT *************************************************************

PLAY RECAP *********************************************************************
10.10.10.100               : ok=0    changed=0    unreachable=0    failed=0    skipped=92   rescued=0    ignored=0
10.10.10.6                 : ok=102  changed=63   unreachable=0    failed=1    skipped=175  rescued=0    ignored=0
10.10.10.7                 : ok=103  changed=63   unreachable=0    failed=0    skipped=175  rescued=0    ignored=0
10.10.10.9                 : ok=104  changed=63   unreachable=0    failed=1    skipped=179  rescued=0    ignored=0
localhost                  : ok=0    changed=0    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0

If I log in to the server that is failing, I can see the IP attached to it.

3: ens10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc fq_codel state UP group default qlen 1000
    link/ether 86:00:00:74:de:86 brd ff:ff:ff:ff:ff:ff
    inet 10.10.10.7/32 brd 10.10.10.7 scope global dynamic ens10
       valid_lft 85809sec preferred_lft 85809sec
    inet 10.10.10.100/24 scope global ens10
       valid_lft forever preferred_lft forever
    inet6 fe80::8400:ff:fe74:de86/64 scope link
       valid_lft forever preferred_lft forever

So ... do I need this cluster_vip address? I was thinking that I have to configure the external load balancer and add the server IPs + some ports to it in order to have everything working.

Am I missing something?

Which ports do I need to add to the load balancer?

br

valentin-fischer commented 3 years ago

It seems the following PR would add vip-manager support for Hetzner Floating IPs -> https://github.com/cybertec-postgresql/vip-manager/pull/34

br

valentin-fischer commented 3 years ago

So, the deployment to private IPs is working as expected.

TASK [deploy-finish : PostgreSQL Cluster health] *******************************
ok: [10.10.10.6] => {
    "patronictl_result.stdout_lines": [
        "+ Cluster: cloudgeek (6881628165851351380) ----+----+-----------+",
        "| Member | Host       | Role         | State   | TL | Lag in MB |",
        "+--------+------------+--------------+---------+----+-----------+",
        "| db0    | 10.10.10.6 | Leader       | running |  1 |           |",
        "| db1    | 10.10.10.9 | Replica      | running |  1 |         0 |",
        "| db2    | 10.10.10.7 | Sync Standby | running |  1 |         0 |",
        "+--------+------------+--------------+---------+----+-----------+"
    ]
}

TASK [deploy-finish : Get postgresql database list] ****************************
ok: [10.10.10.6 -> 10.10.10.6]

TASK [deploy-finish : PostgreSQL list of databases] ****************************
ok: [10.10.10.6] => {
    "dbs_result.stdout_lines": [
        "   name   |  owner   | encoding |   collate   |    ctype    |  size   | tablespace ",
        "----------+----------+----------+-------------+-------------+---------+------------",
        " postgres | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 7953 kB | pg_default",
        "(1 row)"
    ]
}

TASK [deploy-finish : Create list of nodes] ************************************
ok: [10.10.10.6]

TASK [deploy-finish : PostgreSQL Cluster connection info] **********************
ok: [10.10.10.6] => {
    "msg": [
        "+------------------------------------------------+",
        "address 10.10.10.6,10.10.10.9,10.10.10.7",
        "port 6432 (pgbouncer)",
        "+------------------------------------------------+"
    ]
}

PLAY RECAP *********************************************************************
10.10.10.6                 : ok=108  changed=61   unreachable=0    failed=0    skipped=196  rescued=0    ignored=0
10.10.10.7                 : ok=98   changed=60   unreachable=0    failed=0    skipped=186  rescued=0    ignored=0
10.10.10.9                 : ok=98   changed=60   unreachable=0    failed=0    skipped=186  rescued=0    ignored=0

Can someone please tell/show me all the ports needed to connect via the load balancer to the targets?

e.g.: load_balancer_ip:protocol:port -> target:port

Thank you!

vitabaks commented 3 years ago

cluster_vip is optional; it is needed to provide high availability for HAProxy (the "Type A" scheme) using keepalived. If you are using one load balancer, then you do not need cluster_vip.
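
In other words (a sketch; the variable names come from this thread, and I'm assuming an empty value disables the VIP):

    # single external TCP balancer -> no VIP failover needed:
    cluster_vip: ""

    # two or more HAProxy nodes kept available via keepalived ("Type A" scheme):
    # cluster_vip: "10.10.10.100"
    # vip_interface: "ens10"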

When you talk about a load balancer, do you mean HAProxy? Or do you use some other TCP balancer?

valentin-fischer commented 3 years ago

Hi,

This is a TCP load balancer (more info here -> https://www.hetzner.com/cloud/load-balancer). I just need to know which ports I should add to the load balancer and to which server/target I should forward them.

br

vitabaks commented 3 years ago

> I just need to know which ports I should add to the load balancer and to which server/target I should forward them.

See "PostgreSQL Cluster connection info". In your case, this is: 10.10.10.6:6432, 10.10.10.9:6432, 10.10.10.7:6432

But you also need to set up health checks against the Patroni REST API (port 8008) to determine each server's role. More info about health check endpoints: https://github.com/zalando/patroni/blob/master/docs/rest_api.rst#health-check-endpoints
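
Putting that together in the format you asked for (the 6432/8008 ports and node IPs come from this thread; the /primary and /replica endpoints are from the linked Patroni docs; lb_ip is a placeholder):

    # forwarding rule on the TCP balancer:
    lb_ip:tcp:6432 -> 10.10.10.6:6432, 10.10.10.9:6432, 10.10.10.7:6432   (pgbouncer)

    # health check per target, so only the desired role receives traffic;
    # GET http://<target>:8008/primary returns HTTP 200 only on the leader, 503 otherwise:
    curl -s -o /dev/null -w '%{http_code}\n' http://10.10.10.6:8008/primary
    curl -s -o /dev/null -w '%{http_code}\n' http://10.10.10.6:8008/replica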

vitabaks commented 3 years ago

> It seems the following PR would add vip-manager support for Hetzner Floating IPs -> cybertec-postgresql/vip-manager#34

Yes, you can use vip-manager if you don't need load balancing for reads. In that setup, you don't need a load balancer at all.

vip-manager is a service that is started on all cluster nodes and connects to the DCS. If the local node owns the leader key, vip-manager brings up the configured VIP. In case of a failover, vip-manager removes the VIP on the old leader, and the corresponding service on the new leader brings it up there.
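
As a rough sketch, a vip-manager configuration for this cluster could look like the following (key names assume vip-manager 1.x; the trigger-key is derived from the Patroni scope "cloudgeek" and node name "db0" shown above, the VIP/netmask/interface come from your ip output, and the etcd endpoint is a placeholder):

    interval: 1000                            # DCS polling interval, in milliseconds
    trigger-key: "/service/cloudgeek/leader"  # the Patroni leader key in the DCS
    trigger-value: "db0"                      # this node's name; differs on each node
    ip: 10.10.10.100                          # the VIP to manage
    netmask: 24
    interface: ens10
    hosting-type: basic                       # the linked PR adds a "hetzner" type for Floating IPs
    dcs-type: etcd
    dcs-endpoints:
      - http://127.0.0.1:2379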

valentin-fischer commented 3 years ago

Thank you for the feedback/help. I will close this issue.

Great work with these playbooks!