vitabaks / postgresql_cluster

PostgreSQL High-Availability Cluster (based on "Patroni" and DCS "etcd" or "consul"). Automating with Ansible.
MIT License

Failover Multi region #542

Open farisam opened 4 months ago

farisam commented 4 months ago

Hi @vitabaks ,

I have a question about failover to a replica in a multi-region setup. Using EC2, I plan to deploy four instances across two regions, with us-west-2 hosting the master and ap-east-1 hosting the replica.

Regarding failover speed, how quickly can we switch to the secondary? If we do a manual failover, can we simply update the IP addresses in the inventory file (set ap-east-1 as master and us-west-2 as replica) and rerun ansible-playbook deploy_pgcluster.yml?

Thank you

vitabaks commented 4 months ago

Hi @farisam

For switchover, you can use the patronictl command.

patronictl switchover <cluster_name>

vitabaks commented 4 months ago

Placement of cluster members in different data centers: If you’d prefer a cross-data center setup, where the replicating databases are located in different data centers, etcd member placement becomes critical.

There are quite a lot of things to consider if you want to create a really robust etcd cluster, but there is one rule: do not place all etcd members in your primary data center. See some examples.
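The reasoning behind that rule is majority quorum: etcd keeps working only while more than half of its members are reachable. A minimal sketch of the arithmetic follows; the two layouts are hypothetical examples, not taken from this repository.

```shell
#!/bin/sh
# Majority quorum for an etcd cluster of N members: floor(N/2) + 1.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# survives_loss TOTAL LOST: succeeds if the cluster keeps quorum
# after LOST members (for example, one whole data center) go down.
survives_loss() {
    total="$1"; lost="$2"
    [ $(( total - lost )) -ge "$(quorum "$total")" ]
}

# Hypothetical layout A: 3 members, 2 of them in the primary data center.
if survives_loss 3 2; then echo "A: quorum kept"; else echo "A: quorum lost"; fi

# Hypothetical layout B: 3 members, one in each of three sites.
if survives_loss 3 1; then echo "B: quorum kept"; else echo "B: quorum lost"; fi
```

With only two sites, whichever site holds the majority of members takes quorum down with it when it fails, which is why a third location is usually considered for etcd.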

farisam commented 4 months ago

> Hi @farisam
>
> For switchover, you can use the patronictl command.
>
> patronictl switchover <cluster_name>

Thanks for the answer @vitabaks.

One more question:

If I have more than one replica and I want to switch over or fail over to a specific cross-region replica, can I use the same command but target that replica's IP address?

vitabaks commented 4 months ago

Yes, you can use this command to switch regardless of the region.

I recommend reading the Patroni documentation
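As a rough sketch of targeting a specific replica: patronictl selects the target by Patroni member name (as shown by `patronictl list`), not by IP address, using the `--candidate` option. The config path, cluster name and member name below are assumptions for illustration only. The script prints the commands for review rather than running them.

```shell
#!/bin/sh
# Hypothetical values: adjust the config path, cluster name and member name.
PATRONI_CONFIG="${PATRONI_CONFIG:-/etc/patroni/patroni.yml}"

# Print (not run) a switchover command that promotes a chosen replica.
# The candidate is a Patroni member name, not an IP address.
switchover_cmd() {
    cluster="$1"; candidate="$2"
    printf 'patronictl -c %s switchover %s --candidate %s\n' \
        "$PATRONI_CONFIG" "$cluster" "$candidate"
}

# List the members first to find the exact member names:
echo "patronictl -c $PATRONI_CONFIG list"

# Then switch over to the chosen cross-region replica:
switchover_cmd postgres-cluster pgnode03
```

When run interactively, patronictl should still prompt for anything not supplied on the command line and ask for confirmation before switching.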

farisam commented 4 months ago

Thank you @vitabaks

I am also curious about this one:

You said if we are using cloud

So do we put the three HAProxy IP addresses into weighted A records in Route 53?

vitabaks commented 4 months ago

It seems to me that this is not a very good recommendation (I will update the comments later) because if one of the HAProxy servers fails, DNS will continue to direct traffic there.

farisam commented 4 months ago

Thank you @vitabaks . So for cloud we can't have virtual ip for HAproxy ya?

vitabaks commented 4 months ago

> So for cloud we can't have virtual ip for HAproxy ya?

Traditional VIP configurations, such as those using keepalived, are generally not feasible in AWS due to its network architecture: AWS does not offer the level of network control that on-premises VIP setups require. It is more a matter of impracticality than strict impossibility, but the end result is that traditional methods like keepalived are typically not used. Alternative approaches, such as Amazon Route 53 DNS failover, are recommended for high availability in AWS.

While I haven't personally set up this exact configuration, according to AWS documentation, it's possible to use Amazon Route 53 health checks. Route 53 can monitor the health of HAProxy or Patroni instances, automatically rerouting traffic from unhealthy instances to healthy ones. This approach could effectively replace the need for a VIP, ensuring high availability across regions or instances. For detailed setup instructions, the AWS Route 53 Documentation is a great resource: AWS Route 53 Health Checks and DNS Failover.

Another option to consider for high availability in AWS, especially if you're not looking to register a domain, is using Elastic Load Balancer (ELB). ELB can perform health checks on Patroni nodes, rerouting traffic to healthy ones if any instance fails. For detailed information on setting up and using ELB, you can refer to the AWS Elastic Load Balancing Documentation.
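Both Route 53 and ELB health checks come down to an HTTP probe against each node. A minimal sketch of such a probe against the Patroni REST API is shown here, relying on the fact that `GET /primary` returns 200 only on the current leader and `GET /replica` returns 200 only on a running replica. The port (8008) and the host address are assumptions; check `restapi.listen` in your Patroni configuration.

```shell
#!/bin/sh
# Assumption: Patroni's REST API listens on port 8008 (see restapi.listen).
PATRONI_PORT="${PATRONI_PORT:-8008}"

# A role endpoint answers 200 only when the node currently holds that role.
is_healthy() {
    [ "$1" = "200" ]
}

# check_role HOST ROLE: query GET /primary or GET /replica on one node.
check_role() {
    host="$1"; role="$2"
    # -m 3 limits the probe to 3 seconds; an unreachable host yields code 000.
    code=$(curl -s -o /dev/null -m 3 -w '%{http_code}' \
        "http://$host:$PATRONI_PORT/$role")
    if is_healthy "$code"; then
        echo "$host: $role"
    else
        echo "$host: not $role (HTTP $code)"
    fi
}

# Usage (hypothetical address):
# check_role 10.0.1.10 primary
```

A load balancer health check configured with the same path and port would then send writes only to the node that currently answers 200 on `/primary`.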

vitabaks commented 3 months ago

@farisam I am currently working on integration with cloud providers - PR https://github.com/vitabaks/postgresql_cluster/pull/464

And I added automatic creation of Amazon Elastic Load Balancer (ELB), here is an example: https://github.com/vitabaks/postgresql_cluster/pull/464#issuecomment-1912252310

I suggest we test it.

farisam commented 3 months ago

wow cool, let me test it @vitabaks . Thanks

vitabaks commented 2 months ago

@farisam How are you? Have you tested it?