High availability for PoWA repository server

powa-team / powa

PostgreSQL Workload Analyzer

http://powa.readthedocs.io/

PostgreSQL License

766 stars 57 forks source link

High availability for PoWA repository server #182

Open hrawulwa opened 1 month ago

hrawulwa commented 1 month ago

I have a remote setup, with the DB repository, powa-collector (1.2.0) and powa-web (4.1.4) all running on the same server, I need to configure Active-Passive setup for the database. My question is when the database fails over or switches over to another node for any maintenance activity, how do I ensure that the collector and web recognize that the Primary is now running on another node, and start up. I understand that I need to install and compile collector and web on the Standby node, so that it is ready whenever this node becomes Primary.

Thanks Hari

rjuju commented 1 month ago

hi,

this is not really specific to powa, any service relying on a database would have the same problem. in general the correct way to handle that is to have a good ha tooling, which can handle multiple resources with things like resource collocation, so that the ha tool is responsible for starting the collector and the UI services on the active node. corosync / pacemaker with PAF for postgres can achieve that. alternatively, some other tools like patroni simply exposes some api to get the uri of the active node (or to configure haproxy or similar so that you always connect to the same address), and then you simply configure services, like powa collector, on a different server to connect to this uri. of course you then have to find a way to make those services highly available since patroni won't handle it, but since they're stateless it's supposed to be easier.

hrawulwa commented 1 month ago

I'm planning to place a script (to run every minute) on both the nodes, which will get the status of cluster ("in production" or "in archive recovery") using pg_controldata. If production, then startup the powa services if not already running. Similarly, if "in archive recovery", stop the powa services if already running. But there is another problem. When accessing powa-web, we need to provide the Primary server name in the URL. How do I achieve this? In the current configuration without any HA, I always use the same server in the URL.

Thanks Hari

rjuju commented 1 month ago

That's a question to ask to the HA solution you're using. With something like corosync/pacemake you would setup a virtual IP resource for which you would add a colocation constraint on the primary postgres node, so you would have a dedicated IP that is guaranteed to point to the primary node.

In your case, I don't know what tool you use for the postgres HA, and I don't know how other tools address that need anyway so I unfortunately can't give more detail.

hrawulwa commented 1 month ago

I use repmgr as the HA tool. It takes care of automatic failover, and during planned switchovers. I'm good with the DB part. If you are aware of any of users who implemented HA solution for the database with the similar setup I have (DB, collector and web all on one server), let me know how it was done. As you mentioned, I will do some digging on my end.

rjuju commented 1 month ago

I'm not aware of such setup, but I'm assuming that many people solved this problem already.

Looking at repmgr, it seems that their approach is to let you call a script in case of failover, and it's up to you to write all the necessary actions to go along with that action, including moving a vIP to the new primary node. There should be at least a couple of blog entries documenting that on the internet.