HA for cluster_mgr

zettadb / cluster_mgr

Cluster_mgr is an important component of KunlunBase. It provides an HTTP API for KunlunBase users to do cluster management, provisioning and monitoring work, so that users can install a cluster, a kunlun-server node, a storage shard or a kunlun-storage node by calling these APIs. This capability enables users to integrate KunlunBase management and provisioning into their existing applications or GUIs. Cluster_mgr also performs other important background cluster maintenance work to make sure the KunlunBase clusters it serves work efficiently and reliably.

Apache License 2.0

Issue migrated from trac ticket # 360 www.kunlunbase.com

component: cluster manager | priority: major

2021-12-28 18:14:15: zhaowei@zettadb.com created the issue

cluster_mgr currently runs as a singleton. If cluster_mgr and its cluster_mgr_safe daemon go down at the same time, e.g. when the server they run on powers off, then none of the clusters it serves can work correctly.

So we need a high availability mechanism for cluster_mgr: several cluster_mgr processes form a cluster, with one acting as master. When the master is gone, the other cluster_mgr instances will notice and elect a new master to serve the kunlun clusters.

The monitoring service should alert on such incidents.
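To make the proposal concrete, here is a minimal sketch of one common way to get this behaviour: lease-based leader election. It is an illustration under stated assumptions, not the design chosen for cluster_mgr. `LeaseStore`, `ClusterMgrNode`, `tick`, `simulate`, the TTL value and the alert callback are all hypothetical names. The in-memory store stands in for a shared, consistent store with an atomic acquire operation, which a real deployment would need.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class LeaseStore:
    """In-memory stand-in for a shared, consistent store (hypothetical)."""
    holder: Optional[str] = None
    expires_at: float = 0.0

    def try_acquire(self, node_id: str, now: float, ttl: float) -> bool:
        # Grant the lease if it is free, expired, or already held by this node.
        # A real store would have to perform this check-and-set atomically.
        if self.holder is None or now >= self.expires_at or self.holder == node_id:
            self.holder = node_id
            self.expires_at = now + ttl
            return True
        return False


@dataclass
class ClusterMgrNode:
    """One cluster_mgr instance taking part in the election (hypothetical)."""
    node_id: str
    store: LeaseStore
    ttl: float = 3.0
    alert: Callable[[str], None] = print
    is_master: bool = False
    alive: bool = True

    def tick(self, now: float) -> None:
        """One heartbeat: renew the lease if master, otherwise try to take over."""
        if not self.alive:
            return
        previous = self.store.holder
        won = self.store.try_acquire(self.node_id, now, self.ttl)
        if won and not self.is_master:
            if previous is not None and previous != self.node_id:
                # Taking over from another node: raise an alert for monitoring.
                self.alert(f"master {previous} lost; {self.node_id} elected")
            self.is_master = True
        elif not won:
            self.is_master = False


def simulate() -> Tuple[List[Optional[str]], List[str]]:
    """Run three instances for ten time steps; the first master dies at t=4."""
    alerts: List[str] = []
    store = LeaseStore()
    nodes = [ClusterMgrNode(f"mgr{i}", store, alert=alerts.append) for i in range(3)]
    masters: List[Optional[str]] = []
    for now in range(10):
        if now == 4:
            nodes[0].alive = False  # simulate the master's host powering off
        for node in nodes:
            node.tick(float(now))
        masters.append(store.holder)
    return masters, alerts


if __name__ == "__main__":
    masters, alerts = simulate()
    print(masters)
    print(alerts)
```

In this sketch the failed master's lease has to expire before another instance can take over, so the TTL bounds how long the clusters go without a working cluster_mgr. A shorter TTL gives faster failover at the cost of more frequent renewals.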

HA for cluster_mgr #52

2022-04-06 17:44:22: snow@zettadb.com