zettadb / cluster_mgr

Cluster_mgr is an important component of KunlunBase. It provides an HTTP API for cluster management, provisioning, and monitoring, so that users can install a cluster, a kunlun-server node, a storage shard, or a kunlun-storage node by calling the API. This lets users integrate KunlunBase management and provisioning into their existing applications or GUIs. Cluster_mgr also performs other important background maintenance work to make sure the KunlunBase clusters it serves work efficiently and reliably.
http://www.kunlunbase.com
Apache License 2.0

HA for cluster_mgr #52

Open jd-zhang opened 2 years ago

jd-zhang commented 2 years ago

Issue migrated from trac ticket # 360 www.kunlunbase.com

component: cluster manager | priority: major

2021-12-28 18:14:15: zhaowei@zettadb.com created the issue


cluster_mgr currently runs as a singleton. If cluster_mgr and its cluster_mgr_safe daemon go down at the same time, e.g. when the server they run on loses power, none of the clusters it serves can work correctly.

So we need a high-availability mechanism for cluster_mgr: several cluster_mgr processes form a cluster, with one acting as master. When the master is gone, the other cluster_mgr instances notice and elect a new master to serve the kunlun clusters.

The monitoring service should raise an alert for such incidents.
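
One common way to get this kind of failover is lease-based leader election: every instance periodically tries to take or renew a short-lived lease in a shared store, and whoever holds the lease is the master. The sketch below only illustrates that idea and is not a description of how cluster_mgr does or will implement HA. The names `LeaseStore`, `ClusterMgrInstance`, `on_failover` and `run_demo` are hypothetical, the in-memory store stands in for whatever strongly consistent shared store a real deployment would use, and time is passed in explicitly so the example stays deterministic.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Lease:
    holder: str        # id of the instance that currently owns the lease
    expires_at: float  # time after which the lease is up for grabs


class LeaseStore:
    """Hypothetical stand-in for a shared, strongly consistent store."""

    def __init__(self) -> None:
        self.lease: Optional[Lease] = None

    def try_acquire(self, node_id: str, now: float, ttl: float) -> bool:
        """Grant or renew the lease if it is free, expired, or already ours."""
        lease = self.lease
        if lease is None or lease.expires_at <= now or lease.holder == node_id:
            self.lease = Lease(node_id, now + ttl)
            return True
        return False


class ClusterMgrInstance:
    """One cluster_mgr process taking part in the election (illustrative)."""

    def __init__(self, node_id: str, store: LeaseStore, ttl: float,
                 on_failover: Callable[[str, str], None]) -> None:
        self.node_id = node_id
        self.store = store
        self.ttl = ttl
        self.on_failover = on_failover  # hook for the monitoring service
        self.is_master = False

    def tick(self, now: float) -> None:
        """Called periodically: renew our lease or try to take over."""
        previous = self.store.lease
        prev_holder = previous.holder if previous else None
        won = self.store.try_acquire(self.node_id, now, self.ttl)
        # Taking the lease from a different holder means the old master is gone.
        if won and prev_holder not in (None, self.node_id):
            self.on_failover(prev_holder, self.node_id)
        self.is_master = won


def run_demo() -> Tuple[str, List[Tuple[str, str]]]:
    """Two instances; mgr-a stops ticking (crashes) after t=2."""
    alerts: List[Tuple[str, str]] = []
    store = LeaseStore()

    def record(old: str, new: str) -> None:
        alerts.append((old, new))

    a = ClusterMgrInstance("mgr-a", store, ttl=3.0, on_failover=record)
    b = ClusterMgrInstance("mgr-b", store, ttl=3.0, on_failover=record)

    a.tick(0.0)  # mgr-a takes the free lease, valid until t=3
    b.tick(0.0)  # mgr-b sees a live lease and stays standby
    a.tick(2.0)  # mgr-a renews, lease now valid until t=5
    b.tick(2.0)
    # mgr-a crashes here and never renews again.
    b.tick(4.0)  # lease still valid, mgr-b remains standby
    b.tick(5.0)  # lease expired, mgr-b takes over and the hook fires
    assert store.lease is not None
    return store.lease.holder, alerts
```

In this sketch the lease TTL bounds how long the clusters are without a master after a crash, and the `on_failover` hook is where an alert to the monitoring service would go. A real implementation would also need the old master to stop acting as master as soon as it can no longer renew its lease.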

jd-zhang commented 2 years ago

2022-04-06 17:44:22: snow@zettadb.com