Cluster_mgr is an important component of KunlunBase. It provides an HTTP API that KunlunBase users can call for cluster management, provisioning and monitoring, so that users can install a cluster, a kunlun-server node, a storage shard or a kunlun-storage node by calling these APIs. This capability lets users integrate KunlunBase management and provisioning into their existing applications or GUIs. Cluster_mgr also performs other important background maintenance work to make sure the KunlunBase clusters it serves run efficiently and reliably.
2021-12-28 18:14:15: zhaowei@zettadb.com created the issue
cluster_mgr currently runs as a singleton. If cluster_mgr and its cluster_mgr_safe daemon go down at the same time, e.g. when the server it runs on powers off, none of the clusters it serves can work correctly.
So we need a high availability mechanism for cluster_mgr: several cluster_mgr processes form a cluster, with one of them acting as master. When the master goes away, the other cluster_mgr instances notice and elect a new master to serve the Kunlun clusters.
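The failover behaviour requested above (peers notice that the master is gone, then elect a new one) can be illustrated with a minimal heartbeat-and-election sketch. This is not how cluster_mgr implements or will implement HA; the class name ClusterMgrNode, the HEARTBEAT_TIMEOUT value, the "lowest alive id wins" rule and the majority requirement are all hypothetical choices made only to show the idea. Time is passed in explicitly so the logic can be exercised without real clocks or networking.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical: seconds without a heartbeat before a peer is considered gone.
HEARTBEAT_TIMEOUT = 3.0


@dataclass
class ClusterMgrNode:
    """Hypothetical view that one cluster_mgr instance keeps of its peer group."""
    node_id: int
    peers: Dict[int, float] = field(default_factory=dict)  # peer id -> last heartbeat time
    master_id: Optional[int] = None

    def record_heartbeat(self, peer_id: int, now: float) -> None:
        # Remember when we last heard from this peer.
        self.peers[peer_id] = now

    def alive_members(self, now: float) -> List[int]:
        # This node always counts itself as alive; peers must have sent a recent heartbeat.
        alive = {self.node_id}
        for peer_id, last_seen in self.peers.items():
            if now - last_seen <= HEARTBEAT_TIMEOUT:
                alive.add(peer_id)
        return sorted(alive)

    def check_master(self, now: float, group_size: int) -> Optional[int]:
        alive = self.alive_members(now)
        # Require a strict majority so an isolated minority cannot elect its own master.
        if len(alive) * 2 <= group_size:
            self.master_id = None
            return None
        # Keep the current master while it is alive; otherwise pick the lowest alive id.
        if self.master_id not in alive:
            self.master_id = alive[0]
        return self.master_id


if __name__ == "__main__":
    node = ClusterMgrNode(node_id=2)
    node.record_heartbeat(1, now=0.0)
    node.record_heartbeat(3, now=0.0)
    print(node.check_master(now=1.0, group_size=3))  # all three alive: node 1 is master
    node.record_heartbeat(3, now=4.5)                # node 1 stops sending heartbeats
    print(node.check_master(now=5.0, group_size=3))  # node 1 timed out: node 2 takes over
```

Because each instance decides from its own view of heartbeats, two instances with different views could still disagree about who the master is. A production design would more likely rely on a consensus protocol such as Raft, or on a lease held in a shared, strongly consistent store, so that at most one master is active at a time.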
Issue migrated from trac ticket #360 (www.kunlunbase.com)
component: cluster manager | priority: major