TODO: Automate the scaling of DCS cluster nodes (etcd, consul)

vitabaks commented 1 year ago

Currently, scaling is automated for Postgres nodes (playbook add_pgnode.yml) and for HAProxy nodes (playbook add_balancer.yml). For an etcd or Consul cluster, however, only the initial deployment is automated; further maintenance, such as replacing a failed node or scaling the DCS cluster, has to be done manually.

Automate the scaling of DCS cluster nodes

[ ] ETCD cluster
[ ] Consul cluster

m3ki commented 1 year ago

Thank you for adding this to the enhancement list!

m3ki commented 1 year ago

I am not sure if this would help, but I believe I was able to replace a dead etcd node by doing the following:

In the inventory file, mark the bad node with new_node=true:

[etcd_cluster]  # recommendation: 3, or 5-7 nodes
10.10.10.77
10.10.10.78
10.10.10.79 new_node=true

On any healthy node, remove the dead member from the cluster. First, set up the etcdctl environment:

export ETCDCTL_API=3
HOST_1=10.10.10.77
HOST_2=10.10.10.78
HOST_3=10.10.10.79
ENDPOINTS=$HOST_1:2379,$HOST_2:2379,$HOST_3:2379
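
For clusters with more than three members, the endpoint list could be built from a host list rather than one variable per host. This is a minimal sketch, assuming every member serves clients on port 2379 as in the lines above; `build_endpoints` is a hypothetical helper name, not part of etcdctl or the playbooks.

```shell
# Hypothetical helper: join host addresses into an etcdctl --endpoints value.
# Assumes every member serves clients on port 2379.
build_endpoints() {
  port=2379
  out=""
  for host in "$@"; do
    # Prepend a comma only when $out already holds an entry.
    out="${out:+$out,}${host}:${port}"
  done
  printf '%s\n' "$out"
}

ENDPOINTS=$(build_endpoints 10.10.10.77 10.10.10.78 10.10.10.79)
echo "$ENDPOINTS"
```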

Get the dead node's member ID:

etcdctl --endpoints=$ENDPOINTS member list
etcdctl --endpoints=$ENDPOINTS endpoint health
etcdctl --write-out=table --endpoints=$ENDPOINTS endpoint status

Remove the dead node:

etcdctl member remove {NODE ID HERE}
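
For automation, the member ID lookup could be scripted instead of read off the table by eye. The sketch below is not from the original comment. It assumes the default ("simple") output of `etcdctl member list`, where fields are separated by a comma and a space and the peer URL is the fourth field, and it assumes one peer URL per member. `member_id_by_peer_url` is a hypothetical helper name, and the sample IDs and node names are made up for illustration.

```shell
# Hypothetical helper: read `etcdctl member list` output (default "simple"
# format) on stdin and print the ID of the member with the given peer URL.
# Assumes ", "-separated fields with a single peer URL in the 4th field.
member_id_by_peer_url() {
  awk -F', ' -v url="$1" '$4 == url { print $1 }'
}

# Made-up sample output for illustration; not taken from a real cluster.
sample='1a2b3c4d5e6f7a8b, started, node01, http://10.10.10.77:2380, http://10.10.10.77:2379, false
2b3c4d5e6f7a8b9c, started, node02, http://10.10.10.78:2380, http://10.10.10.78:2379, false
3c4d5e6f7a8b9c0d, started, node03, http://10.10.10.79:2380, http://10.10.10.79:2379, false'

printf '%s\n' "$sample" | member_id_by_peer_url http://10.10.10.79:2380
```

Against a live cluster, the same function would be fed from `etcdctl --endpoints=$ENDPOINTS member list`, and its output passed to `etcdctl member remove`.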

Re-add the new node:

etcdctl member add ffs-node03 --peer-urls=http://10.10.10.79:2380

Modify the etcd.conf template so that a node marked new_node joins the existing cluster:

ETCD_INITIAL_CLUSTER_STATE="{{ 'existing' if new_node | default(false) | bool else 'new' }}"

Then rerun the etcd playbook like this:

ansible-playbook etcd_cluster.yml -i environments/staging/inventory --extra-vars "@environments/staging/main.yml"
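
As a starting point for automating this, the steps above could be collected into a dry-run helper that only prints the commands in order, so they can be reviewed before anything is executed. This is a sketch under assumptions, not an implementation from the project: `print_replace_plan` is a hypothetical name, the playbook path and inventory are the staging ones used above, and the endpoints passed in are assumed to be healthy members only.

```shell
# Hypothetical dry-run helper: print (do not run) the replacement steps.
# Arguments: member ID to remove, new member name, new peer URL, endpoints.
print_replace_plan() {
  member_id="$1"; name="$2"; peer_url="$3"; endpoints="$4"
  # Step 1: drop the dead member from the cluster.
  printf 'etcdctl --endpoints=%s member remove %s\n' "$endpoints" "$member_id"
  # Step 2: register the replacement member under its peer URL.
  printf 'etcdctl --endpoints=%s member add %s --peer-urls=%s\n' \
    "$endpoints" "$name" "$peer_url"
  # Step 3: rerun the playbook so the node joins with state "existing".
  printf '%s\n' 'ansible-playbook etcd_cluster.yml -i environments/staging/inventory --extra-vars "@environments/staging/main.yml"'
}

# The member ID below is a made-up placeholder.
print_replace_plan 3c4d5e6f7a8b9c0d ffs-node03 http://10.10.10.79:2380 \
  10.10.10.77:2379,10.10.10.78:2379
```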

vitabaks commented 1 year ago

@m3ki Thank you for your comment. I think some of the examples you provided will be used as a basis for further automation of the etcd cluster management process.

Now I had to modify the Ansible etcd playbook to change the etcd user's home directory to something other than the etcd data directory, like so: add/modify the following after the etcd data directory

Please tell me why it was necessary to do this.

m3ki commented 1 year ago

I had an issue starting the etcd server on the new node: it complained that the directory already had files in it (i.e. .bashrc, .profile, etc.). This is very odd, since etcd is not a login user and I haven't logged in as that user either. I'll test some more and report back.

m3ki commented 1 year ago

@vitabaks Disregard the home directory change; it seems to work fine without it. I just retested on my test cluster and updated my comment above too.

vitabaks / autobase

TODO: Automate the scaling of DCS cluster nodes (etcd, consul) #503