benoit-a opened this issue 6 years ago
Ran into this issue too for the same exact reason, but the failure when re-deploying was at a different step:
```
TASK [etcd : Join Member | Add member to etcd cluster]
```
Since I had a 5-node cluster, it was possible to keep using the cluster by just keeping 3 nodes for etcd.
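In case someone else hits the same "Add member" failure: a possible manual workaround (just a sketch, assuming the etcd v3 API; the endpoint and certificate paths are illustrative, not this cluster's actual ones) is to drop the stale member that still points at the dead node before re-running the deploy:

```shell
# List current members to find the ID of the dead node
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/etcd/ca.crt --cert=/etc/etcd/etcd.crt --key=/etc/etcd/etcd.key \
  member list

# Remove the stale member entry so the replacement node can join cleanly
# (<MEMBER_ID> is the hex ID taken from the output above)
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/etcd/ca.crt --cert=/etc/etcd/etcd.crt --key=/etc/etcd/etcd.key \
  member remove <MEMBER_ID>
```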
@TeddyAndrieux Does this issue still make sense in the context of MetalK8s 2.x? Is it related to https://github.com/scality/metalk8s/issues/2186?
We do not manage members the same way in MetalK8s 2.x, but we may want to take into account restoring an etcd node with the same member name as an already existing member but a different IP (and, more generally, restoring a node with the same minion_id but a different IP). It's not directly related to #2186 but to restoration in general (via the restore script, btw).
E.g., if we lose a bootstrap node and try to restore from a backup onto a new machine using the same minion_id/node_name as the previous bootstrap node (but with a different IP), I'm not sure whether the restore goes well or not, TBC.
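For the same-member-name/different-IP case specifically, etcd allows updating an existing member's peer URL in place instead of removing and re-adding it, which is probably what a restore flow would need to do; a minimal sketch (etcd v3 syntax; `<MEMBER_ID>` and `<NEW_IP>` are placeholders):

```shell
# Find the ID of the member whose machine was replaced
ETCDCTL_API=3 etcdctl member list

# Repoint that member at the new machine's IP
# (2380 is the default etcd peer port)
ETCDCTL_API=3 etcdctl member update <MEMBER_ID> --peer-urls=https://<NEW_IP>:2380
```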
One of my nodes (in the kube-master / etcd groups) failed and was replaced by a fresh new server with a new IP. The install is around two months old.
I checked out dev/0.1 and launched playbooks/deploy.yml. The playbook failed at some "check etcd health" step.
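That health check can be reproduced by hand to see which member is failing; a rough sketch (the endpoint is a placeholder, and which command applies depends on the etcd API version in use):

```shell
# etcd v3 API: per-endpoint health
ETCDCTL_API=3 etcdctl --endpoints=https://<NODE_IP>:2379 endpoint health

# etcd v2 API: whole-cluster health summary
etcdctl --endpoints=https://<NODE_IP>:2379 cluster-health
```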
As a last resort, I recreated the cluster from scratch.