The usual maintenance workflow would be to execute a membership operation that removes the node and, after it commits, shut down the node.
After the maintenance or change, restart the node and add it back to the cluster with another membership operation.
In this approach, the node keeps all of its state and catches up with the leader after the restart.
The problem is that we do not allow non-member nodes to start. During the RAFT start, we verify that the current node's raft-id is included in the member list. In this "maintenance flow," we removed the node, so after the restart the node recovers its state and finds that it is not included in the member list.
As a result, the removed node cannot start, and therefore cannot rejoin the cluster, after the restart. The only workaround today is manual intervention to delete all of the node's data.
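To make the failure concrete, here is a minimal Python sketch of the flow described above. It is not taken from our code: `RecoveredState`, `start_raft`, `NotAMemberError`, and `maintenance_flow` are hypothetical names, and the sketch only assumes that the startup check compares the node's raft-id against the member list recovered from disk.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class NotAMemberError(RuntimeError):
    """Raised when a node tries to start without being in the member list."""


@dataclass
class RecoveredState:
    # Hypothetical stand-in for the state a node reloads from disk on restart.
    raft_id: int
    members: set[int] = field(default_factory=set)


def start_raft(state: RecoveredState) -> None:
    # Assumed startup guard: refuse to start if our raft-id is not a member.
    if state.raft_id not in state.members:
        raise NotAMemberError(
            f"raft-id {state.raft_id} is not in the member list {sorted(state.members)}"
        )


def maintenance_flow() -> str:
    # Three-node cluster; node 3 is taken out for maintenance.
    state = RecoveredState(raft_id=3, members={1, 2, 3})
    start_raft(state)  # normal start: node 3 is a member

    # The removal commits and is persisted before the node shuts down.
    state.members.discard(3)

    # On restart the node recovers that same state and hits the guard.
    try:
        start_raft(state)
    except NotAMemberError:
        return "blocked"
    return "started"
```

In this sketch, `maintenance_flow()` ends in the blocked branch: the guard rejects the node before it gets a chance to be added back by the second membership operation, which is the situation that currently forces the data deletion.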
I need to investigate how to address this. Ideally, I would like to avoid deleting the node data on restart.