jgroups-extras / jgroups-raft

Implementation of the RAFT consensus protocol in JGroups
https://jgroups-extras.github.io/jgroups-raft/
Apache License 2.0

Restarting node after membership change #245

Open jabolina opened 10 months ago

jabolina commented 10 months ago

The usual maintenance workflow is to execute a membership operation that removes the node and, after it commits, shut the node down. Once the maintenance or change is complete, restart the node and add it back to the cluster with another membership operation. With this approach, the node keeps all of its state and only needs to catch up with the leader after the restart.

The problem is that we do not allow non-member nodes to start. During RAFT startup, we verify that the current node's raft-id is included in the member list. In this "maintenance flow," the node was removed, so when it restarts and recovers its state, it is no longer included in the member list.
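To make the failure mode concrete, here is a minimal sketch of the check in plain Java. It is not the jgroups-raft code: `MembershipStartCheck`, `PersistedState`, `canStart`, and `afterRemoval` are hypothetical names, and the sketch assumes the removed node has persisted a member list that already reflects its own removal.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the startup check; not the actual jgroups-raft implementation. */
public class MembershipStartCheck {

    /** State a node recovers from disk on restart (hypothetical shape). */
    static final class PersistedState {
        final String raftId;
        final List<String> members;

        PersistedState(String raftId, List<String> members) {
            this.raftId = raftId;
            this.members = new ArrayList<>(members);
        }
    }

    /** Mirrors the rule described above: a node may start only if it is a member. */
    static boolean canStart(PersistedState state) {
        return state.members.contains(state.raftId);
    }

    /** Models the committed membership operation that removes a node. */
    static PersistedState afterRemoval(PersistedState state, String removed) {
        List<String> members = new ArrayList<>(state.members);
        members.remove(removed);
        return new PersistedState(state.raftId, members);
    }

    public static void main(String[] args) {
        // Node C as a regular member of a three-node cluster.
        PersistedState before = new PersistedState("C", List.of("A", "B", "C"));
        // Node C after the removal committed and it was shut down for maintenance.
        PersistedState after = afterRemoval(before, "C");

        System.out.println("before removal: " + canStart(before)); // true
        System.out.println("after removal:  " + canStart(after));  // false
    }
}
```

In this model the second check fails for exactly the node that followed the maintenance flow, even though its recovered state is otherwise intact.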

As a result, the removed node is unable to rejoin the cluster after the restart. The only workaround today is manual intervention: deleting all of the node's data.

I need to investigate how to address this. Ideally, I would like to avoid deleting the node data on restart.