lightem90 / testMRMW

MRMW for deployment on test system

LE doesn't start on the node that creates quorum when connecting #8

Closed: xit4 closed this issue 9 years ago

xit4 commented 9 years ago

As shown in the picture below, given 10 nodes, when node 6 connects a quorum is formed, so nodes 1-5 start a leader election (LE) and elect 6 as their leader. Node 6 itself, however, does NOT start the election until another node joins the system.

This happens because the election is started only after receiving end_handshake, i.e. only on the "old" nodes, as seen at https://github.com/lightem90/testMRMW/blob/2cbf5ba6410e180a18aba6d2ab93691289f0dcdf/src/NetworkPrimitives/ConnectionManager.java#L241
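A minimal, self-contained simulation of the asymmetry described above might look like the following. It assumes a quorum of 6 out of 10 nodes and that the quorum check runs only inside an end_handshake handler on the already-connected nodes. The class and method names (HandshakeOnlyElection, SimNode, onEndHandshake, join) are hypothetical and do not come from the repository.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the reported bug: the quorum check (and so the
// election) runs only when a node receives end_handshake, i.e. on old nodes.
public class HandshakeOnlyElection {
    static final int QUORUM = 6; // assumption: 6 of 10 nodes form a quorum

    static class SimNode {
        final int id;
        int activeNodes = 1;            // each node counts itself
        boolean electionStarted = false;

        SimNode(int id) { this.id = id; }

        // Runs on an existing node when a newcomer completes the handshake.
        void onEndHandshake() {
            activeNodes++;
            if (activeNodes >= QUORUM) electionStarted = true;
        }
    }

    // Connects nodes 1..joined in order and returns them for inspection.
    static List<SimNode> join(int joined) {
        List<SimNode> nodes = new ArrayList<>();
        for (int id = 1; id <= joined; id++) {
            SimNode newcomer = new SimNode(id);
            for (SimNode old : nodes) {
                old.onEndHandshake();   // old nodes learn about the newcomer
                newcomer.activeNodes++; // the newcomer learns about old nodes
            }                           // but never checks the quorum itself
            nodes.add(newcomer);
        }
        return nodes;
    }

    public static void main(String[] args) {
        for (SimNode n : join(6)) {
            System.out.println("node " + n.id + " election started: " + n.electionStarted);
        }
    }
}
```

With six nodes joined, nodes 1-5 report that the election started while node 6 does not, even though node 6 also sees six active nodes.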

Proposed solutions:

// Start an election once a quorum is active and no leader is known (-1)
if (n.getFD().getActiveNodes().size() >= n.getMySett().getQuorum() && n.getFD().getLeader_id() == -1) {
    startElectionRoutine();
}

placed outside of handleInit(), probably in the main loop of run().
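One way to picture the proposed check living in the main loop is the sketch below. The names getActiveNodes(), getLeader_id() and startElectionRoutine() mirror the ones in the proposal, but their bodies here are simplified assumptions; in particular, the "highest id wins" election is only a placeholder, not the project's election algorithm. QuorumCheckLoop and loopIteration() are hypothetical names.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the proposed fix: every iteration of the (hypothetical) run() loop
// checks for an active quorum with no known leader, independently of which
// node sent end_handshake.
public class QuorumCheckLoop {
    static class FailureDetector {
        final Set<Integer> activeNodes = new HashSet<>();
        int leaderId = -1; // -1 means "no leader known", as in the issue

        Set<Integer> getActiveNodes() { return activeNodes; }
        int getLeader_id() { return leaderId; }
    }

    final FailureDetector fd = new FailureDetector();
    final int quorum;
    int electionsStarted = 0;

    QuorumCheckLoop(int quorum) { this.quorum = quorum; }

    void startElectionRoutine() {
        electionsStarted++;
        // Placeholder election: the highest active id becomes leader.
        fd.leaderId = fd.getActiveNodes().stream().max(Integer::compare).orElse(-1);
    }

    // One iteration of the main loop.
    void loopIteration() {
        // ... handle network events here ...
        if (fd.getActiveNodes().size() >= quorum && fd.getLeader_id() == -1) {
            startElectionRoutine();
        }
    }

    public static void main(String[] args) {
        QuorumCheckLoop node6 = new QuorumCheckLoop(6);
        for (int id = 1; id <= 6; id++) node6.fd.getActiveNodes().add(id); // node 6 sees nodes 1-6
        node6.loopIteration();
        System.out.println("elections: " + node6.electionsStarted + ", leader: " + node6.fd.getLeader_id());
        node6.loopIteration(); // leader already known, so no new election
        System.out.println("elections: " + node6.electionsStarted);
    }
}
```

Because the check is guarded by getLeader_id() == -1, running it on every iteration starts at most one election until the leader is lost again.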

xit4 commented 9 years ago

Temporarily fixed by using the first solution proposed in the OP. I'll keep this issue open until issue #9 is fixed, so that we can be 100% sure the election works and this issue is gone.

lightem90 commented 9 years ago

I think we are handling new connections incorrectly. Calling accept is (probably) fine, but when we call finishConnect we should update the view correctly and query the system; this way, every running server will have an updated view of every node each time a node connects to another one. (We can't do this now because read blocks.) The purpose of the "init" procedure was to make sure that every node had an up-to-date view of every other node, so that leader election could be implemented correctly.
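A sketch of updating the view at finishConnect time, assuming ConnectionManager uses java.nio non-blocking channels (which the mention of accept and finishConnect suggests). A Selector drives the connect, the view is updated only once finishConnect() returns true, and the channel then switches to OP_READ so reads are event-driven instead of blocking. The "view" set, the peer id attachment and the class name are hypothetical; querying the rest of the system is left out.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Hypothetical sketch: update the view as soon as a non-blocking connect completes.
public class NonBlockingConnectView {
    static Set<Integer> connectAndUpdateView(int peerId) throws IOException {
        Set<Integer> view = new HashSet<>();
        try (ServerSocketChannel server = ServerSocketChannel.open();
             Selector selector = Selector.open();
             SocketChannel client = SocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0)); // local peer, ephemeral port
            client.configureBlocking(false);
            if (client.connect(server.getLocalAddress())) {
                view.add(peerId); // connected immediately
                return view;
            }
            client.register(selector, SelectionKey.OP_CONNECT, peerId);
            for (int attempt = 0; attempt < 10 && view.isEmpty(); attempt++) {
                selector.select(500);
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isConnectable() && ((SocketChannel) key.channel()).finishConnect()) {
                        view.add((Integer) key.attachment());  // update the view here
                        key.interestOps(SelectionKey.OP_READ); // reads now come via the selector
                    }
                }
            }
        }
        return view;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("view after connect: " + connectAndUpdateView(7));
    }
}
```

Handling reads through the same Selector is what would remove the blocking read that currently prevents updating the view at this point.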

xit4 commented 9 years ago

Leader election works properly now, apart from issue #9.