zk1931 / jzab

ZooKeeper Atomic Broadcast in Java
http://zk1931.github.io/jzab/master/
Apache License 2.0

Dynamic configuration for javazab #97

Closed EasonLiao closed 10 years ago

EasonLiao commented 10 years ago

Hi ZooKeeper gurus,

I have a lot of confusion about dynamic configuration. Part of the reason is that I don't know much about the ZooKeeper implementation, and the implementations of javazab and ZooKeeper are not exactly the same.

To make sure I implement it correctly, I'll post all my questions and thoughts about reconfiguration here. I really need discussions like #17 to help me out. @fpj @fengjingchao @m1ch1

EasonLiao commented 10 years ago

First let's all agree on the following facts/assumptions:

EasonLiao commented 10 years ago

First case: say the old config is {A, B, C, D} with A as the leader, and we want to add one server E, so the new config is {A, B, C, D, E}. A gets the join request from E, starts synchronizing E, and appends a COP to the end of the stream. A, B, C, and D receive the COP and acknowledge it. Since {A, B, C, D} is a quorum in both the old config and the new config, A sends out ACTIVATE. A, B, C, and D receive ACTIVATE, and then A fails. At this point E might have received nothing: its log, current config file, and proposed config file are all empty.

Server  Log         Current config     Proposed config    Status
A       ..., <COP>  {A, B, C, D, E}    {A, B, C, D, E}    FAILED
B       ..., <COP>  {A, B, C, D, E}    {A, B, C, D, E}
C       ..., <COP>  {A, B, C, D, E}    {A, B, C, D, E}
D       ..., <COP>  {A, B, C, D, E}    {A, B, C, D, E}
E       empty       empty              empty

Now B, C, and D form a quorum in the new configuration, so they keep working, appending and committing new proposals.
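The quorum claims in this first case can be checked with a few lines of Java. This is only a sketch: it assumes a quorum means a strict majority of a configuration's members, and `QuorumSketch` and `isQuorum` are made-up names for illustration, not part of javazab.

```java
import java.util.Set;

final class QuorumSketch {
    // Assumption: a quorum is a strict majority of the configuration's members.
    static boolean isQuorum(Set<String> servers, Set<String> config) {
        // Only servers that actually belong to the configuration count.
        long members = servers.stream().filter(config::contains).count();
        return members > config.size() / 2;
    }

    public static void main(String[] args) {
        Set<String> oldConfig = Set.of("A", "B", "C", "D");
        Set<String> newConfig = Set.of("A", "B", "C", "D", "E");

        // A, B, C, D acknowledged the COP.
        Set<String> acked = Set.of("A", "B", "C", "D");
        System.out.println(isQuorum(acked, oldConfig)); // true (4 of 4)
        System.out.println(isQuorum(acked, newConfig)); // true (4 of 5)

        // After A fails, B, C, D remain.
        Set<String> survivors = Set.of("B", "C", "D");
        System.out.println(isQuorum(survivors, newConfig)); // true (3 of 5)
    }
}
```

So with A down, B, C, and D are still a majority of the five-member config, which is why they can keep committing without E.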

Question 1: How does server E join the new configuration? Will it go back to leader election? For now, E doesn't belong to any configuration.

----ADD---- I guess that when E first tries to join the cluster, the servers in the cluster tell E the current configuration, and E first updates its current configuration to {A, B, C, D}. On timeout, a server always goes back to leader election in its current configuration. In this case E goes back to leader election in {A, B, C, D}, so E will eventually be synchronized.

EasonLiao commented 10 years ago

Second case: similar to the first case, but we add two servers, E and F, at once. This time E is the only one that gets ACTIVATE.

Server  Log         Current config        Proposed config       Status
A       ..., <COP>  {A, B, C, D}          {A, B, C, D, E, F}    FAILED
B       ..., <COP>  {A, B, C, D}          {A, B, C, D, E, F}
C       ...,        {A, B, C, D}          {A, B, C, D}
D       ..., <COP>  {A, B, C, D}          {A, B, C, D, E, F}
E       ..., <COP>  {A, B, C, D, E, F}    {A, B, C, D, E, F}
F       empty       {A, B, C, D}          empty

E gets ACTIVATE because the COP is held by a quorum of both the old config (A, B, and D got the COP) and the new config (A, B, D, and E got the COP). Now E goes back to leader election in config {A, B, C, D, E, F}, while B, C, D, and F go back to leader election in the old config {A, B, C, D}. B, C, and D form a quorum in the old config, and B becomes the leader. After synchronization (or after simply synchronizing from the follower that has the "best" history), B can't move on, because it sees the pending COP(A, B, C, D, E, F) and {B, C, D} is not a quorum in the new config. They also don't know whether the new config has been committed. If it has, they only need to update their current config files and go back to leader election in the new config; otherwise, they need to finish the transition to the new config.

My question is: how do B, C, D, and F find out whether the pending config has been committed?
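To double-check the arithmetic behind the ACTIVATE in this second case, here is a small Java sketch. It assumes a strict-majority quorum and the rule described above, that the COP must be held by a quorum of both the old and the new config before ACTIVATE is sent. `CopActivationSketch`, `isQuorum`, and `canActivate` are hypothetical names, not javazab code.

```java
import java.util.Set;

final class CopActivationSketch {
    // Assumption: a quorum is a strict majority of the configuration's members.
    static boolean isQuorum(Set<String> servers, Set<String> config) {
        long members = servers.stream().filter(config::contains).count();
        return members > config.size() / 2;
    }

    // Hypothetical rule from the description above: ACTIVATE is allowed only
    // if the servers holding the COP are a quorum in BOTH configurations.
    static boolean canActivate(Set<String> copHolders,
                               Set<String> oldConfig,
                               Set<String> newConfig) {
        return isQuorum(copHolders, oldConfig) && isQuorum(copHolders, newConfig);
    }

    public static void main(String[] args) {
        Set<String> oldConfig = Set.of("A", "B", "C", "D");
        Set<String> newConfig = Set.of("A", "B", "C", "D", "E", "F");

        // A, B, D, E hold the COP: 3 of 4 in old, 4 of 6 in new.
        System.out.println(
            canActivate(Set.of("A", "B", "D", "E"), oldConfig, newConfig)); // true

        // Without E it would be 3 of 6 in the new config, not a majority.
        System.out.println(
            canActivate(Set.of("A", "B", "D"), oldConfig, newConfig)); // false
    }
}
```

The second call shows how tight the case is: E's copy of the COP is what tips the six-member config into a majority.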

ghost commented 10 years ago

In case of a primary failure, B (the new primary candidate) must follow the start-up phase of both the old and new configurations.
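One possible reading of this, sketched in Java: while a COP is pending, the candidate needs the servers taking part in start-up to form a quorum in both the old and the new configuration. This is an interpretation, not the actual javazab logic; it assumes strict-majority quorums, and `StartupSketch` and `canFinishStartup` are hypothetical names.

```java
import java.util.Optional;
import java.util.Set;

final class StartupSketch {
    // Assumption: a quorum is a strict majority of the configuration's members.
    static boolean isQuorum(Set<String> servers, Set<String> config) {
        long members = servers.stream().filter(config::contains).count();
        return members > config.size() / 2;
    }

    // Hypothetical check: with no pending config, a quorum of the current
    // config is enough; with a pending config, both quorums are required.
    static boolean canFinishStartup(Set<String> participants,
                                    Set<String> currentConfig,
                                    Optional<Set<String>> pendingConfig) {
        boolean inCurrent = isQuorum(participants, currentConfig);
        return pendingConfig
            .map(pending -> inCurrent && isQuorum(participants, pending))
            .orElse(inCurrent);
    }

    public static void main(String[] args) {
        Set<String> current = Set.of("A", "B", "C", "D");
        Optional<Set<String>> pending =
            Optional.of(Set.of("A", "B", "C", "D", "E", "F"));

        // B, C, D alone: quorum in the old config, but only 3 of 6 in the new one.
        System.out.println(
            canFinishStartup(Set.of("B", "C", "D"), current, pending)); // false

        // If F (or E) also takes part, it becomes 4 of 6 in the new config.
        System.out.println(
            canFinishStartup(Set.of("B", "C", "D", "F"), current, pending)); // true
    }
}
```

Under this reading, B is stuck with only C and D, but could finish start-up once one of the new servers joins it.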

ghost commented 10 years ago

dupe of #98