rkrzewski / akka-cluster-etcd

Akka cluster management using etcd

Retry fetching seed list when none of the seeds seem to be responding #11

Open rkrzewski opened 9 years ago

rkrzewski commented 9 years ago

Currently a follower node fetches the seed list once and attempts to join the cluster using those addresses. If the seed list is out of date because of a leader malfunction or a cluster partition, this operation may "hang" indefinitely. The discovery actor should use a timer: if joining the cluster does not succeed within a specified time, it should cancel the ongoing join (invoking Cluster.joinSeedNodes(Seq()) does this) and re-fetch the seed list from etcd, hoping that a (new) leader will eventually publish a correct list.

rkrzewski commented 9 years ago

Implemented in 7016917c15024268a88790472b37f9af57681d1a but tests are needed. That's a bit tricky since timeouts are involved.

rkrzewski commented 9 years ago

I've learned from further reading of the ClusterCoreDaemon source that Cluster.joinSeedNodes(Seq()) would in fact be ignored. However, invoking joinSeedNodes with a non-empty list interrupts the current SeedNodeProcess, which means we can simply fetch another list and retry.
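
For readers following along, here is a minimal, dependency-free sketch of the retry loop described in this thread. It is not the code from the commit mentioned above. `Seed`, `joinWithRetry`, `fetchSeeds`, `startJoin`, `joinedWithinTimeout` and `maxAttempts` are all hypothetical names introduced for illustration. In a real discovery actor, `startJoin` would presumably wrap `Cluster(system).joinSeedNodes(...)`, and the timeout check would be driven by a scheduled message rather than a blocking call.

```scala
import scala.annotation.tailrec

object SeedRetrySketch {
  // Hypothetical stand-in for an Akka Address; not part of akka-cluster-etcd.
  final case class Seed(host: String, port: Int)

  /** Re-fetches the seed list and restarts the join until it succeeds.
    *
    * @param fetchSeeds          re-reads the seed list (from etcd in the real system)
    * @param startJoin           begins a join attempt with a non-empty seed list
    * @param joinedWithinTimeout true if the node joined before the timer fired
    * @param maxAttempts         assumed upper bound, only to keep the sketch finite
    * @return the 1-based attempt number that succeeded, or None if all failed
    */
  def joinWithRetry(
      fetchSeeds: () => Seq[Seed],
      startJoin: Seq[Seed] => Unit,
      joinedWithinTimeout: () => Boolean,
      maxAttempts: Int
  ): Option[Int] = {
    @tailrec
    def loop(attempt: Int): Option[Int] =
      if (attempt > maxAttempts) None
      else {
        val seeds = fetchSeeds()
        // An empty list would be ignored by joinSeedNodes, so only a
        // non-empty list is passed on; that call replaces the previous attempt.
        if (seeds.nonEmpty) startJoin(seeds)
        if (seeds.nonEmpty && joinedWithinTimeout()) Some(attempt)
        else loop(attempt + 1) // timed out or nothing to join: fetch again
      }
    loop(1)
  }
}
```

The sketch never tries to cancel a join with an empty list. Each retry simply starts a new join with the freshly fetched list, which matches the observation that only a non-empty list interrupts the current attempt.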