zephyr-im / zephyr

An institutional/enterprise-scale distributed real-time messaging and notification system

zhm may send HM_BOOT even with -N #88

Closed by andersk-auto 10 years ago

andersk-auto commented 10 years ago

The hostmanager keeps track of a flag indicating whether it is "booting"; basically, this is true until the first time a notice is received from the currently-selected server. When everything is going well, this period is extremely short -- only until we get an ACK from our initial HM_BOOT or HM_ATTACH message. However, if the first server we try is nonresponsive, or all the servers are nonresponsive, it can last quite a bit longer.

If new_server() is called while we are still "booting", then a new server is selected and an HM_BOOT message is sent to that server (if not booting, HM_ATTACH is sent instead). This happens even if the -N switch was used; that switch only replaces the initial HM_BOOT with an HM_ATTACH and has no effect on the message new_server() sends.
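To make the sequence concrete, here is a minimal C model of the behavior described above. It is a sketch and not the zhm source: the struct, its fields, and the function names are hypothetical, and only HM_BOOT, HM_ATTACH, and the -N switch come from the report.

```c
#include <stdbool.h>

/* Simplified model; all names except HM_BOOT/HM_ATTACH are hypothetical. */
enum hm_msg { HM_BOOT, HM_ATTACH };

struct hm_state {
    bool booting;  /* true until the first notice from the selected server */
    bool no_boot;  /* models the -N switch */
};

/* Message sent at startup: -N replaces the initial HM_BOOT with HM_ATTACH. */
enum hm_msg initial_msg(const struct hm_state *s)
{
    return s->no_boot ? HM_ATTACH : HM_BOOT;
}

/* Message sent when switching servers: depends only on the booting flag,
 * so -N is ignored here. */
enum hm_msg new_server_msg(const struct hm_state *s)
{
    return s->booting ? HM_BOOT : HM_ATTACH;
}

/* A notice from the currently-selected server ends the booting period. */
void notice_received(struct hm_state *s)
{
    s->booting = false;
}
```

In this model, a host manager started with -N sends HM_ATTACH first, but a server switch before any notice arrives still produces HM_BOOT.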

There are several cases where new_server() can be called:

The first cannot trigger an HM_BOOT; control messages are processed only from the selected server, and the booting flag is cleared before the notice is processed. However, the other cases all can happen, and the race may be very easy to win if the server is slow to respond. Worse, it is guaranteed to happen if the first server is down, since switching to that server will cause an HM_BOOT to be sent to it.

I believe the -N option should cause the booting flag to be cleared immediately, with the effect that new_server() always sends HM_ATTACH instead of HM_BOOT. Additionally, clearing of the deactivated flag should probably not be conditional on the booting flag.
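A rough sketch of that proposal, under stated assumptions, might look like the following. The names hm_init and switch_server are hypothetical, and placing the clearing of the deactivated flag in the server-switch path is an assumption made for illustration; this is not the change that was actually committed.

```c
#include <stdbool.h>

/* Simplified model; all names except HM_BOOT/HM_ATTACH are hypothetical. */
enum hm_msg { HM_BOOT, HM_ATTACH };

struct hm_state {
    bool booting;      /* true until the first notice from the selected server */
    bool deactivated;
};

/* Proposed startup: with -N (no_boot), clear the booting flag immediately. */
void hm_init(struct hm_state *s, bool no_boot)
{
    s->booting = !no_boot;
    s->deactivated = false;
}

/* Proposed server switch. Assumption: the deactivated flag is cleared on
 * this path; the point is only that it no longer depends on booting. */
enum hm_msg switch_server(struct hm_state *s)
{
    s->deactivated = false;
    return s->booting ? HM_BOOT : HM_ATTACH;
}
```

With booting cleared at startup, every server switch under -N yields HM_ATTACH, while a host manager started without -N still sends HM_BOOT until its first notice arrives.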

andersk-auto commented 10 years ago

Imported from trac issue 88. Created by jhutz@CS.CMU.EDU on 2013-01-30T23:40:49, last modified: 2013-02-16T19:06:46

andersk-auto commented 10 years ago

Trac comment by kcr@ATHENA.MIT.EDU on 2013-02-16 19:06:46:

fixed in f8fd0932c84b1642a71f78ad72e87d1842a1d2ab