Ringpop currently allows bootstrap to proceed without a listening tchannel underneath. This was confirmed with tick-cluster by calling bootstrap without listen, first on a single node and then on all nodes (patch below).
Behaviorally, a single node failing to listen is the worst case. It has continuous one-way interactions with other nodes, and seems to create a continuous cycle of other nodes marking it suspect. This could happen in production during a rolling upgrade, or if bootstrap/listen handling is incorrect in some cases.
If all nodes fail to listen, they all simply fail to bootstrap, as expected.
Our current code calls listen() before bootstrap() pretty consistently, but given the failure modes, we ought to be more defensive and confirm that the tchannel is already listening, or call r.channel.listen() ourselves.
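A minimal sketch of the defensive check described above. It is an illustration, not ringpop's actual code: `bootstrapWhenListening` and `makeFakeRingpop` are hypothetical helpers, and the boolean `listening` flag on the channel is an assumption about how the channel reports its state. The stand-in objects exist only so the snippet runs without ringpop or tchannel installed.

```javascript
'use strict';

// Hypothetical guard (not part of ringpop): refuse to bootstrap unless the
// underlying channel reports that it is listening.
// Assumption: the channel exposes a boolean `listening` flag.
function bootstrapWhenListening(ringpop, hosts, callback) {
    if (!ringpop.channel || ringpop.channel.listening !== true) {
        callback(new Error('channel must be listening before bootstrap'));
        return;
    }
    ringpop.bootstrap(hosts, callback);
}

// Stand-in for a ringpop instance, used only to exercise the guard.
// It counts bootstrap calls and immediately reports success.
function makeFakeRingpop(listening) {
    return {
        channel: { listening: listening },
        bootstrapCalls: 0,
        bootstrap: function bootstrap(hosts, callback) {
            this.bootstrapCalls++;
            callback(null, hosts);
        }
    };
}

// Usage: the non-listening node is rejected before bootstrap is attempted.
var node = makeFakeRingpop(false);
bootstrapWhenListening(node, ['127.0.0.1:3000'], function onDone(err) {
    if (err) {
        console.error('bootstrap refused: ' + err.message);
    }
});
```

Failing fast here is one option. The alternative mentioned above, having ringpop call listen on the channel itself, would need the host and port to be known at that point, so the sketch sticks to the check.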
Behavior was confirmed by watching tick-cluster logs, and running ringpop-admin dump on one of the live nodes.
(More or less duplicate of https://github.com/uber/ringpop-node/issues/275)