vernemq / plumtree

Epidemic Broadcast Trees
Apache License 2.0

Don't broadcast to unreachable nodes #15

Closed larshesel closed 6 years ago

larshesel commented 6 years ago

We shouldn't try to lazily send the outstanding messages if the receiving nodes are clearly not available (not in nodes()).
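The idea can be modeled outside of Erlang as a filter applied on every lazy tick. The following is only a sketch in Python, not plumtree's actual code: the `Outstanding` shape, the `lazy_tick` function and the `send` callback are hypothetical names, and `connected` stands in for the result of Erlang's nodes(). It also assumes that entries for unreachable peers are kept rather than dropped, so they can still be sent once the peer is connected again.

```python
from typing import Callable, Iterable, List, NamedTuple, Set, Tuple


class Outstanding(NamedTuple):
    """One unacknowledged lazy message (hypothetical shape, not plumtree's record)."""
    msg_id: str
    peer: str


def split_by_reachability(
    outstanding: Iterable[Outstanding], connected: Set[str]
) -> Tuple[List[Outstanding], List[Outstanding]]:
    """Partition entries into (to_send, deferred) based on the connected set."""
    to_send: List[Outstanding] = []
    deferred: List[Outstanding] = []
    for entry in outstanding:
        if entry.peer in connected:
            to_send.append(entry)
        else:
            deferred.append(entry)
    return to_send, deferred


def lazy_tick(
    outstanding: Iterable[Outstanding],
    connected: Set[str],
    send: Callable[[str, str], None],
) -> int:
    """Send outstanding messages to reachable peers only; return how many were sent.

    Deferred entries are left untouched (assumption: they stay outstanding).
    """
    to_send, _deferred = split_by_reachability(outstanding, connected)
    for entry in to_send:
        send(entry.peer, entry.msg_id)
    return len(to_send)
```

With an empty `connected` set, which corresponds to the test scenario where the other two nodes have been killed, `lazy_tick` performs no sends at all, regardless of how many messages are outstanding.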

To test this out I set up a three-node cluster (three nodes are required to ensure that every node has one node in its lazy set). I killed two of the nodes and, on the remaining one, registered 100K clients using vmq_reg:register_subscriber. That's the first spike in the images below. The X axis is time and the Y axis is the global scheduler load (using the VerneMQ scheduler_utilization metric).

plumtree_orig

The first graph shows the periodic resend causing CPU usage whenever the outstanding messages are lazily broadcast. This pattern will continue forever, as the receiving nodes will never acknowledge the messages. The 4 small spikes (and the beginning of a fifth) translate to roughly 50% CPU usage as reported by top.

plumtree_dont_send_to_unreachable

The second graph shows that there are no longer any spikes, as we don't send the outstanding messages to unreachable nodes.