We shouldn't try to send the outstanding messages out lazily if the receiving nodes are clearly not available (not in `nodes()`).
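A minimal sketch of the idea (function and variable names here are hypothetical, not the actual VerneMQ/plumtree API): before scheduling a lazy resend, filter the peer set against `nodes()` so that disconnected peers are skipped.

```erlang
%% Hypothetical sketch: only resend outstanding lazy messages to peers
%% that are currently connected according to nodes().
-spec resend_lazy([node()], term()) -> ok.
resend_lazy(Peers, Outstanding) ->
    Connected = nodes(),
    Reachable = [P || P <- Peers, lists:member(P, Connected)],
    lists:foreach(fun(Peer) ->
                      %% send_lazy/2 stands in for the actual lazy-push send
                      send_lazy(Peer, Outstanding)
                  end, Reachable),
    ok.
```

Once a killed node reconnects it reappears in `nodes()`, so resends to it resume on the next periodic tick without any extra bookkeeping.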
To test this out I had a three-node cluster (three nodes are required to ensure that every node has one node in its lazy set). I killed two of them, and on the remaining node I registered 100K clients using `vmq_reg:register_subscriber`. That's the first spike in the images below. The X axis is time, the Y axis is the global scheduler load (using the VerneMQ `scheduler_utilization` metric).
The first graph shows the periodic resend causing CPU usage whenever the outstanding messages are lazily broadcast. This pattern will continue forever, as the receiving nodes will never acknowledge the messages. The four small spikes (and the beginning of a fifth) translate to roughly 50% CPU usage as reported by `top`.
The second graph shows that there are no longer any spikes, as we don't send the outstanding messages to unreachable nodes.