project-iris / iris

Decentralized cloud messaging
iris.karalabe.com

Scribe broadcast multiplied #16

Open karalabe opened 10 years ago

karalabe commented 10 years ago

There is a bug during churn, when a scribe node is joining while a publish is in flight. If the joining node receives a publish event, it will forward it towards the topic root (even if the event has already been there and begun dissemination), leading to message multiplication.

The reason is that the current scribe implementation does not track whether a publish is searching for the topic multicast tree, or is already in the dissemination phase. Using two separate operations instead of a single opPub should solve this. It may lose messages (it will anyway), but it will not duplicate.
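A minimal sketch of the two-operation idea, not the actual iris code: the names `phaseSearch`, `phaseSpread`, `node` and `handle` are made up for illustration, and it assumes a publish spreads to all tree neighbors once it has found the tree. A joining node that receives an already-spreading publish has no tree neighbors yet, so it drops the event instead of routing it back towards the root.

```go
package main

import "fmt"

// phase marks where a publish is in its lifecycle (hypothetical names).
type phase int

const (
	phaseSearch phase = iota // still being routed towards the topic root
	phaseSpread              // already inside the tree, being disseminated
)

// node is a simplified scribe member (assumed structure).
type node struct {
	inTree    bool     // whether this node is part of the topic tree
	neighbors []string // tree parent and children (empty while joining)
	nextHop   string   // overlay next hop towards the topic root
}

// handle returns the phase and the hops that a publish received from
// `from` is sent on to.
func (n *node) handle(p phase, from string) (phase, []string) {
	if p == phaseSearch && !n.inTree {
		// Still looking for the tree: keep routing towards the root.
		return phaseSearch, []string{n.nextHop}
	}
	// Either the publish just found the tree or it was already spreading.
	// It is never routed towards the root again from here on.
	hops := []string{}
	for _, peer := range n.neighbors {
		if peer != from {
			hops = append(hops, peer)
		}
	}
	return phaseSpread, hops
}

func main() {
	// A joining node is not in the tree yet and has no tree neighbors.
	joining := &node{inTree: false, nextHop: "N10"}
	p, hops := joining.handle(phaseSpread, "N5")
	fmt.Println(p == phaseSpread, len(hops))
}
```

With this split the joining node loses the event rather than duplicating it, which is the trade-off described above.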

karalabe commented 10 years ago

Hmm, I was wrong, it actually did track whether a publish was virgin or not. The jury's still out :)

karalabe commented 10 years ago

Next hypothesis:

If the topic multicast tree is rearranging whilst publishes are in flight (i.e. a node is attached to a new parent), then events previously sent upstream might come back down on the new path. This should only occur during extreme churn or overload (heartbeats report remotes dead).

E.g. Topic T10, Nodes N1, N5, N10

In classical scribe this is somewhat alleviated by requiring each and every message to reach the topic rendez-vous point and begin distribution from there. Still, if a parent goes down the same issue can appear:

I'm not sure of the best solution here. One that might work is tagging each publish with the origin node and an event id, and maintaining at each node a list of messages seen in the last K secs... but that wastes quite a lot of resources.
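A rough sketch of what such a duplicate filter could look like, assuming each publish carries an origin node name and a numeric event id. The type and function names (`eventKey`, `seenCache`, `firstSeen`) are hypothetical and nothing here is taken from the iris code base. Timestamps are passed in as plain seconds to keep the example deterministic.

```go
package main

import "fmt"

// eventKey identifies a publish by its origin node and event id (assumed tagging).
type eventKey struct {
	origin string
	id     uint64
}

// seenCache remembers events seen in the last ttlSec seconds.
// It is not safe for concurrent use; a real node would need locking.
type seenCache struct {
	ttlSec int64
	seen   map[eventKey]int64 // event -> time it was first seen, in seconds
}

func newSeenCache(ttlSec int64) *seenCache {
	return &seenCache{ttlSec: ttlSec, seen: make(map[eventKey]int64)}
}

// firstSeen reports whether the event is new at time nowSec and records it.
// Expired entries are purged on every call, which costs O(n) per message;
// this is the resource waste mentioned above.
func (c *seenCache) firstSeen(origin string, id uint64, nowSec int64) bool {
	for k, t := range c.seen {
		if nowSec-t > c.ttlSec {
			delete(c.seen, k) // deleting while ranging over a map is allowed in Go
		}
	}
	k := eventKey{origin: origin, id: id}
	if _, dup := c.seen[k]; dup {
		return false
	}
	c.seen[k] = nowSec
	return true
}

func main() {
	cache := newSeenCache(5) // K = 5 secs, an arbitrary choice
	fmt.Println(cache.firstSeen("N5", 1, 100)) // first delivery
	fmt.Println(cache.firstSeen("N5", 1, 101)) // same event via another path
}
```

A node would call `firstSeen` before forwarding a publish and drop the event whenever it returns false. Duplicates arriving more than K secs apart would still slip through.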

karalabe commented 10 years ago

Small update: the bug was caused by scribe assuming the multicast tree is indeed a tree. In theory it is, but in practice it can temporarily become a graph while mutating. I've added a check in the publish distribution so that after an event enters the scribe tree, only tree neighbors are allowed to pass it between each other. This seems to have solved it.
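The check could be sketched roughly as follows. This is an illustration under assumptions, not the committed fix: `treeNode`, `isNeighbor` and `accept` are hypothetical names, and the node names are only examples.

```go
package main

import "fmt"

// treeNode holds a node's current view of its place in the topic tree.
type treeNode struct {
	parent   string          // current parent, empty for the topic root
	children map[string]bool // current children
}

// isNeighbor reports whether peer is currently this node's parent or child.
func (n *treeNode) isNeighbor(peer string) bool {
	return peer != "" && (peer == n.parent || n.children[peer])
}

// accept decides whether a publish arriving from `from` is processed.
// enteredTree tells whether the event is already being disseminated
// inside the tree, as opposed to still being routed towards it.
func (n *treeNode) accept(from string, enteredTree bool) bool {
	if !enteredTree {
		return true // any overlay node may hand a fresh publish to the tree
	}
	// Inside the tree, only the current parent or a current child may
	// pass the event on, so stale links left over from a rearrangement
	// cannot deliver it a second time.
	return n.isNeighbor(from)
}

func main() {
	n5 := &treeNode{parent: "N10", children: map[string]bool{"N1": true}}
	fmt.Println(n5.accept("N10", true)) // from the current parent
	fmt.Println(n5.accept("N7", true))  // from a node that is no longer a neighbor
}
```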

Note, I'm leaving this open as the previous hypothesis still holds, and even though my fix solved the issue for all tests, there still exists an ordering of the events that could cause duplication (albeit with quite a small chance).