poanetwork / hbbft

An implementation of the paper "Honey Badger of BFT Protocols" in Rust. This is a modular library of consensus.
Other
357 stars 96 forks source link

Fixes the net_dynamic_hb test #372

Closed vkomenda closed 5 years ago

vkomenda commented 5 years ago

Fixes #369.

The fix mostly amounts to improving how removed nodes are handled in VirtualNet.

The sender queue algorithm now flags itself if it was removed. The user can check the flag is_removed to shut down the node.

vkomenda commented 5 years ago

I'm finished responding to comments. It appears some new functions in net_dynamic_hb weren't required. I refactored those out.

I am a bit concerned over the filter that filters messages for removed nodes, it seems to remove almost all usefulness from the error that gets thrown when messages are sent to non-existing nodes.

Only messages to removed nodes are filtered out. If the node is not removed using the provided method remove_node and yet disappears, a message to it still triggers an error in crank.

mbr commented 5 years ago

Now I started thinking that the messages should be removed earlier, at dispatch time. That way we can guarantee that no messages to the removed node arrive to it when it is finally readded. That would have been an undesirable side-effect.

I think it's problematic to add these automatic removal features to the simulated networking code, as this seems a bit like fixing the tests instead of fixing the code. We do have to decide what we want the VirtualNet to be, at the moment it is meant to be more strict than an actual network.

In a real networking environment, packets destined for non-existing nodes would either silently be dropped or be handled by an upper layer that provided the encrypted and reliable peer connections. Inside VirtualNet, any stray packet is considered an error though, because we aim to implement in a way that these guaranteed to be useless messages are not even generated.

Now there are some cases in which messages into the void are unavoidable, due to the fact that they are delivered and acted upon in essentially random order. This should be a special case though, which could be handled by simply adding a flag to virtual net that, when set, makes it okay for messages for non-existing nodes to be discarded. This could be turned on and off for short intervals. Keep in mind that we probably would still need to purge the queue of "stale" messages meant for the old one, if we are reusing the node ID, because in a real life situation those messages would simply be dropped due to failing signatures/connections by lower level layers.

At that point we have to question the merits of reusing the ID though. Furthermore, as @afck pointed out, due to changes in the algorithm model itself, the test would nowadays probably be better served by removing a (random) subset of validators and adding a random amount, instead of just a single one.

To summarize: I think adding all these special filters to make the code work for this specific test is adding a lot of complexity which bears little value. The only remaining value would be catching algorithms that generate packets for IDs that were never valid. We should either drop the error-on-nonexisting-ID feature altogether, or add a feature to turn it off or suspend it.

afck commented 5 years ago

I agree with all of @mbr's points. :heavy_plus_sign: But let's try to find a way to avoid turning this PR into a major tests/net extension, in addition to the test fix.

vkomenda commented 5 years ago

These are all very thoughtful comments, thanks @mbr and @afck.

I think it's problematic to add these automatic removal features to the simulated networking code, as this seems a bit like fixing the tests instead of fixing the code.

Fixing the node removal feature in VirtualNet is probably the most important part of this PR. Without it the test was struggling to fight off the ghost messages addressed to the removed node that were still sitting in the VirtualNet queue. I would stress that this feature based on a set of removed nodes is more selective than a flag might have been since it only allows DynamicHoneyBadger work with messages of the matching "lifetime", and does that without a major overhaul.