initc3 / HoneyBadgerBFT-Python

The Honey Badger of BFT Protocols

Accounting for buffer usage on per-peer basis #7

Open sbellem opened 6 years ago

sbellem commented 6 years ago

From @amiller on May 25, 2017 19:30

  1. The Asynchronous communication model means that outgoing messages may need to be stored/resent arbitrarily far into the future.

Some outgoing messages may be able to be marked as stale, where it no longer matters if they're re-sent. For example, once we get a final signature for a block, no messages pertaining to the previous round should matter.

How can we annotate the protocol to take advantage of this? What does this tell us about the maximum size of a buffer?

  2. Almost every communication in honey badger is "broadcast" only. The only exception is in Reliable Broadcast, where different erasure-coded shares are sent. Can the communication channel abstraction help with this?

  3. For incoming messages, the asynchronous model means that messages pertaining to "stale" subprotocols that have since concluded might be able to be ignored. When can we safely mark a subprotocol as concluded and free up state? Can we express filter rules to efficiently discard old messages, e.g. so that messages pertaining to subprotocols in the previous round get discarded immediately?
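The filter-rule idea in the last point could look something like this. A minimal sketch (all names here are invented for illustration, not taken from the codebase), assuming every message carries the round number of the subprotocol it belongs to:

```python
from collections import namedtuple

# Hypothetical message type: every message carries its round number.
Message = namedtuple("Message", ["round", "sender", "payload"])

def make_filter(get_current_round):
    """Return a predicate that discards messages from concluded rounds."""
    def accept(msg):
        # Once a round has concluded (e.g. a final signature for its block
        # exists), messages from earlier rounds can no longer affect the
        # outcome, so they are dropped immediately on arrival.
        return msg.round >= get_current_round()
    return accept

# Example: node is currently in round 3.
current = {"round": 3}
accept = make_filter(lambda: current["round"])
```

A filter like this bounds the incoming buffer to messages for the current (and possibly future) rounds, which is one way to answer the "maximum buffer size" question above.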

Copied from original issue: amiller/HoneyBadgerBFT#4

sbellem commented 6 years ago

From @kc1212 on May 25, 2017 21:55

> For incoming messages, Asynchronous model means messages pertaining to "stale" subprotocols that have since concluded might be able to be ignored. When can we safely mark a subprotocol as concluded and free up state? Can we express filter rules to efficiently discard old messages, e.g. messages pertaining to subprotocols in the previous round get discarded immediately?

I have implemented a modified version of HoneyBadgerBFT (I don't do transaction buffering, and I use a dummy common coin for binary BA).

I maintain a single instance of the ACS class; it has a round number that can only be incremented. Every ACS message also has a round number attached to it.

At the beginning of an activation, everything except the round number is reset, including the Bracha broadcast and binary BA objects. I believe this is safe as long as the node is certain that the previous round has completed.

I also ignore ACS messages in two cases: (1) when the current round is already completed, and (2) when a message with a lower round number than the current one is received.

In my experiments, I have seen cases where nodes go out of sync (their round numbers don't match) but the protocol still behaves correctly. Hence, if I interpreted your question correctly, I believe the answer is "yes", at least empirically.
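The scheme described above might be sketched roughly as follows. This is an illustrative reconstruction, not kc1212's actual code; the class and method names are invented:

```python
class ACSRound:
    """Single ACS instance with a monotonically increasing round number."""

    def __init__(self):
        self.round = 0
        self.done = False
        self._reset_subprotocols()

    def _reset_subprotocols(self):
        # Stand-in for resetting the Bracha broadcast and binary BA objects.
        self.state = {}

    def activate(self):
        # Safe only once the node is certain the previous round completed.
        assert self.done or self.round == 0
        self.round += 1
        self.done = False
        self._reset_subprotocols()

    def handle(self, msg_round, payload):
        # Ignore rule (1): the current round is already completed.
        if self.done and msg_round == self.round:
            return "ignored: round already completed"
        # Ignore rule (2): the message carries a lower round number.
        if msg_round < self.round:
            return "ignored: stale round"
        self.state.setdefault(msg_round, []).append(payload)
        return "processed"
```

Everything except `round` is discarded at each `activate()`, so per-peer buffer usage stays bounded by what a single round can accumulate.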

sbellem commented 6 years ago

From @amiller on May 29, 2017 15:31

Thanks for the note @kc1212 ! This is fantastic, I'm enjoying looking at your thesis code.

First of all, it seems a bit more concise to implement Binary Agreement your way. In your Mo14 / MMR14 implementation, you have one handler for every sequential "round" of the "bv_broadcast" subprotocol, with the state kept in the dictionaries _est_values and _aux_values, whereas in my code there is an entire separate process (the bv_broadcast function) for each round.

It is also interesting how you explicitly send back a Replay() message to indicate that the incoming message has not been processed, but may be processable if sent again at a later time. I'm having a think about both of these!
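A local variant of the Replay() idea can be sketched as follows. Note this buffers too-early messages locally rather than asking the sender to resend, and all names are invented for illustration:

```python
from collections import deque

class ReplayBuffer:
    """Queue messages that arrive too early; re-deliver them later."""

    def __init__(self):
        self.pending = deque()

    def handle(self, local_round, msg_round, payload, process):
        # A message from a future round cannot be processed yet, but
        # should not be dropped either: park it for replay.
        if msg_round > local_round:
            self.pending.append((msg_round, payload))
            return "replayed"
        process(payload)
        return "processed"

    def drain(self, local_round, process):
        # Re-deliver buffered messages whose round has become current.
        ready = [p for r, p in self.pending if r <= local_round]
        self.pending = deque(
            (r, p) for r, p in self.pending if r > local_round
        )
        for p in ready:
            process(p)
        return len(ready)
```

The trade-off versus a sender-side Replay() message is that the receiver pays for the buffer, which is exactly the per-peer accounting this issue is about.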

However, I'm not yet convinced that it's safe to switch to the "stopping" state as early as you do. It is possible that an honest node i decides 0 in round r, but that another honest node j does not decide in that round. What the protocol ensures is that in round r+1, node j will start with est[r+1] = 0, and so it will decide 0 the next time the coinflip returns 0. However, if node i stops responding to messages after round r because it has reached the stopping state, node j may never terminate. This wouldn't show up in random testing; to cause this scenario to occur, you would need to build a scenario with Byzantine nodes equivocating. I thought about this scenario for a while and this is what I remember of it, anyway!
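A toy way to see the hazard (illustrative arithmetic only, not protocol code): each round requires n - f messages to make progress, and a node that decided and went silent counts against that quorum just like a faulty node does.

```python
# Toy illustration of the termination hazard, with n = 4, f = 1.
n, f = 4, 1
threshold = n - f  # messages needed to make progress in a round

# Round r+1: node i has decided in round r and stopped responding,
# and one Byzantine node is silent. Node j can hear only from itself
# and the one remaining honest node.
responsive = ["j", "k"]
available = len(responsive)

# j waits for `threshold` messages that will never all arrive.
stuck = available < threshold
```

This is why descriptions of the MMR-style protocol keep decided nodes participating (e.g. for one more round, or by rebroadcasting the decided value) rather than going silent immediately.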

sbellem commented 6 years ago

From @kc1212 on May 29, 2017 18:22

The main reason I implemented it the way I did is that I used the Twisted framework. In the algorithmic description, a lot of the statements read like "wait for x messages to arrive and then do something". This is in direct contrast with Twisted and the event-loop paradigm, where no operation is allowed to block. Thus I had to write these protocols more like state machines.
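The translation from "wait for x messages" to event-loop style might look like this. A framework-free sketch (the actual code uses Twisted; the names here are invented):

```python
class WaitForN:
    """Turn 'wait for n messages, then act' into a non-blocking callback.

    The event loop calls deliver() as messages arrive; once the
    threshold is reached, on_ready fires exactly once with the batch.
    """

    def __init__(self, n, on_ready):
        self.n = n
        self.on_ready = on_ready
        self.received = []
        self.fired = False

    def deliver(self, msg):
        # Called from the event loop; never blocks.
        self.received.append(msg)
        if not self.fired and len(self.received) >= self.n:
            self.fired = True
            self.on_ready(list(self.received))

# Example: fire once three messages have arrived.
results = []
waiter = WaitForN(3, results.append)
for m in ("a", "b", "c", "d"):
    waiter.deliver(m)
```

Each "wait for x messages" step in the algorithmic description becomes one such object, and the protocol's control flow lives in the chain of callbacks rather than in blocking receive loops.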