See also: https://github.com/amiller/HoneyBadgerBFT/issues/57 ("Bounded Badger") and https://github.com/amiller/HoneyBadgerBFT/issues/12 ("Asynchronous Ratchet")
Well, you asked for some ideas:
Isn't this the typical problem of an attacker being able to force a peer to allocate a resource at very little cost to themselves? This is not without precedent; it reminds me of Slowloris or SYN floods.
Neither of these is a perfect match for our problem, but both are solved by forcing more work onto the attacker: in the examples above, by constraining the time window for sending data more strictly, or by moving the memory burden back onto the attacker.
What about limiting the underlying transmission channel, e.g. adding a "queue full, please retry" response? In that case, the attacker would have to buffer the message themselves and is expected to retry transmitting it.
A sender could simply use a priority queue that sends out messages with the lowest epochs first, so this would only be an issue if two honest nodes had vastly diverged in epochs. These parameters can be tweaked, though, and we are not going to have infinite resources either way.
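For illustration, a minimal sketch of such a lowest-epoch-first outgoing buffer; the `Message` type and the `u64` epoch are placeholders, not hbbft's actual types:

```rust
use std::collections::{BTreeMap, VecDeque};

/// Placeholder for a real protocol message.
type Message = String;

/// Outgoing buffer that always yields messages for the lowest epoch first.
#[derive(Default)]
struct OutgoingQueue {
    // Invariant: no empty per-epoch queues are stored.
    by_epoch: BTreeMap<u64, VecDeque<Message>>,
}

impl OutgoingQueue {
    fn push(&mut self, epoch: u64, msg: Message) {
        self.by_epoch.entry(epoch).or_default().push_back(msg);
    }

    /// Removes and returns the oldest message of the lowest buffered epoch.
    fn pop(&mut self) -> Option<(u64, Message)> {
        let (epoch, mut queue) = self.by_epoch.pop_first()?;
        let msg = queue.pop_front()?; // never `None`, by the invariant
        if !queue.is_empty() {
            self.by_epoch.insert(epoch, queue);
        }
        Some((epoch, msg))
    }
}

fn main() {
    let mut q = OutgoingQueue::default();
    q.push(7, "Echo".to_string());
    q.push(5, "BVal".to_string());
    assert_eq!(q.pop(), Some((5, "BVal".to_string()))); // lowest epoch first
}
```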
Maybe even normal TCP backpressure is enough as well; if messages from honest nodes are rarely out of order, we could just stop reading from the TCP connection for a while. This in turn will cause sending to stop or slow down on the other peer, naturally causing them to buffer.
Finally, we should probably define: how should the network behave if a single node is vastly inferior to the others in computing or network capacity? Imagine eight high-performance servers and a ninth node running on a slow machine behind Tor. The network can either throttle itself, dropping performance enough to include this node, or mark it as faulty and run at a much higher speed.
I agree; I think there's no way around moving some memory burden to the sender. And to keep it simple, it may be best to move all of it: We already have the `max_future_epochs` setting, and in normal operation it should be unlikely that you receive messages from e.g. more than three epochs ahead. So maybe we should just reject all of those, and have the sender buffer them.
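As a sketch of the receiver-side rule, assuming an inclusive window (the function name is made up):

```rust
/// Accept only messages for our current epoch up to `max_future_epochs`
/// ahead; everything else is rejected, so the *sender* has to buffer it.
fn accepts(our_epoch: u64, msg_epoch: u64, max_future_epochs: u64) -> bool {
    msg_epoch >= our_epoch && msg_epoch <= our_epoch + max_future_epochs
}

fn main() {
    assert!(accepts(10, 13, 3)); // three epochs ahead: fine
    assert!(!accepts(10, 14, 3)); // too far ahead: sender must buffer and retry
    assert!(!accepts(10, 9, 3)); // obsolete: can be dropped outright
}
```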
Then we can add (parts of) the Bounded Badger proposal. Essentially it probably comes down to threshold-signing each batch: once we have a signature, i.e. a proof, of batch `e`, we can drop all buffered outgoing messages of that epoch, because they can all be replaced by that signature and the batch itself.
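Once such a proof exists, pruning the buffer is cheap; a sketch, with `ThresholdSig` as a stand-in type:

```rust
use std::collections::BTreeMap;

type Message = String; // placeholder
type ThresholdSig = Vec<u8>; // stand-in for a real threshold signature

struct OutgoingBuffer {
    by_epoch: BTreeMap<u64, Vec<Message>>,
}

impl OutgoingBuffer {
    /// Once we hold a threshold signature (a proof) of the batch of `epoch`,
    /// all buffered messages for that epoch (and earlier ones) are redundant:
    /// a lagging peer can be served the signed batch instead.
    fn drop_proven(&mut self, epoch: u64, _proof: &ThresholdSig) {
        // Keep only messages for epochs strictly greater than `epoch`.
        self.by_epoch = self.by_epoch.split_off(&(epoch + 1));
    }
}

fn main() {
    let mut buf = OutgoingBuffer { by_epoch: BTreeMap::new() };
    buf.by_epoch.insert(4, vec!["Echo".to_string()]);
    buf.by_epoch.insert(6, vec!["BVal".to_string()]);
    buf.drop_proven(5, &Vec::new());
    assert!(buf.by_epoch.contains_key(&6) && !buf.by_epoch.contains_key(&4));
}
```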
In addition, we could report a node lagging behind by more than a configurable number of epochs as faulty, so the application logic can decide whether to vote it out.
I'm less sure about the agreement epochs, and whether it's even worth implementing a similar kind of checkpointing for them. Or whether we should just limit them to a maximum of 120 epochs, like in Bounded Badger.
Whether the buffer is on the sender or on the recipient doesn't change much and in fact only changes the point of view. If the buffer moves to the sender, then the following attack becomes possible, a dual of the attack of sending lots of messages: a recipient attacker can delay messages from a sender indefinitely, leading to the sender having to buffer the messages to the attacker indefinitely. As far as this attack is concerned, the difference of placing the buffer on the sender instead of the recipient is that the sender would have to bound message resending to avoid a memory leak.
So if there is a fix involving buffering on the sender then there should be a dual fix that uses buffering on the recepient.
It's interesting to contemplate @afck's idea to sign batches rather than blocks. The Bounded Badger proposal referred to blocks, which sounds like it requires a blockchain. However, it is possible that "block" was used to mean "batch" there, in which case there is no difference. In principle, pairs of a batch and a signature might not even be ordered, so that a lagging node might learn a batch of epoch r before it learns another batch of epoch r' < r. In any case, if batches are signed with threshold signatures, then any lagging node can catch up once it receives a verifiable batch.
Do you think the batches should be learned in the order of their creation? Or would it be enough to learn without a specific order, possibly requiring the lagging node to verify a chain of signed batches that it has missed as soon as the chain is complete?
I think it's not quite symmetric: The sender knows that its own messages are not faulty. The recipient can only verify that once it has progressed to the corresponding epoch. Also, the sender can drop messages for an epoch from the outgoing queue once it has a signed batch for that epoch.
I think the batches need to be sent in the correct order. With the above proposal, the recipient would reject a batch for any later epoch, so the sender would have to keep it in the queue anyway.
I wonder whether we should even avoid threshold signatures, and just use f + 1 individual signatures? That would make the message slightly larger, but that shouldn't be much overhead in practice (it contains a whole block/batch, after all), and threshold signatures are probably a lot slower than individual signatures.
Actually, that is another issue I have with the proposal: Honey Badger goes to great lengths (using erasure coding) to avoid very large messages; no message ever contains a whole batch, only individual contributions. With that signed message, we would not only introduce a very large message variant, but, if implemented naively, also send that message multiple times. We'd at least need to make sure that it is only sent on request. But I'd like it even better if we could avoid a very large message entirely, and instead use erasure coding here, too: you wouldn't try to collect signatures for a finished epoch at all. Instead, you would send only your own signature, together with a Merkle proof and a Reed-Solomon shard of the batch, with threshold N - f?
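To make the sizes concrete, here's a sketch of what such a catch-up message might carry; all names are hypothetical, and plain ceiling division stands in for the actual Reed-Solomon parameters:

```rust
/// Hypothetical catch-up message: one Reed-Solomon shard of the signed batch,
/// a Merkle proof tying the shard to the batch's root hash, and this node's
/// own signature (share) over that root, instead of the whole batch.
#[allow(dead_code)]
struct EpochProofShard {
    epoch: u64,
    shard: Vec<u8>,              // this node's Reed-Solomon shard of the batch
    merkle_proof: Vec<[u8; 32]>, // branch proving `shard` belongs to `root_hash`
    root_hash: [u8; 32],
    sig_share: Vec<u8>,          // signature or threshold signature share
}

/// With N - f data shards, each shard carries about B / (N - f) bytes.
fn shard_len(batch_len: usize, n: usize, f: usize) -> usize {
    (batch_len + (n - f) - 1) / (n - f) // ceiling division
}

fn main() {
    // E.g. a 1 MB batch, N = 10, f = 3: each shard is ~143 kB, not 1 MB.
    assert_eq!(shard_len(1_000_000, 10, 3), 142_858);
}
```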
Even with Reed-Solomon, this is still wasteful if we do it as soon as we move to a new epoch. Then the first node to finish would always send a redundant message of size B / (N - f) to everyone else. On the other hand, if we wait longer we make it even harder for a lagging node to catch up.
Another potential waste of bandwidth with the above approach is that the first message in each epoch is often our own proposed contribution, which is large. We should avoid repeatedly sending that message just to have the recipient reject it. Instead, each node should tell everyone else whenever they move to a new epoch, so no messages are ever sent too early. The drawback is that it introduces another round of messages. But for that we already have `max_future_epochs`! If that is e.g. 3, then when I move to epoch 38, I'll tell everyone that I'm ready to receive messages for epoch 42 from now on. That way, the additional message round happens long before I actually move to epoch 42.
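A sketch of the sender-side bookkeeping this implies; names like `PeerTracker` and `on_announce` are made up, and the window matches the "epochs 5, 6, …, 5 + `max_future_epochs`" formula further down:

```rust
use std::collections::BTreeMap;

type NodeId = usize; // placeholder

/// Tracks the latest epoch each peer has announced, so that we never send a
/// message the peer would reject.
struct PeerTracker {
    max_future_epochs: u64,
    announced: BTreeMap<NodeId, u64>,
}

impl PeerTracker {
    /// Called when `peer` multicasts that it has moved to `epoch`.
    fn on_announce(&mut self, peer: NodeId, epoch: u64) {
        let e = self.announced.entry(peer).or_insert(0);
        *e = (*e).max(epoch); // announcements may arrive out of order
    }

    /// Whether a message for `msg_epoch` can be sent to `peer` right away;
    /// otherwise it stays in our outgoing buffer.
    fn may_send(&self, peer: NodeId, msg_epoch: u64) -> bool {
        match self.announced.get(&peer) {
            Some(&e) => msg_epoch <= e + self.max_future_epochs,
            None => false, // nothing announced yet (e.g. a joining node)
        }
    }
}

fn main() {
    let mut t = PeerTracker { max_future_epochs: 3, announced: BTreeMap::new() };
    t.on_announce(0, 5);
    assert!(t.may_send(0, 8)); // epochs 5, 6, …, 5 + max_future_epochs are fine
    assert!(!t.may_send(0, 9)); // buffered until the peer announces epoch 6
}
```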
And in terms of implementation, I don't see a good way around some duplication: we'll have to implement this for Honey Badger and, maybe in a somewhat simpler form, for Dynamic Honey Badger.
> I think it's not quite symmetric: The sender knows that its own messages are not faulty. The recipient can only verify that once it has progressed to the corresponding epoch.
The sender has to send all of its (correct) outgoing messages, and the receiver has to receive all correct incoming messages. You are right to point out the asymmetry: the correctness check is implemented only on the receiver. However, we cannot remove that check, and it remains present in both cases, whether the buffering is done on the sender or on the receiver.
> I wonder whether we should even avoid threshold signatures, and just use f + 1 individual signatures?
Do you mean growing a chain from 1 to f + 1 individual signatures by rebroadcasting the batch and appending one signature not already in the chain on every rebroadcast? I think that is less efficient, since the worst-case number of broadcasts in a perfect network is n * (f + 1), compared to f + 1 in the case of threshold signatures; e.g. with n = 10 and f = 3, that is 40 broadcasts versus 4. Although there can be other advantages of the former over the latter, such as decreased computation costs.
The asymmetry is that the receiver has to cache all the messages it cannot handle yet, including the spam, because it doesn't know yet which of them are correct. The sender knows that all its outgoing messages are correct, and won't spam its own outgoing queue.
And if we send this `IAmAtEpoch(5)` message in the other direction, we won't ever send a message that we would need to re-send: We know that the receiver can now handle messages from epochs 5, 6, …, 5 + `max_future_epochs`.
> Do you mean growing a chain
No, we wouldn't send a chain: just our own signature to the lagging node. The lagging node could then count the signatures it received, and when it has f + 1, it could consider the batch valid and move on to the next epoch.
(In the erasure code variant, it would actually have to wait for N - f messages before it could reconstruct the batch.)
This is a fair point about keeping all the messages. You mentioned the message which a sender uses to announce an epoch increment. Where does a new validator start? Should it queue all messages to a node until it receives an `IAmAtEpoch` from it? Or is the initial state provided in a welcome message?
About f + 1 signatures: the receiver would need to keep hold of the batches received in individually signed messages for equality checks, because it needs f + 1 different signatures on the same batch. Depending on the size of the batch and the number f, that may or may not have a noticeable memory footprint.
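A sketch of that bookkeeping, grouping signers by batch hash; the types are placeholders, and signature verification is assumed to happen before `add` is called:

```rust
use std::collections::{BTreeMap, BTreeSet};

type NodeId = usize; // placeholders
type BatchHash = [u8; 32];

/// Signatures only count together if they sign the *same* batch, so the
/// receiver groups signers by batch hash, and must keep one copy of each
/// distinct batch until some hash reaches f + 1 signers.
#[derive(Default)]
struct SigCollector {
    signers_by_hash: BTreeMap<BatchHash, BTreeSet<NodeId>>,
}

impl SigCollector {
    /// Records an (already verified) signature; returns `true` once the batch
    /// with this hash has reached the f + 1 quorum.
    fn add(&mut self, hash: BatchHash, signer: NodeId, f: usize) -> bool {
        let signers = self.signers_by_hash.entry(hash).or_default();
        signers.insert(signer);
        signers.len() >= f + 1
    }
}

fn main() {
    let mut c = SigCollector::default();
    let f = 1; // f + 1 = 2 matching signatures needed
    assert!(!c.add([0; 32], 1, f));
    assert!(c.add([0; 32], 2, f)); // quorum reached
}
```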
Good point: I'm afraid the new validator would have to wait for everyone else's `IAmAtEpoch` message, because the `JoinPlan` can't really include that information.

On the other hand, `IAmAtEpoch` (we need a better name for this…) would be multicast to everyone, including the observers, so the joining node could collect that information from the others before it starts as a validator.
I agree that duplicating the batch f + 1 times would be wasteful. That's why I'd be in favour of using Reed-Solomon instead.
And now that I think of it, a threshold of N - f wouldn't be good enough: a correct node could be lagging, and would only receive N - f - 1 shards. In fact, up to f correct nodes could lag. So we'd need a threshold of N - 2f - 1 (or just f, since N = 3f + 1).
Definitely, we should try sending parts of the batch. Especially if those parts can be uniquely assembled into the batch. That would solve all the problems.
Let's also compare with the current Bounded Badger plan and make sure we're not overlooking anything.
With #308, the larger part of this has been done and DHB/QHB/HB's incoming queues are gone.
Meanwhile, the other half has become more complex: `Agreement` now has more message types than it used to, and its own `SbvBroadcast` subalgorithm…
The solution should still be the same, though: limit the total number of epochs to a value that we are negligibly unlikely to ever reach, and limit the amount of memory used per epoch.
We need to ensure that an attacker can't fill up a node's memory by sending it lots of messages that it stores forever. This particularly concerns the algorithms that proceed in epochs (agreement, Honey Badger, …), which, to be fully asynchronous, would in theory need to store any information for any future epoch until they can process it.
It's tempting to enforce some reasonable limits instead, and just assume that we'll never fall behind by more than 1000 epochs. But then an attacker could break consensus by sending a message that is exactly 1000 epochs ahead and will be handled by only some of the honest nodes.
Either way, we should try to limit the amount of memory per epoch: e.g. in agreement, `incoming_queue` should be replaced by just a record of which of the other nodes sent which of the four possible messages: `BVal(true)`, `BVal(false)`, `Aux(true)`, `Aux(false)`.
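A sketch of that bounded per-epoch record; field and type names are illustrative, not the crate's:

```rust
use std::collections::BTreeSet;

type NodeId = usize; // placeholder

/// Bounded replacement for `incoming_queue`: per epoch, we only remember
/// which nodes sent each of the four possible messages, so memory per epoch
/// is bounded by the number of nodes, no matter how much an attacker sends.
#[derive(Default)]
struct EpochRecord {
    bval: [BTreeSet<NodeId>; 2], // bval[b as usize]: senders of BVal(b)
    aux: [BTreeSet<NodeId>; 2],  // aux[b as usize]: senders of Aux(b)
}

impl EpochRecord {
    /// Returns `false` for duplicates, which can simply be ignored
    /// (or counted against the sender as faulty behavior).
    fn record_bval(&mut self, sender: NodeId, b: bool) -> bool {
        self.bval[b as usize].insert(sender)
    }

    fn record_aux(&mut self, sender: NodeId, b: bool) -> bool {
        self.aux[b as usize].insert(sender)
    }
}

fn main() {
    let mut rec = EpochRecord::default();
    assert!(rec.record_bval(3, true));
    assert!(!rec.record_bval(3, true)); // duplicate BVal(true) from node 3
    assert!(rec.record_aux(3, true));
}
```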