A high-level implementation could go like this:
1) When a replica becomes aware of a request for the first time, it broadcasts it to all other replicas (incl. the leader) and starts a timer for this particular request.
2) If that timer expires without the request being committed, the replica triggers a view change.

The per-request timer could be implemented using a sort of ordered hashmap, ordered by insertion time:
1) When a replica receives the request, it puts it in the ordered hashmap (ordered by insertion time, where keys are request hashes).
2) When a request is committed, remove it from the hashmap.
3) Periodically check the first entry (per insertion order):
3a) If its insertion time + TIMEOUT < current time, trigger a view change.
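A minimal Go sketch of this ordered-map timer, assuming a single global TIMEOUT; the type and function names are made up for illustration, not existing obcpbft code:

```go
package sketch

import (
	"container/list"
	"time"
)

// entry pairs a request hash with the time it was first seen.
type entry struct {
	hash string
	seen time.Time
}

// requestTimeouts keeps requests ordered by insertion time:
// a linked list gives the order, a map gives O(1) lookup by hash.
type requestTimeouts struct {
	order   *list.List               // front = oldest request
	byHash  map[string]*list.Element // request hash -> list element
	timeout time.Duration
}

func newRequestTimeouts(timeout time.Duration) *requestTimeouts {
	return &requestTimeouts{
		order:   list.New(),
		byHash:  make(map[string]*list.Element),
		timeout: timeout,
	}
}

// Add records a request the first time the replica sees it.
func (rt *requestTimeouts) Add(hash string) {
	if _, ok := rt.byHash[hash]; ok {
		return // already tracked
	}
	rt.byHash[hash] = rt.order.PushBack(entry{hash: hash, seen: time.Now()})
}

// Commit removes a request once it has been committed.
func (rt *requestTimeouts) Commit(hash string) {
	if e, ok := rt.byHash[hash]; ok {
		rt.order.Remove(e)
		delete(rt.byHash, hash)
	}
}

// Expired reports whether the oldest outstanding request has timed out;
// the caller would trigger a view change when this returns true.
func (rt *requestTimeouts) Expired(now time.Time) bool {
	front := rt.order.Front()
	if front == nil {
		return false
	}
	return front.Value.(entry).seen.Add(rt.timeout).Before(now)
}
```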
PS: this sketch assumes that the per-request timeout is the same for all requests, which is not necessarily the case for deploy transactions but should/could be for invoke transactions. If different timeouts are needed, the sketch above needs to be tweaked a bit.
PS2: DoS protection via rate limiting of requests (to address the issue brought up by @corecode) needs to be done, but perhaps as a separate mechanism/issue.
PS3: It might be interesting to incentivize this mechanism (in the future).
The whole crux is doing this with O(1) memory use. We did have such a mechanism in place previously, but I disabled it so that we wouldn't run out of memory.
How about this: every consenter can only challenge the primary with one single request.
However, a byzantine primary may then process only this one challenged request. This means that we need another mechanism to change the primary when consensus throughput is low.
@vukolic could you point us to academic literature that deals with this kind of fairness?
@corecode " changing the primary when consensus throughput is low" is another issue which is not the same as fairness. Namely having "every consenter challenge the primary with one single request" does not solve the fairness issue. Some pointers for addressing "performance attacks" are at the end of this post.
As for fairness, it is typically handled in the way I described above, but only when a client complains: on the first submission of a transaction by a client, the above mechanism is ignored; if the client then complains, the mechanism kicks in.
In order to limit the DoS attacks you mention above (and that I mentioned in PS2 of my previous comment), we have several approaches at hand: we can hard-limit the number of outstanding requests per client and make this limit a blockchain configuration parameter. This avoids DoS unless the number of clients is extremely high, which is probably not going to be the case. In the case of an unbounded (very, very large) number of clients, which I do not think we will often face, one could apply the above mechanism probabilistically; this would still ensure fairness, but might mean that the client needs to resubmit more than once.
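A rough sketch of such a per-client hard limit, assuming the cap is a configuration parameter (all names here are placeholders):

```go
package sketch

// outstandingLimiter hard-limits the number of outstanding (complained-about,
// not yet committed) requests per client; maxOutstanding would come from
// blockchain configuration.
type outstandingLimiter struct {
	maxOutstanding int
	perClient      map[string]int // client ID -> outstanding count
}

func newOutstandingLimiter(max int) *outstandingLimiter {
	return &outstandingLimiter{maxOutstanding: max, perClient: make(map[string]int)}
}

// Admit records one more outstanding request for the client and returns true
// if it is still under its quota; otherwise the request is rejected.
func (l *outstandingLimiter) Admit(clientID string) bool {
	if l.perClient[clientID] >= l.maxOutstanding {
		return false
	}
	l.perClient[clientID]++
	return true
}

// Done releases a slot once the request commits or its timer fires.
func (l *outstandingLimiter) Done(clientID string) {
	if l.perClient[clientID] > 0 {
		l.perClient[clientID]--
	}
}
```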
Practical approaches may also bound the number of clients with outstanding requests by charging clients per transaction (out of scope in our case). This is, in a sense, the idea behind Gas in Ethereum.
As for performance attacks, which, again, are a separate issue, the literature that explains how to deal with them is often referred to as Robust BFT. Pointers:
For a way to define the kind of fairness this issue refers to, see also:
Let's define "client" as the submitting peer; this bounds the number of clients to the size of the consensus network. I think this is a reasonable assumption for now.
You mention that one solution would be for "the client to complain"; this is an active retry, which is what I meant by challenging the leader.
How would such a solution work in practice? An honest complaining client will inform all replicas of the request it wants to complain about. How does an honest replica receiving a complaint know that the complaining client indeed informed the whole network? Wouldn't the receiving replica have to forward the request to the whole network itself, or at least to the primary? It seems that this could be used to DoS the primary.
It may go as I mentioned above: when a replica becomes aware of a request for the first time, it broadcasts it to all other replicas (incl. the leader) and starts a timer for this particular request.
This covers the case where a client sends the request (complaint) to only a single replica.
We could also offload this to the client by having it send the complaint to all replicas; in this case, the job of a replica receiving the complaint would only be to forward it to the leader (we have to forward the request to the leader, as we cannot trigger view-change timers without giving the leader an opportunity to learn about the request).
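To make the two variants concrete, here is a rough sketch of what a replica receiving a complaint could do; Comm, handleComplaint, and startTimer are made-up names, not actual obcpbft APIs:

```go
package sketch

import "time"

// Comm abstracts the messaging layer (placeholder interface).
type Comm interface {
	Broadcast(reqHash string)    // send to all replicas, incl. the leader
	SendToLeader(reqHash string) // forward only to the current leader
}

// handleComplaint sketches a replica's reaction to a complaint about reqHash.
// clientBroadcasts indicates whether the client is expected to have sent the
// complaint to all replicas itself.
func handleComplaint(c Comm, startTimer func(reqHash string, d time.Duration),
	reqHash string, clientBroadcasts bool, timeout time.Duration) {
	if clientBroadcasts {
		// The client already informed everyone; just make sure the
		// leader gets a chance to order the request.
		c.SendToLeader(reqHash)
	} else {
		// The client talked only to us; relay to the whole network.
		c.Broadcast(reqHash)
	}
	// Either way, arm the per-request view-change timer.
	startTimer(reqHash, timeout)
}
```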
The issue I see is that for a replica to believe that a request is a challenge request, it needs to send that request to the primary itself. This means that a byzantine replica can get a 1:N amplification by sending requests that are marked as "complaint".
To limit this impact, we could allow only a small number of outstanding complaints per complaining replica, or apply a rate limit.
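For instance, the rate-limit option could be a simple per-replica token bucket along these lines (a sketch only; the names and parameters are made up):

```go
package sketch

import "time"

// complaintRateLimiter is a minimal per-replica token bucket.
type complaintRateLimiter struct {
	rate     float64              // complaints allowed per second
	burst    float64              // maximum bucket size
	tokens   map[uint64]float64   // replica ID -> remaining tokens
	lastSeen map[uint64]time.Time // replica ID -> time of last complaint
}

func newComplaintRateLimiter(rate, burst float64) *complaintRateLimiter {
	return &complaintRateLimiter{
		rate:     rate,
		burst:    burst,
		tokens:   make(map[uint64]float64),
		lastSeen: make(map[uint64]time.Time),
	}
}

// Allow reports whether a complaint from replicaID should be processed now,
// refilling that replica's bucket based on the time elapsed since its last one.
func (rl *complaintRateLimiter) Allow(replicaID uint64, now time.Time) bool {
	if last, ok := rl.lastSeen[replicaID]; ok {
		rl.tokens[replicaID] += now.Sub(last).Seconds() * rl.rate
		if rl.tokens[replicaID] > rl.burst {
			rl.tokens[replicaID] = rl.burst
		}
	} else {
		rl.tokens[replicaID] = rl.burst // first complaint: full bucket
	}
	rl.lastSeen[replicaID] = now
	if rl.tokens[replicaID] < 1 {
		return false // over the limit; drop or defer the complaint
	}
	rl.tokens[replicaID]--
	return true
}
```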
After thinking about it, I realize that there is no amplification attack.
We need to conceptually separate the client, the replica that was complained to, and the rest of the system. This applies to all 3 obcpbft consensus protocols, so we probably need a common clientcomplaint service that we can use, and a corresponding replica service that receives complaints and reacts accordingly.
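Such a shared service could boil down to a pair of interfaces roughly like the following (hypothetical names, not existing obcpbft types):

```go
package sketch

// ClientComplaintService is what the submitting side would use to complain
// about a request that has not been committed within its local timeout.
type ClientComplaintService interface {
	Complain(reqHash string)
}

// ComplaintReceiver is what each of the three consensus protocols would
// implement to react to incoming complaints, e.g. by forwarding the request
// to the leader and starting a view-change timer.
type ComplaintReceiver interface {
	OnComplaint(reqHash string, fromReplica uint64)
}
```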
@corecode, could you update?
Do you want to keep this separate or combine it with your work on #1000?
This can be closed.
The problem: VP1 submits a transaction to the leader but the leader won't add it to a new block. We need to figure out a way to initiate a view-change. (VP1's vote alone cannot bring about a view-change.)
The solution will involve VP1 broadcasting the transaction to the other VPs, and all the VPs keeping timers. But we need to be careful about how we kick off these timers (per-VP rather than per-request timers will probably do the job). I discussed this with @jyellick earlier this week; let's outline the solution during scrum.
Also, @corecode noted that we can't keep all pending requests in memory, because we could be flooded with requests and run out of memory.
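A sketch of the per-VP alternative, which keeps memory bounded by the (fixed) number of VPs rather than by the number of pending requests; the names here are illustrative only:

```go
package sketch

import "time"

// vpTimers keeps one timer per validating peer instead of one per request,
// so memory use is bounded by the number of VPs.
type vpTimers struct {
	timeout time.Duration
	pending map[uint64]time.Time // VP ID -> time its timer was armed
}

func newVPTimers(timeout time.Duration) *vpTimers {
	return &vpTimers{timeout: timeout, pending: make(map[uint64]time.Time)}
}

// Arm starts the timer for a VP when it broadcasts a request that the
// leader has not yet ordered; an already-running timer is left alone.
func (v *vpTimers) Arm(vpID uint64, now time.Time) {
	if _, running := v.pending[vpID]; !running {
		v.pending[vpID] = now
	}
}

// Clear stops a VP's timer once one of that VP's requests commits.
func (v *vpTimers) Clear(vpID uint64) {
	delete(v.pending, vpID)
}

// ShouldViewChange reports whether any VP's timer has expired, i.e. the
// leader has been ignoring that VP for longer than the timeout.
func (v *vpTimers) ShouldViewChange(now time.Time) bool {
	for _, armed := range v.pending {
		if armed.Add(v.timeout).Before(now) {
			return true
		}
	}
	return false
}
```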