Closed cmwaters closed 1 year ago
I did some research around this idea some time ago. Here is a paper on clock synchronization that provides a model that combines different fault models: process failures (Definitions 6 and 7), link failures (Definition 8). It then gives an algorithm in Fig. 1 that uses the different fault bounds. One then obtains constraints on the system size that look like:
n > 3*f_a + 2*f_s + 2*f_i + f_c
The analysis for consensus under these fault model seems comparable, and others have looked at this, but perhaps not in the context of partial synchrony (I am sure that there are paper on synchronous systems, though). I am happy to discuss this if people find this path interesting.
The most recent work that I know about with this mixed/hybrid model of failures is XFT:
https://www.usenix.org/conference/osdi16/technical-sessions/presentation/liu
Going through its related work, I could find UpRight, from which I think this concept of commision/omission comes:
And, apparently, the original concept of hybrid failures comes from this paper (didn't find available PDF):
https://www.computer.org/csdl/proceedings-article/reldis/1988/00025784/12OmNqJq4kR
Protocol Change Proposal
Currently, Tendermint consensus uses the BFT principle of
2f + 1
as the voting quorum needed to overcomef
"byzantine" processes. Given that omission is also a type of fault whereby quorum must ben - f
off
omission faults, quorum is thus measured as2f + 1
from3f + 1
total power or 2/3+. Thereby having 1/3 fault tolerance to errors of both omission and commission (byzantine)This, however, treats both of these fault methods as equal in value to the application. This however might not be the case. Some applications may value greater tolerance to omission faults and others, commission faults. I would like to put forth the idea of having a variable quorum either to be set as a one-off by the application or genesis or dynamically as a consensus param. Here are a few examples of variable tradeoffs an application could make with their exposure to modes of failure:
Thus, if
q
is the quorum (as a percent) selected by the applicationf_o
isn-q
andf_c
is2q - n
. (NOTE: there might be an off by 1 or a greater than vs great and equal than error in my calculation but I think overall this is correct).Just for further context on why
f_c < 2q - n
: Byzantine processes perform equivocation where they double sign for two different blocks and gossip their votes to two partitioned groups in the network. if there aref_c
byzantine processes andn - f_c
honest processes, then at most we can have two partitoned groups each of(n - f_c)/2
. To trick both groups into committing for two different states the quorum must be less than(n-f_c)/2 + f_c
(honest votes + byzantine votes) for each group. Doing some algebra you getf_c < 2q - n
For Admin Use