dBFT 3.0 Specification - Githubissues

igormcoelho commented 4 years ago

Summary or problem description This issue is focused in discussing desired features for dBFT 3.0, and proposed specification for double-speaker proposals (dBFT3-WS2).

Do you have any solution you want to propose? We propose a novel consensus mechanism for Neo, named dBFT 3.0. It focuses on the ability to have double-speakers, which increases resilience against constantly faulted nodes. This can also be used to allow active/efficient participation from any node in the network, in limited slots, without any risk to damaging or delaying block production.

This is a follow-up from other issues, such as:

Neo Version

Neo 3

Where in the software does this update applies to?

Consensus

igormcoelho commented 4 years ago

We present some basic definitions here (list will be updated on time):

dBFT 3.0: proposed extension of dBFT 2.0 for Neo 3
dBFT3-WS2: proposed extension of dBFT 2.0 for Neo 3 including double speakers
dBFT 2.0+ and dBFT 3.0+: future possible extensions over dBFT 2.0 and 3.0, towards greater stability and preventing synchronous assumptions
Type-I attack: node appears to have intentionally or non-intentionally died (same as non responding node)
Type-II attack: node responds with different content to multiple nodes, contradicting specified protocol
honest node: a node that operates according to the specified protocol
byzantine node: a node which is not honest
2f+1: required set of honest nodes for the protocol to operate correctly, where f is the number of byzantine nodes

Some basic assumptions faced by Neo blockchain technology:

Mempool is non-deterministic and maintained by independent peer-to-peer nodes
No point-to-point connection exists between consensus nodes

erikzhang commented 4 years ago

Double speakers look good.

roman-khimov commented 4 years ago

@igormcoelho, it seems to be too quiet here... Let's warm this up and continue some discussions we've had via e-mail here. I've created #2057 and #2058 to describe NSPCC modifications to dBFT, so here we can concentrate on the specification linked above.

Double speakers

We have some concerns over this scheme reliability in real networks. It's possible for nodes to split on their P1/P2 decisions without reaching 2f+1 nodes for any of the two. This would be solved by view change (and fallback to single speaker), but that's what we're trying to avoid in the first place with this scheme. So will there be any practical improvements from using this scheme or not is still an open question.

And the main problem for us is that we see it as an optimization of existing algorithm, while we have much bigger problem to solve (below). Introducing this optimization without solving that problem first can introduce even more instability and I don't think we want that.

On the AMPL/MILP models provided

The code there tries to minimize/maximize some functions, so it finds some sequence of events that would do that, while instead we're expecting to see satisfiability proof: can something bad happen to the protocol or not depending on its model and given some constraints. These models might be useful for some experiments, but we'd really like to see results from using more appropriate tool (like TLA+) or a different model there.

DBFT 2.0 liveness lock

Still, these models show the problem that we've observed several times already, on page 9 it's described as liveness lock. And we think that this problem deserves way more attention, because it makes network fail completely requiring CN restarts to restore it. Solving that in reliable and provable manner seems to be more important to us than doing any kind of optimizations.

To summarize, what we'd really like to see in dBFT updates (irrespective of the version number) is:

solving liveness lock problem
proving that the algorithm works (and adjusting it if needed)
and only after that applying double speakers or any other non-trivial optimizations (of course updating the correctness proof along the way)

And we can do that, rolling out an updated dBFT is relatively easy for Neo, so solving these problems one by one with small incremental changes should be preferred instead of rolling out some huge update that solves (and breaks) everything.

vncoelho commented 4 years ago

Hi @roman-khimov, Igor had already replied to you by email, but I believe he will re-post it here. I will also add some comments.

First of all, regarding the MILP.

On the AMPL/MILP models provided

In this recent MILP model, we play with min-max optimization and also adding and removing constraints. Based on how strict the model becomes (and the bounds founded by the solver), we are able to define if some events can happen or not.

You are emphasizing TLA+, however, be aware that even the formal formulations of Liskov in 1999 on "Practical Byzantine Fault Tolerance" were later reconsidered and fixed for some specific cases in 2002 on "Practical Byzantine Fault Tolerance and Proactive Recovery".

We are dealing with a problem that can even be generalized as a graph problem, and ensure that all assumptions and events are covered is not something trivial to be done, it can even cross intersections with NP-Hard problems and the try to present a real mathematical proof may never be proofed to be really valid as well.

The MILP has its value if you take some time to understand its constraints and how delicate it is, any minor change can completely gives you clue about the behavior of the system.

Double speakers

You question is good and it is explained throughout our paper. There are conservative ways to do that (that are also related with keeping consistency across view, dBFT 2.0+). They are explained on Section 5.2.1 "Byzantine Attacks on dBFT 3.0-WS2: Towards dBFT 3.0+" of paper "Challenges of PBFT-Inspired Consensus for Blockchain and Enhancements over Neo dBFT" https://www.mdpi.com/1999-5903/12/8/129

There is an extra phase called Pre-Commit which avoids any split and can even be omitted when the network behaves as expected.

The paper "A MILP Model for a Byzantine Fault Tolerant Blockchain Consensus" has more details and a background on the model https://www.mdpi.com/1999-5903/12/11/185 for dBFT 2.0, while the .pdf has some summary of the specifications https://github.com/NeoResearch/dbft3-specification/blob/master/docs/dBFT3.0_Specification.pdf applied for dBFT 3.0 (a deeper look at the model for double speakers can be directly done here https://github.com/NeoResearch/milp_bft_failures_attacks/blob/master/dbft3.0/dbft3.0_2P.py).

DBFT 2.0 liveness lock

We agree about liveness problems and we are aware of them, as well as discussed and investigated solutions for that. Our vision is also aligned with what you are discussing in terms of consistency across views. Let's move this forward if the trade-off between network load by sharing information surpass the light model used nowadays.

In our paper we divided as dBFT 2.0, dBFT 2.0+, dBFT 3.0 and dBFT 3.0+. The "+" symbol was highlighted as enhanced versions that deal with those liveness problems.

vncoelho commented 3 years ago

We are starting a C# draft with this proposal and will send as PR in a couple of weeks/months.

neo-project / neo

dBFT 3.0 Specification #2029