Open igormcoelho opened 4 years ago
We present some basic definitions here (list will be updated on time):
Some basic assumptions faced by Neo blockchain technology:
Double speakers look good.
@igormcoelho, it seems to be too quiet here... Let's warm this up and continue some discussions we've had via e-mail here. I've created #2057 and #2058 to describe NSPCC modifications to dBFT, so here we can concentrate on the specification linked above.
Double speakers
We have some concerns over this scheme reliability in real networks. It's possible for nodes to split on their P1/P2 decisions without reaching 2f+1 nodes for any of the two. This would be solved by view change (and fallback to single speaker), but that's what we're trying to avoid in the first place with this scheme. So will there be any practical improvements from using this scheme or not is still an open question.
And the main problem for us is that we see it as an optimization of existing algorithm, while we have much bigger problem to solve (below). Introducing this optimization without solving that problem first can introduce even more instability and I don't think we want that.
On the AMPL/MILP models provided
The code there tries to minimize/maximize some functions, so it finds some sequence of events that would do that, while instead we're expecting to see satisfiability proof: can something bad happen to the protocol or not depending on its model and given some constraints. These models might be useful for some experiments, but we'd really like to see results from using more appropriate tool (like TLA+) or a different model there.
DBFT 2.0 liveness lock
Still, these models show the problem that we've observed several times already, on page 9 it's described as liveness lock. And we think that this problem deserves way more attention, because it makes network fail completely requiring CN restarts to restore it. Solving that in reliable and provable manner seems to be more important to us than doing any kind of optimizations.
To summarize, what we'd really like to see in dBFT updates (irrespective of the version number) is:
And we can do that, rolling out an updated dBFT is relatively easy for Neo, so solving these problems one by one with small incremental changes should be preferred instead of rolling out some huge update that solves (and breaks) everything.
Hi @roman-khimov, Igor had already replied to you by email, but I believe he will re-post it here. I will also add some comments.
First of all, regarding the MILP.
On the AMPL/MILP models provided
In this recent MILP model, we play with min-max optimization and also adding and removing constraints. Based on how strict the model becomes (and the bounds founded by the solver), we are able to define if some events can happen or not.
You are emphasizing TLA+, however, be aware that even the formal formulations of Liskov in 1999 on "Practical Byzantine Fault Tolerance" were later reconsidered and fixed for some specific cases in 2002 on "Practical Byzantine Fault Tolerance and Proactive Recovery".
We are dealing with a problem that can even be generalized as a graph problem, and ensure that all assumptions and events are covered is not something trivial to be done, it can even cross intersections with NP-Hard problems and the try to present a real mathematical proof may never be proofed to be really valid as well.
The MILP has its value if you take some time to understand its constraints and how delicate it is, any minor change can completely gives you clue about the behavior of the system.
Double speakers
You question is good and it is explained throughout our paper. There are conservative ways to do that (that are also related with keeping consistency across view, dBFT 2.0+). They are explained on Section 5.2.1 "Byzantine Attacks on dBFT 3.0-WS2: Towards dBFT 3.0+" of paper "Challenges of PBFT-Inspired Consensus for Blockchain and Enhancements over Neo dBFT" https://www.mdpi.com/1999-5903/12/8/129
There is an extra phase called Pre-Commit
which avoids any split and can even be omitted when the network behaves as expected.
The paper "A MILP Model for a Byzantine Fault Tolerant Blockchain Consensus" has more details and a background on the model https://www.mdpi.com/1999-5903/12/11/185 for dBFT 2.0, while the .pdf has some summary of the specifications https://github.com/NeoResearch/dbft3-specification/blob/master/docs/dBFT3.0_Specification.pdf applied for dBFT 3.0 (a deeper look at the model for double speakers can be directly done here https://github.com/NeoResearch/milp_bft_failures_attacks/blob/master/dbft3.0/dbft3.0_2P.py).
DBFT 2.0 liveness lock
We agree about liveness problems and we are aware of them, as well as discussed and investigated solutions for that. Our vision is also aligned with what you are discussing in terms of consistency across views. Let's move this forward if the trade-off between network load by sharing information surpass the light model used nowadays.
In our paper we divided as dBFT 2.0, dBFT 2.0+, dBFT 3.0 and dBFT 3.0+. The "+" symbol was highlighted as enhanced versions that deal with those liveness problems.
We are starting a C# draft with this proposal and will send as PR in a couple of weeks/months.
Summary or problem description This issue is focused in discussing desired features for dBFT 3.0, and proposed specification for double-speaker proposals (dBFT3-WS2).
Do you have any solution you want to propose? We propose a novel consensus mechanism for Neo, named dBFT 3.0. It focuses on the ability to have double-speakers, which increases resilience against constantly faulted nodes. This can also be used to allow active/efficient participation from any node in the network, in limited slots, without any risk to damaging or delaying block production.
Related documents: https://github.com/NeoResearch/dbft3-specification
This is a follow-up from other issues, such as:
Neo Version
Where in the software does this update applies to?