How to sign the stateroot?

Tommo-L commented 4 years ago

Q1: Should stateroot be part of core or plugin?

As we discussed at last night's meeting, most people agreed that it should be part of the core to ensure data consistency across all CN nodes.

Q2: Which method should we use to sign state root?

This is still under discussion; there are several options:

Option A: Use the consensus messages, binding the stateroot to the proposal block to complete the signature.
Option B: Each CN node broadcasts its own stateroot signature directly.
Option C: Use a stateroot contract: the consensus node that sends the proposal block adds a stateroot transaction to it, just like the MinerTransaction in neo2.x.
Option D: Add the stateroot to the block header.

What do you think?

Tommo-L commented 4 years ago

For me, option A is the best way at present, even though I don't like it. Consider that if stateroot is to be provided to the outside, e.g. for SPV or cross-chain, it should be verifiable, requiring the signatures of all the consensus nodes.
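A minimal sketch of what "verifiable" could mean for an SPV or cross-chain client, under several assumptions: the signed payload is just the block index plus the 32-byte root (a hypothetical layout), HMAC with per-node keys stands in for the CNs' real ECDSA signatures so the example stays stdlib-only, and the threshold is the N - (N-1)/3 that Neo's dBFT uses for its consensus multisig rather than literally all N nodes.

```python
import hashlib
import hmac
from typing import Dict

def bft_threshold(n: int) -> int:
    """Minimum signatures needed when up to f = (n - 1) // 3 nodes may be faulty."""
    return n - (n - 1) // 3

def state_root_message(index: int, root: bytes) -> bytes:
    # Hypothetical signed payload: 4-byte little-endian block index + 32-byte root.
    return index.to_bytes(4, "little") + root

def sign(key: bytes, msg: bytes) -> bytes:
    # Stand-in for an ECDSA signature so the sketch needs only the stdlib.
    return hmac.new(key, msg, hashlib.sha256).digest()

def verify_state_root(index: int, root: bytes,
                      signatures: Dict[int, bytes],
                      validator_keys: Dict[int, bytes]) -> bool:
    """True if at least bft_threshold(N) known validators signed (index, root)."""
    msg = state_root_message(index, root)
    valid = sum(
        1
        for vid, sig in signatures.items()
        if vid in validator_keys
        and hmac.compare_digest(sig, sign(validator_keys[vid], msg))
    )
    return valid >= bft_threshold(len(validator_keys))
```

With 7 CNs, for example, five valid signatures over the same (index, root) pair are enough, while four are not.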

shargon commented 4 years ago

They can send the current state root in the prepare request, where it will be signed automatically. But then there is no difference from including it in the header.

ZhangTao1596 commented 4 years ago

They can send the current state root in the prepare request, where it will be signed automatically. But then there is no difference from including it in the header.

They can sign header and stateroot separately.

roman-khimov commented 4 years ago

it should be verifiable, requiring the signatures of all the consensus nodes.

That's exactly why it should be in the header, because that's what CNs naturally agree upon and sign. Also, speaking of outside users, any P2P-distributed solution would require those users to implement P2P protocol support, which may not be something they want to do.

And to me that's also very important from another perspective: ensuring users' trust in the network. This whole state is something that matters to users; making the header contain the state hash gives a clear guarantee that whatever this state is, it won't suddenly change. If there is a need to change the state because of some bug or whatever, then it should be done in a clear and explicit manner in subsequent blocks (as described in #1287), not by rewriting history.

So, Option D, definitely.

shargon commented 4 years ago

My favorite option is to send the StateRoot of the previous block in the proposal and include it in the header.

Tommo-L commented 4 years ago

Store in block footer?

|  header: version, hash, ... |
|  body: txs |
|  footer: stateRoot |

All consensus nodes sign the whole block, and set block.hash = header.hash.
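One way to read the footer proposal, as a sketch: the block's identity (its hash) comes from the header alone, while the CNs' signature covers header, body and footer. The encoding, the double SHA-256 and the per-transaction hashing are assumptions for illustration, not Neo's actual serialization.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List

def h256(data: bytes) -> bytes:
    # Double SHA-256, assumed here as the block hash function.
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

@dataclass
class Block:
    header: bytes                                   # serialized header fields (hypothetical encoding)
    txs: List[bytes] = field(default_factory=list)  # body
    state_root: bytes = b""                         # footer

    @property
    def hash(self) -> bytes:
        # block.hash = header.hash: the footer does not change the block's identity.
        return h256(self.header)

    def signed_data(self) -> bytes:
        # CNs sign the whole block, so header, body and footer are all covered.
        body = b"".join(h256(tx) for tx in self.txs)
        return h256(self.header + body + self.state_root)
```

Two blocks that differ only in their footer share a hash but not their signed data, so a headers-only client cannot check the footer without fetching more than the header.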

ZhangTao1596 commented 4 years ago

We should consider VM upgrade compatibility first, before deciding where to put the root. For example, if we accept the VM versioning proposed by @shargon, the same height will get the same result no matter which version the node is. Then we can put the root anywhere without inconvenience.

roman-khimov commented 4 years ago

My favorite option is to send the StateRoot of the previous block in the proposal and include it in the header.

I've been thinking about it for a while, and although it has the obvious downside of the state lagging one block behind, it still gives us clear, predictable behaviour: for a transaction in block N you get its resulting state root in block N + 1. And it fits our current transaction handling model nicely; it's very easy to implement. So it probably really is the best option we have at the moment.
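The lag can be illustrated with a toy chain, as a sketch only: a JSON hash stands in for the MPT root, transactions are hypothetical (account, delta) pairs, and the header field name `prev_state_root` is made up. The point is that the root placed in block N+1 is already known when N+1 is proposed, because block N has been persisted.

```python
import hashlib
import json
from typing import Dict, List, Tuple

Tx = Tuple[str, int]  # hypothetical transfer: (account, balance delta)

def state_root(state: Dict[str, int]) -> str:
    # Stand-in for an MPT root: hash of the canonical JSON encoding of the state.
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()

def apply_block(state: Dict[str, int], txs: List[Tx]) -> Dict[str, int]:
    new_state = dict(state)
    for account, delta in txs:
        new_state[account] = new_state.get(account, 0) + delta
    return new_state

def build_chain(blocks: List[List[Tx]]):
    """Header of block N+1 carries the root of the state after block N."""
    state: Dict[str, int] = {}
    headers = []
    prev_root = state_root(state)  # state before block 0
    for index, txs in enumerate(blocks):
        # Known before block `index` is proposed: the previous block is already persisted.
        headers.append({"index": index, "prev_state_root": prev_root})
        state = apply_block(state, txs)
        prev_root = state_root(state)
    return headers, state
```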

Store in block footer?

Wouldn't that complicate life for light nodes/wallets? I think it would be nice for them to be able to operate using just headers. Also, stateRoot should probably be added to the MerkleRoot calculation then (the same way ConsensusData is now), so I'm not sure what this move to the footer would buy us.
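If stateRoot were folded into MerkleRoot as suggested, a header would commit to it indirectly. A sketch, assuming a binary Merkle tree over double SHA-256 that pairs an odd last node with itself, and assuming the state root simply becomes the first leaf ahead of the transaction hashes (the comment doesn't pin down the layout):

```python
import hashlib
from typing import List

def h256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root(leaves: List[bytes]) -> bytes:
    """Binary Merkle root; an odd node at any level is paired with itself."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [h256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def block_merkle_root(state_root: bytes, tx_hashes: List[bytes]) -> bytes:
    # Hypothetical layout: the state root is the first leaf, followed by the tx hashes.
    return merkle_root([state_root] + tx_hashes)
```

A light client that trusts the header then needs only the state root value and its Merkle path to check it.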

For example, if we accept the VM versioning proposed by @shargon, the same height will get the same result no matter which version the node is.

This versioning is essential IMO, but not just for the VM; we should also consider the whole contract execution environment, like native contracts or syscalls (which might also change in some incompatible way for whatever reason).
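A sketch of height-based versioning that covers the whole execution environment rather than just the VM. The activation heights and version labels are hypothetical. Because the active version is a pure function of block height, a node syncing from genesis replays each block under the environment that originally executed it.

```python
import bisect
from typing import List, Tuple

# Hypothetical activation schedule: (first block height, environment version).
# An "environment" here means VM + syscalls + native contracts together.
SCHEDULE: List[Tuple[int, str]] = [
    (0, "env-1"),
    (1_000_000, "env-2"),
    (2_500_000, "env-3"),
]

def version_at(height: int, schedule: List[Tuple[int, str]] = SCHEDULE) -> str:
    """Return the execution-environment version active at a given block height."""
    if height < 0:
        raise ValueError("height must be non-negative")
    starts = [start for start, _ in schedule]
    # Index of the last activation height that is <= height.
    i = bisect.bisect_right(starts, height) - 1
    return schedule[i][1]
```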

Tommo-L commented 4 years ago

Description

Previously, we stored stateRoot in a P2P message for two main reasons:

If we separate state persistence from block persistence, the consensus node can quickly process transactions without waiting for state writes and smart contract execution (originally posted at https://github.com/neo-project/neo/issues/302#issue-339345310). If we store the previous block's state root in the block header, this problem may not arise either.
As shown in the figure below, if a bug in the VM caused an incorrect calculation of 1+1=3, we want to fix it.

There are 4 ways to deal with the above problem:

Solution1

Ignore the past results; correct the bug for future blocks.

Affects

Layer1: neo-core needs to support different NeoVM versions and syscalls
Layer2+: do nothing
"Victim": has to face reality

Applicable scope:

Tiny bug
Tolerable impact
"Victim" < layer2+

Solution2

Affects

Layer1: neo-core needs to support different NeoVM versions and syscalls
Layer2+: they all need to roll back; it may also cause double spends or loss of assets on layer2+, like DEXes, exchanges, leasing, AMMs, etc.
"Victim": will be happy

Applicable scope:

Serious problem
"Victim" > layer2+

Solution3

Roll back the block and re-execute the old block with the new version of the VM.

Affects

Layer1: neo-core needs to support different NeoVM versions and syscalls, but it may have additional negative effects:
- Transactions that failed in the past may now succeed.
- Transactions that succeeded in the past may now fail.
- Some related variables will also change.
Layer2+: they all need to re-check the status of their transactions, account balances, dApps, etc. It may lead to new problems, such as double spends or loss of assets on layer2+, like DEXes, exchanges, leasing, AMMs, etc.
"Victim": will be happy

Applicable scope:

Serious problem
Secondary impact is relatively small
"Victim" > layer2+

Solution4

Affects

Layer1: just use the latest node to recalculate everything, starting from the first block. But it may also have additional negative effects:
- Transactions that failed in the past may now succeed.
- Transactions that succeeded in the past may now fail.
- Some related variables will also change.
Layer2+: they all need to re-check the status of their transactions, account balances, dApps, etc. It may lead to new problems, such as double spends or loss of assets on layer2+, like DEXes, exchanges, leasing, AMMs, etc.
"Victim": will be happy

Applicable scope:

Serious problem
"Victim" > layer2+

Summary

Solutions 1, 2 and 3 are acceptable in some cases.
Solutions 1, 2 and 3 require neo-core to support different NeoVM versions and syscalls.
Solutions 3 and 4 may have additional side effects, which are somewhat uncertain and unpredictable for users.

I think putting the state root in the block header is better than a P2P message, as it will benefit upgrades. What do you think?

shargon commented 4 years ago

I vote for Solution 3

vncoelho commented 4 years ago

We discussed this before; @igormcoelho emphasized that failed transactions should not be turned into successful ones. Perhaps we need a Solution 4 that is Solution 3 with restrictions. I will talk to Igor and recall the past discussions.

roman-khimov commented 4 years ago

Oh, it'd be so easy to discuss that in one room with a whiteboard, but anyway.

My take on solutions 3 and 4 is that they're dangerous, as they would change the state for old blocks, which I think is not acceptable for several reasons:

it breaks the state root, since the recalculated state is going to be different
it's not cross-chain friendly, because if another blockchain refers to some state root and we change it, that link is going to be lost
probably the most important: it breaks users' trust in the system; as a user I expect the state of the system to remain consistent no matter how the software changes, and all changes to the state must be explicit and traceable
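The first point can be shown with a toy replay: hypothetical `add_v1`/`add_v2` functions stand in for the buggy and fixed VM (the 1+1=3 example from earlier in the thread), and a JSON hash stands in for the MPT root. Re-executing the same history with the fixed VM yields a different root, so anything that referenced the originally signed root no longer matches.

```python
import hashlib
import json

def add_v1(a: int, b: int) -> int:
    # Hypothetical buggy VM: 1 + 1 evaluates to 3.
    return 3 if (a, b) == (1, 1) else a + b

def add_v2(a: int, b: int) -> int:
    return a + b  # hypothetical fixed VM

def execute(history, add):
    """Replay (account, a, b) operations, storing a + b as computed by the given VM."""
    state = {}
    for account, a, b in history:
        state[account] = add(a, b)
    return state

def root(state: dict) -> str:
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()

history = [("A", 1, 1), ("B", 2, 2)]
original_root = root(execute(history, add_v1))    # root signed at the time
recomputed_root = root(execute(history, add_v2))  # root after re-execution (Solutions 3/4)
```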

Solution 2 basically has the same issues, but it also directly contradicts one of Neo's core features, one-block finality.

And I don't think that we're only left with solution №1: we should take it and amend it with one important addition from #1287, corrective transactions, to get solution number 5.

Solution 5

It would look something like this (sorry, it would take me like a week to redraw it following your style):

[figure: corrective-tx-flow]

It basically follows solution number one, in that we release VM version 2 and all blocks starting with some number N get processed using this new VM. But at the same time, if we know that the bug fixed in VM version 2 has caused some wrong states, we explicitly correct them with a corrective transaction in one of the subsequent blocks. So if accounts A1 and A2 have wrong balances, we adjust them with this transaction.

Obviously it's a very powerful (and dangerous) mechanism, so this transaction would be signed by CNs or even whole governance committee (now that we have it). And it would only be applied when there is a need to, if some wrong cryptocuties were generated because of this bug we may as well decide that they're way too cute to change them (and thus they just won't be corrected). But if there is a change in the state, the user would know why it has happened and it at the same time won't break any references to old state.
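A sketch of how such a corrective transaction might be checked and applied, assuming (hypothetically) that it carries a human-readable reason, explicit per-account balance deltas and a set of committee signers, and that a fixed approval threshold applies. Field names and the threshold rule are illustrative; #1287 may define them differently.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet

@dataclass(frozen=True)
class CorrectiveTx:
    reason: str                  # recorded on chain, so the change is auditable
    adjustments: Dict[str, int]  # account -> balance delta
    signers: FrozenSet[str]      # committee members who approved it

def apply_corrective(balances: Dict[str, int], tx: CorrectiveTx,
                     committee: FrozenSet[str], threshold: int) -> Dict[str, int]:
    """Return corrected balances if enough committee members signed; old state is untouched."""
    approvals = len(tx.signers & committee)
    if approvals < threshold:
        raise PermissionError(f"{approvals} approvals, {threshold} required")
    fixed = dict(balances)
    for account, delta in tx.adjustments.items():
        fixed[account] = fixed.get(account, 0) + delta
        if fixed[account] < 0:
            raise ValueError(f"correction makes {account} negative")
    return fixed
```

The correction is a new state in a new block: the pre-correction balances (and the root committing to them) stay exactly as they were.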

Affects

Layer1: neo-core needs to add support for VM/syscalls versioning and corrective transactions
Layer2+: will follow the corrective transaction changes; this may still cause some problems, but IMO they'd be easier to solve than with any other solution because the state change is explicit.
"Victim": :+1:

Applicable scope: just about anything.

igormcoelho commented 4 years ago

Nice discussion @Tommo-L, as always. I know many options are already on the table, but like @roman-khimov, I still feel that some extra important information/options are missing for decision-making. Let me try to contribute.

General idea: divide deployed contracts into those that may accept bugfixes (BUGFIX), and eventually have their states changed, and those that won't (NOBUGFIX); if such a "tricky" situation ever arises, a NOBUGFIX contract is either frozen or migrated to a new one (no state ever changes).

There are two things still missing from the reasoning: the "in-time" consensus perspective and the nature of contract programming itself.

Issues with In-time Consensus

As discussed a few times with Yong Qiang, consensus needs an "in-time" requirement to work. We tend to forget that, but if we could go back in time and re-process blocks, CNs who were elected in the past and later taken out (due to bad behavior) would still have access to their "past private keys", meaning they could still collude (in the past) to generate bad blocks (in the past). So someone who is "left behind" may process bad data with no other way to discover it, causing a hard fork that could only be solved by a "world consensus" (suddenly some nodes would sync some blocks and others wouldn't, creating two realities). So, in my opinion, this definitively rules out Solutions 2, 3 and 4, leaving just Solutions 1 and 5 on the table.

Issues with Contract Programming

Now, I'll present my perspective on Contract Programming. We are used to C, C#, C++ and Java, but forget about the D language and others that support the concept of Contract Programming. Usually, contracts can basically be done with assert() statements (something we remove in Release builds for efficiency), but the blockchain world can benefit a lot from this perspective. Some time ago I argued that users should be able to defend themselves from any undesired state change, and proposed a draft NEP called Solid States. The idea is simple: as long as a transaction passes, it should always pass, and if it fails, it should always fail. So, if some user wants to protect their assets (with or without state root hashes), they simply put an "assert()" in their "code" (any fail/exception-raising opcode is enough for this). If this "assert()" guarantees that the balance will be 2000, then, since this tx passed, the user can rest assured that it will always be 2000 at that point. This is very good and, from my perspective, it's currently necessary on Neo2 for DEXes and any cross-chain mechanism.

The problem is that if users "abuse" this, say they "assert()" every operation in the Entry Script of their transactions, paying this tiny extra GAS cost, they can ensure that "known bad operations" are kept forever (for example, a user who knows that the DIV operation does not fail on division by zero quickly submits a tx asserting that x/0 equals 5; it passes, locking our ability to fix that forever). So, for me, we can progress faster in this discussion if we decide a few things regarding Neo3 Contract Programming logic: (a) Do we want to verify states for every deployed contract, without any bugfix capability, or should we let users choose? (b) Do we want any possible "cheap user assert" to lock our ability to fix bugs in the future?
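The DIV example can be made concrete with two hypothetical VM versions: v1 returns 5 on division by zero (the bug), v2 faults as intended. An entry script asserting x/0 == 5 succeeds under v1 and fails under v2, so under the "a passing tx must always pass" rule, that single cheap transaction forbids re-executing its history with the fix.

```python
class BuggyDivVM:
    """Hypothetical VM v1: DIV by zero returns 5 instead of faulting."""
    def div(self, x: int, y: int) -> int:
        return 5 if y == 0 else x // y

class FixedDivVM:
    """Hypothetical VM v2: DIV by zero faults, as intended."""
    def div(self, x: int, y: int) -> int:
        if y == 0:
            raise ZeroDivisionError("DIV by zero")
        return x // y

def run_entry_script(vm, x: int) -> bool:
    """Entry script asserting x / 0 == 5; True means HALT (tx succeeds), False means FAULT."""
    try:
        result = vm.div(x, 0)
    except ZeroDivisionError:
        return False       # the VM itself faults
    return result == 5     # the script's assert(): FAULT unless x / 0 == 5
```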

From my perspective, which I don't think has been discussed here before, this is what I think we should do:

Solution 6: We should allow users to mark deployed contracts as BUGFIX or NOBUGFIX.

If a contract is NOBUGFIX, its state will be stored in the NEXT BLOCK (in practice, for efficiency, we take ALL NOBUGFIX contracts and build a Merkle tree just to aggregate their contract state roots). This effectively resolves the problem for all NEP-X tokens that want cross-chain or DEX support. If they exhibit some buggy behavior in some block, some day, their state will never be changed; that's the rule for them.
The other contracts, which are BUGFIX, will only have their states distributed via P2P, meaning that external entities should be aware that those operations may, in some unexpected future situation, have their state changed due to a bugfix.
Transactions that fail may later succeed, or vice versa, since the Entry Script will not be protected from changes.
Deployed contracts should be renewed periodically (automatically by default in the network), and this renewal may be blocked in the future if the contract is marked as NOBUGFIX and exploits a known bug (this prevents intentional injection of bugs into the ecosystem). In this case, Network Governance would be able to push a migration through a previously authorized operation, "force_migration_by_network", that must exist in all NOBUGFIX contracts. This "force_migration_by_network" could also be used by future network migrations (such as Neo3 -> Neo4).
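The aggregation in the first point might look like this sketch: every NOBUGFIX contract contributes its own storage root, the leaves are taken in a deterministic (sorted) order and combined into one Merkle root for the next block's header, and BUGFIX contracts are left out. Contract identifiers, the leaf encoding and the single SHA-256 are assumptions.

```python
import hashlib
from typing import Dict, List

def h256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: List[bytes]) -> bytes:
    level = list(leaves) or [h256(b"")]  # empty set of contracts -> fixed sentinel
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])      # pair an odd last node with itself
        level = [h256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def nobugfix_root(contract_roots: Dict[str, bytes], policy: Dict[str, str]) -> bytes:
    """Aggregate only NOBUGFIX contracts' storage roots, in sorted contract order.

    contract_roots: contract id -> that contract's own storage root
    policy:         contract id -> "BUGFIX" or "NOBUGFIX"
    """
    leaves = [
        h256(contract.encode() + root)
        for contract, root in sorted(contract_roots.items())
        if policy.get(contract) == "NOBUGFIX"
    ]
    return merkle_root(leaves)
```

Changing a BUGFIX contract's state leaves the header commitment untouched, while any change to a NOBUGFIX contract alters it.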

Now we come to the point: which contracts should be BUGFIX and which shouldn't? We let users choose, and we let the Neo Foundation choose as well.

Ecosystem adjustments and two NEOs

My opinion is that:

Neo/Gas native should be BUGFIX
NEP-X tokens on general should be NOBUGFIX (or a hybrid, explained in the end)

This also means that we can create an alternative NEP-5-NEO-TOKEN-NOBUGFIX, collateralized by BUGFIX Native Neo. So, if some major theft (such as the Ethereum DAO hack) ever happens, Native Neo will be fixed, but the NOBUGFIX version may suffer future losses in its global collateral balance. This way, DEXes could trade NEO-NOBUGFIX without any worry about state changes, and if some fix needs to be made to Native Neo, the community in general would need to decide what to do about the collateral of NEO-NOBUGFIX. That seems quite reasonable to me. As long as the collateral is low, the risk is low, and if something "strange" happens, the NOBUGFIX contract is temporarily locked for investigation.

Hybrid BUGFIX/NOBUGFIX

Finally, some time ago I also defended a Contract Inheritance strategy, meaning that we should be able to inherit from Native Neo/Gas logic, thus creating a hybrid BUGFIX/NOBUGFIX token whose balance logic may suffer changes (and fixes) like Native Neo, while its Governance Logic stays rock solid. In this case, it should be marked as BUGFIX, otherwise it would become inconsistent when its "base" changed. For a proper "logic", a NOBUGFIX contract would only access/invoke other NOBUGFIX contracts, meaning that it could only receive Neo from a NOBUGFIX NEO... so the process of "minting" would probably involve transforming your "volatile" Neo into a "non-volatile" Neo and using that for minting. The Neo blockchain itself could provide a "non-volatile" version and do this "automatically", as long as the holding logic for the "possibly lost future assets" is clear in NOBUGFIX NEO (the challenge for anyone else coding this is precisely the restriction that NOBUGFIX contracts do not invoke BUGFIX ones). To implement this "bridge", NOBUGFIX Neo could invoke some Oracle-like operation, like getCurrentStateForContractX, that would append to the Tx header the P2P state at that time X (from the speaker node, also validated by other nodes when putting the Tx in a block), allowing "unsafe Neo" to become "solid Neo". If the state changes in the future, the result is a discrepancy in what is effectively stored in the NOBUGFIX Neo collateral balance. This NOBUGFIX NEO could also be FRACTIONAL, unlike the original Neo, and could eventually hold some Neo as future insurance for the collateral (in case the backing state is no longer the state, without exposing this to other platforms).
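The invocation rule and the hybrid-token constraint above can be expressed as two small checks. This is a sketch: the policy table is hypothetical, and treating unlisted contracts as BUGFIX is an assumption the proposal doesn't state.

```python
from typing import Dict

BUGFIX, NOBUGFIX = "BUGFIX", "NOBUGFIX"

def kind(contract: str, policy: Dict[str, str]) -> str:
    # Assumption: contracts without an explicit mark are treated as BUGFIX.
    return policy.get(contract, BUGFIX)

def can_invoke(caller: str, callee: str, policy: Dict[str, str]) -> bool:
    """NOBUGFIX contracts may only call NOBUGFIX contracts; BUGFIX ones may call anything."""
    return kind(caller, policy) == BUGFIX or kind(callee, policy) == NOBUGFIX

def valid_inheritance(contract: str, base: str, policy: Dict[str, str]) -> bool:
    """A contract inheriting BUGFIX logic (e.g. native Neo) must itself be BUGFIX."""
    return not (kind(base, policy) == BUGFIX and kind(contract, policy) == NOBUGFIX)
```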

My opinion is that minting while raising funds in a NOBUGFIX contract could be done this way, accepting "volatile" BUGFIX Neo, even if the contract token itself is NOBUGFIX. The only risk is such an unexpected state-changing event happening to the NEO token during crowdfunding, so losses may affect only the fundraising itself, at that particular time, without any risk of "contaminating" third parties (or DEXes), since states wouldn't change for that specific token.

My vote

So, I vote for Solution 6, operationalized by some "future fix operation" as proposed in Solution 5.

For the original question, we would need A (distribute the global state via consensus P2P) and D (store the last state of NOBUGFIX contracts in the block header). Regarding versioning: I don't think it's fully possible to version everything, but one small, useful piece of versioning is at least the syscall list, so that introducing new syscalls won't break anything in the future.

Affects

Layer1: neo-core needs to add support for VM/syscalls versioning and corrective transactions (for state changes on BUGFIX contracts, ONLY in rare situations where Future Tx is used)
Layer2+: No problem if following NOBUGFIX contracts; if these are disabled in the future, this will also be done properly and publicly on the blockchain, for the future only, without any state change. The issues are the same as in Solution 5 if following BUGFIX contracts. Some DEXes may simply list only NOBUGFIX contracts, which is simpler to implement, so a NEO-TOKEN-NOBUGFIX would be very welcome.
"Victim": :+1: (if using NOBUGFIX, it's NOBUGFIX so don't complain, and if someone else is using a hybrid BUGFIX contract and you prefer that, just use that. "Victim" can choose.)

vncoelho commented 4 years ago

I agree with these points, Igor. I vote for Solution 6.

ZhangTao1596 commented 4 years ago

I have a crazy idea here.

First, should we keep the storages and balances from vm-1.x?

I think yes. If we all agree, we can continue.

Solution 7 `GetSnapshot` from network

After we implement MPT, we can use MPT proofs to verify storage, so we can even sync storages from other nodes. Instead of keeping all the different execution environments in one node, we use the power of the whole Neo network and keep the different execution environments across the network. It will act like this in vm-2.x:

[figure: edit (1)]

In the Neo network, vm-1.x-storages are in node-1.x and vm-2.x-storages are in node-2.x. When node-2.x first syncs blocks with ver-1.x, it doesn't execute them but just verifies them. When node-2.x starts executing blocks with ver-2.x, it will sync the storages it needs from other nodes and use MPT to verify them.

And what will happen to the vm-1.x state root on upgrade? Maybe we can stop execution and persistence when syncing a higher-version block:

[figure: state root upgrade (1)]

Or we can even keep storage going by syncing:

[figure: state root upgrade (2)]
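To illustrate only the verification step: a real MPT proof walks branch, extension and leaf nodes, but here a plain binary Merkle tree over sorted key/value pairs stands in for it. The idea carries over: node-2.x accepts storage items from untrusted peers only if they hash up to a state root it already trusts. The domain-separation prefixes and the leaf encoding are assumptions.

```python
import hashlib
from typing import List, Tuple

Proof = List[Tuple[bytes, bool]]  # (sibling hash, sibling is on the left)

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def leaf_hash(key: bytes, value: bytes) -> bytes:
    # 0x00 marks leaves; the key-length prefix keeps the (key, value) split unambiguous.
    return h(b"\x00" + len(key).to_bytes(2, "big") + key + value)

def next_level(level: List[bytes]) -> List[bytes]:
    if len(level) % 2:
        level = level + [level[-1]]  # pair an odd last node with itself
    return [h(b"\x01" + level[i] + level[i + 1]) for i in range(0, len(level), 2)]

def merkle_root(leaves: List[bytes]) -> bytes:
    level = list(leaves)
    while len(level) > 1:
        level = next_level(level)
    return level[0]

def prove(leaves: List[bytes], index: int) -> Proof:
    """Sibling path from leaf `index` up to the root (what a serving peer would send)."""
    proof: Proof = []
    level = list(leaves)
    while len(level) > 1:
        padded = level + [level[-1]] if len(level) % 2 else level
        sibling = index ^ 1
        proof.append((padded[sibling], sibling < index))
        level = next_level(level)
        index //= 2
    return proof

def verify(trusted_root: bytes, key: bytes, value: bytes, proof: Proof) -> bool:
    """Accept a synced storage item only if it hashes up to the trusted state root."""
    node = leaf_hash(key, value)
    for sibling, sibling_is_left in proof:
        node = h(b"\x01" + sibling + node) if sibling_is_left else h(b"\x01" + node + sibling)
    return node == trusted_root
```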

Advantages

No need to put all the versioning logic in one node.
No re-execution of blocks and txs.
History storages and balances are kept.

Disadvantages

Nodes of different versions must be kept running in the network.
Extra P2P messages are needed to sync storage.
Execution may be slow when persisting the first few blocks after an upgrade.

ZhangTao1596 commented 4 years ago

@erikzhang Can you please have a look at these solutions, which one do you prefer?

erikzhang commented 4 years ago

Option B is good to me.

roman-khimov commented 4 years ago

Solution 7

This one seems to be cross-chain compatible, but it still has the problem of the state changing suddenly. It's like you have $1000 in your account at block N and now you have $100 at block N+1. You may wonder why it happened, but the only answer you get is "we've had a software upgrade". I think that for every state change there has to be a clear, traceable answer to that question of why.

Option B is good to me.

How are we going to solve cross-chain issues and the problem of synchronization between two chains in the same network (like https://github.com/neo-project/neo/pull/1702#issuecomment-643949681)?

And what's the advantage of it? If we return to the two main points @Tommo-L reminded us of:

If we separate state persistence from block persistence, the consensus node can quickly process transactions without waiting for state writes and smart contract execution (originally posted at #302 (comment)).

CNs have to have an up-to-date state to participate in consensus, they can't do anything useful until they have this state (it can be seen even in the current neox-2.x implementation), so I don't see how detached state makes CNs more performant. In fact it may even slow things down because of networking issues (or even completely break consensus for the same reason).

As shown in the figure below, if a bug in the VM caused an incorrect calculation of 1+1=3, we want to fix it.

And this one now has like 7 different solutions right in this thread.

I want to make sure we're doing the best we can for Neo 3.

igormcoelho commented 4 years ago

@KickSeason if I understood correctly, the issue I see on Solution 7 is precisely that we may not want to keep some states from vm-1.x, for example, if these resulted in asset losses that were fixable in vm-2.x.

We already have 7 solutions; maybe we can agree on the basics first:

Why do we want to allow bugfixes? For me, the reason is: if native assets suffer a "sudden crazy change" because some poor VM/Interop layer implementation went to production, we can fix it.
Why do we want to prevent bugfix? For cross-chain and DEX we want to ensure that past states are immutable, otherwise we can generate severe external inconsistencies.

@Tommo-L @KickSeason @roman-khimov @shargon and specially @erikzhang , is there any chance you agree with some idea of having bugfixing limited to some contracts (like native Neo) and some states stored on header for other contracts (like some token that wants immutability)? (presented Solution 6).

We can find some hybrid final solution, as long as we agree on fundamentals that we want network to provide.

vncoelho commented 4 years ago

@KickSeason,

After we implement MPT, we can use MPT proofs to verify storage, so we can even sync storages from other nodes. Instead of keeping all the different execution environments in one node, we use the power of the whole Neo network and keep the different execution environments across the network.

That is a good point! Nice insight. It is possible to sync storages and then just check and sync with the current P2P broadcast from Validators. In this way, sync can be sped up considerably.

roman-khimov commented 4 years ago

Why do we want to allow bugfix?

Because there is some intention that we have when writing software and the source code is just an attempt at formalizing this intention, this can be a nice attempt, but still it (quite often) happens that the formalization is not exactly what we've intended. A bit shorter version: there are bugs.

Why do we want to prevent bugfix?

Because "we" are the ones exploiting this bug? Sorry, but probably that's the only possibility I can think of. In general, I think that once the bug is known most of the people would want to fix it, because their intentions didn't include this bug. And to be fair that's actually why I think marking contracts as bugfix-impossible (part of solution number 6) won't be used a lot and thus one generic bugfixing path is enough.

But at the same time, IMO it's not correct to draw a direct relationship between allowing/disallowing bug fixes and allowing/disallowing state root changes. It is possible to fix bugs without changing old state roots, and to make any state changes resulting from bug fixes traceable.

We can find some hybrid final solution, as long as we agree on fundamentals that we want network to provide.

We should, and therefore I should also note that I'm basing my reasoning on the following set of expected characteristics:

ability to fix bugs (otherwise the chain will only live until the first serious bug found)
1:1 immutable relationship between block and state of the chain (and 100% reproducibility of this state when processing the chain from the genesis block, otherwise it's not possible to reliably refer to the state of the chain)
the state is only changed by blocks (yes, we're talking blockchain here)
any change in the state must be fully auditable (no sudden changes in the state out of nowhere, the chain should explain the change by itself)

Basically, it's all about making the behavior of the system predictable.

ZhangTao1596 commented 4 years ago

This one seems to be cross-chain compatible, but it still has the problem of the state changing suddenly. It's like you have $1000 in your account at block N and now you have $100 at block N+1. You may wonder why it happened, but the only answer you get is "we've had a software upgrade".

@roman-khimov When node-2.x persists the first block after the upgrade, it uses the new execution logic, but the storages are from node-1.x. Why would there be a sudden change? If there is a change, it must come from some tx in block N + 1.

roman-khimov commented 4 years ago

When node-2.x persists the first block after the upgrade, it uses the new execution logic, but the storages are from node-1.x. Why would there be a sudden change?

Ah, maybe I've misunderstood this one a little. So basically it's the same as solution number 1, it's just that the new node doesn't contain the logic for VM 1.x and the only way to get the state for old blocks is via P2P from old nodes?

ZhangTao1596 commented 4 years ago

When node-2.x persists the first block after the upgrade, it uses the new execution logic, but the storages are from node-1.x. Why would there be a sudden change?

Ah, maybe I've misunderstood this one a little. So basically it's the same as solution number 1, it's just that the new node doesn't contain the logic for VM 1.x and the only way to get the state for old blocks is via P2P from old nodes?

Yea!

roman-khimov commented 4 years ago

OK, thanks for clarifying that. For some reason my first impression was that new (VM 2.x) nodes bring the new state as if they were running from the genesis block.

But then solution number 7 (S7) has the same basic characteristics (limited scope) as solution number 1 (S1) and is mostly concerned with questions of compatibility and maintenance: S1 keeps the VM 1.x code in the node, while S7 makes the node cleaner by removing it and relying on the network to get the proper states for the VM 1.x epoch.

It's a nice hack, but at the same time I think that shifting this maintenance burden from the node code to node instances is a bit problematic, as nothing guarantees that node-1.x will exist in the network long-term. And we could have something like 10 VM versions, so there would have to be nodes for each VM version and someone would have to maintain them. And what if some non-VM bug needed to be fixed? We'd have to update all 10 versions of the node, and it would be hard to ensure we're not breaking anything. I think in practice this would outweigh the effort required to maintain compatibility in one code base, and a single code base is more reliable: it's trivial to test the code against known behavior (the state root should match).

This mechanism of P2P state sharing may be useful for some other purposes, though.

ZhangTao1596 commented 4 years ago

Erik's new idea: https://github.com/neo-project/neo/pull/1793#issuecomment-688672289

erikzhang commented 4 years ago

Can we decouple StateRoot from the consensus algorithm? We can add a new role to the chain, the status validator. They are designated by the committee, communicate through additional channels, sign StateRoot and broadcast.

In the future, we may provide more services. It is impossible for us to attach all services to the consensus algorithm, otherwise the consensus algorithm will become more and more cumbersome. We must reduce the coupling as much as possible.

vncoelho commented 4 years ago

From a Multi-Agent Systems perspective, which I believe we currently rely on, that would be the natural direction to go. In the same way that we used StateDumper before and, following the same line of reasoning, oracles could be an open service with a basic template guided by native contracts (even considering that other SCs can be deployed with additional features, by having them in a native contract we can integrate them seamlessly).

roman-khimov commented 4 years ago

What if status validators would sign some state different from the one that CNs have? Like they think that account A has 1 GAS, but CNs think it has 2 GAS. Then there is a transaction coming from A which pays 1.5 GAS fees. CNs would happily accept it, include it in the block and then what?
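The scenario above as plain arithmetic, in a sketch with hypothetical balances and `Decimal` for GAS amounts: the same 1.5 GAS fee is payable in the CNs' view and not in the state validators' view, and once the block is persisted the two views disagree about A's balance (0.5 vs -0.5).

```python
from decimal import Decimal
from typing import Dict

def can_pay_fee(balances: Dict[str, Decimal], sender: str, fee: Decimal) -> bool:
    """A node accepts a tx only if, in its own view of the state, the sender can pay."""
    return balances.get(sender, Decimal("0")) >= fee

def balance_after_fee(balances: Dict[str, Decimal], sender: str, fee: Decimal) -> Decimal:
    """Sender's balance in a given view after the fee is charged."""
    return balances.get(sender, Decimal("0")) - fee

cn_view = {"A": Decimal("2")}  # what the consensus nodes think A holds (GAS)
sv_view = {"A": Decimal("1")}  # what the hypothetical state validators think
fee = Decimal("1.5")
```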

StateRoot is not a service. That's the key difference between it and oracles.

State is the essence of this system. State is what people care about: their tokens/assets/cryptocuties. If there is any disagreement between nodes on what the state is at the moment, the network just can't work reliably. State is something that directly affects transaction validation. And a transaction is just an expression of a state change.

So it's absolutely natural for CNs to synchronize their view of the state during consensus process, if there is any difference, they can't really proceed. And as CNs produce blocks, their agreement on the particular current state should be expressed in these blocks, because it's a property of the chain we have.

erikzhang commented 4 years ago

@roman-khimov Most of your views are correct. But state and StateRoot are not the same concept. The state of all nodes should be consistent. If they are inconsistent, it means that some nodes are implemented incorrectly and must be corrected. StateRoot provides a proof of state consistency, which is obviously a service. Without StateRoot, the blockchain can also work well, just like neo2.

erikzhang commented 4 years ago

In addition, I think that StateRoot does not need to be broadcast through the P2P network. Perhaps it is a better choice to get it through RPC. Because only a small number of nodes have the need to get StateRoot.

ZhangTao1596 commented 4 years ago

In addition, I think that StateRoot does not need to be broadcast through the P2P network. Perhaps it is a better choice to get it through RPC. Because only a small number of nodes have the need to get StateRoot.

Every node can calculate its own state root, but the correct state root is confirmed by validators, so we need the P2P network to spread it. If only validators can provide the correct state root via RPC, all users will have to use their RPC services to get the state root and storage proofs.

If so, nothing needs to go into neo-core; should it all move to a plugin?

erikzhang commented 4 years ago

If only validators can provide right state root via rpc

My original idea was that the state validators sign the StateRoot, and then the users get them from the seed nodes through RPC.

But the difficulty is how the state validators send the signatures to the seed nodes.

Tommo-L commented 4 years ago

The state of all nodes should be consistent. If they are inconsistent, it means that some nodes are implemented incorrectly and must be corrected.

Agreed, which also means that every node requires StateRoot to ensure data consistency. If so, StateRoot should be broadcast via P2P messages, like transactions and blocks. External nodes need the verification path of the state root, which can be provided by an RPC service.

Tommo-L commented 4 years ago

We can add a new role to the chain, the status validator. Status validators are designated by the committee, communicate through additional channels, sign the StateRoot, and broadcast it.

Hmm, then the whole chain is effectively decided by the status validators, which makes them more important than consensus nodes (maybe those need to be renamed to package nodes). Related: https://github.com/neo-project/neo/issues/1285
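Here is a sketch of how a client might accept a StateRoot signed by committee-designated validators. It assumes the same M-of-N rule dBFT uses for consensus nodes (M = N - (N - 1) / 3, which gives 5 of 7). Signature verification is stubbed out with a hypothetical `is_valid` callback; a real node would check ECDSA signatures against the validators' public keys.

```python
from typing import Callable, Iterable


def required_signatures(n: int) -> int:
    """Byzantine quorum as in dBFT: tolerate f = (n - 1) // 3 faulty nodes."""
    return n - (n - 1) // 3


def state_root_accepted(
    root: bytes,
    signatures: dict[str, bytes],
    validators: Iterable[str],
    is_valid: Callable[[str, bytes, bytes], bool],
) -> bool:
    """Accept `root` if enough distinct designated validators signed it.

    `is_valid(validator, root, sig)` is a hypothetical stand-in for real
    signature verification. Signatures from non-validators are ignored.
    """
    validator_list = list(validators)
    good = sum(
        1
        for v in validator_list
        if v in signatures and is_valid(v, root, signatures[v])
    )
    return good >= required_signatures(len(validator_list))


# Usage with 7 hypothetical validators and a toy verifier.
svs = [f"sv{i}" for i in range(7)]
check = lambda v, root, sig: sig == b"ok"  # toy rule, not real crypto
print(required_signatures(7))  # 5
print(state_root_accepted(b"root", {v: b"ok" for v in svs[:5]}, svs, check))  # True
```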

Tommo-L commented 4 years ago

According to the situation of Neo2.x, it's recommended to put it on other nodes, to ensure that the consensus node is more secure.

roman-khimov commented 4 years ago

The state of all nodes should be consistent

State root data can guarantee that. Without it, we can only hope that it's the case.

Without StateRoot, the blockchain can also work well, just like neo2.

Neo2 has a UTXO model for base tokens, which at the very minimum means that in Neo2 storage state does not affect chain consistency. In Neo3 it directly does; see the example above with the state being different between CNs and the proposed state validators (SVs?). Only CNs can serve as a reference point for the network state, because only their state technically matters (they decide which transactions get added into blocks).

to ensure that the consensus node is more secure.

Please explain what kind of security we are talking about here.

the situation of Neo2.x

IMO the situation there just confirms that the stateroot data is a natural part of the block header. There is just one hash to be added and that's it. All the numerous Neo 2.x problems with state root are direct consequences of the fact that this data is separated from the main chain. And it's exactly what I'm proposing to eliminate to solve these problems in Neo 3. There would be no getroots/roots messages, there would be no need to relay anything other than blocks and headers, there would be no question of the network synchrony. But there would be predictable behavior for every block and every node. Don't we want that for Neo 3?

vncoelho commented 4 years ago

@roman-khimov if the state root included in block H is the one from H-1, then I believe it would be OK in terms of extra information and would put no burden on block production.
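Here is a minimal sketch of the check an ordinary node could run under this H-1 scheme. It assumes a hypothetical `Header` type with a `prev_state_root` field, plus a local map from height to the state root the node computed after executing that block. Because the H-1 root is already computed by the time block H is proposed, this is consistent with the "no burden" argument here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Header:
    index: int
    prev_state_root: bytes  # hypothetical field: state root after block index-1


class StateMismatch(Exception):
    pass


def check_header(header: Header, local_roots: dict[int, bytes]) -> None:
    """Raise if the node's own state after block H-1 differs from the header's."""
    if header.index == 0:
        return  # genesis has no predecessor to compare against
    mine = local_roots.get(header.index - 1)
    if mine is None:
        raise ValueError(f"state for block {header.index - 1} not computed yet")
    if mine != header.prev_state_root:
        raise StateMismatch(
            f"local root for block {header.index - 1} is {mine.hex()}, "
            f"header says {header.prev_state_root.hex()}"
        )
```

A node that hits `StateMismatch` can stop immediately instead of silently continuing with a diverged state.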

shargon commented 4 years ago

According to the situation of Neo2.x, it's recommended to put it on other nodes, to ensure that the consensus node is more secure.

We can't have safe consensus with different states in its nodes; for me it's important to ensure that all CNs have the same MPT.

erikzhang commented 4 years ago

MPT cannot ensure state consistency. The only way to ensure state consistency is to carefully implement the NEO protocol. I think MPT is just a tool for checking whether the state is consistent. Therefore it should not be considered indispensable.

vncoelho commented 4 years ago

The only way to ensure state consistency is to carefully implement the NEO protocol.

That is true. There is also an additional point that @igormcoelho always emphasizes, @erikzhang, which is a general requirement of PBFT-inspired mechanisms: we should have consensus running on different implementations. That is why we are advancing a C++ implementation.

roman-khimov commented 4 years ago

carefully implement the NEO protocol

Well, at the moment that looks somewhat like this: https://github.com/nspcc-dev/neo-go/issues?q=is%3Aissue+state+mismatch+is%3Aclosed+, feasible, but not trivial. And probably that's one of the reasons I'm very biased here. We don't have any specification for the protocol, so it's very, very easy to miss something. That creates mismatches and bugs that are not easy to find in absence of state data. And again, these bugs are way more dangerous for Neo 3 than for Neo 2 that had UTXO model.

Even the state root data we have in 2.x has already proved useful. The NeoGo 0.76.0 release had a bug in the new EC key recovery interops, but the NeoGo testnet CN worked fine with this release until block 4516236, which brought a transaction invoking one of these syscalls. Because there was state root data exchange between CNs during consensus, the mismatch between our node and the other nodes was immediately noticed and acted upon (0.76.2 fixed it). What would happen if we didn't have state root data? A state mismatch could easily go undetected for days, weeks or months, and that's just dangerous. This can happen even if all nodes use the same perfectly implemented software: the DB can get corrupted, the node might get hacked; all sorts of bad things can lead to a different state.

MPT is just a tool for checking whether the state is consistent

Just as a side note, there are easier, lighter ways to ensure state consistency (like the one outlined in #1284), but IIUC we need MPT for proofs anyway and as it also solves consistency problem, it's OK to use it for both.

We should have consensus running on different implementations.

Absolutely agree with that: C#, C++, Go, Python, every full node should participate. We need some compliance checks though; there is something close to that for neo-vm with its JSON-based test set, but even that is not sufficient. Just to mention, NeoGo was fully compliant with the neo-vm test set (for Neo 2) as of release 0.62.0 (07 November 2019), but the latest known VM bug was fixed there in 0.76.2 (19 July 2020).

Tommo-L commented 4 years ago

We don't have any specification for the protocol, so it's very, very easy to miss something. That creates mismatches and bugs that are not easy to find in absence of state data. And again, these bugs are way more dangerous for Neo 3 than for Neo 2 that had UTXO model.

That's why we want to put it outside the consensus node. In Neo2.x, we encountered a number of problems, such as inconsistent C# and Go implementations and network congestion caused by StateRoot synchronization messages.

In Neo3, we keep the possibility of putting it back once our state root protocol and specification become more stable and reliable, if it is indeed necessary.

roman-khimov commented 4 years ago

We don't have any specification for the protocol, so it's very, very easy to miss something. That creates mismatches and bugs that are not easy to find in absence of state data.

That's why we want to put it outside the consensus node.

IMO, that's just sweeping the problem under the rug. If there is any state mismatch, I'd like the node to fail immediately instead of pretending to be OK. If there is a state mismatch between CNs, I'd like to reject the proposal from the misbehaving node. Yes, it can make consensus fail, but that's safer than accepting a broken block.

when our state root protocol and specification became more stable and reliable

We have a (quite proven) protocol for blocks. One more field there (with minor adjustment to consensus messages) and we're done. There would be no stateroot messages to worry about.

vncoelho commented 4 years ago

As previously said, for me there is no drawback in attaching the stateroot of block H-1 to the current Preparation of block H; that would act like a checkpoint.

If backups agree, consensus will follow; otherwise, liveness may be impaired until the mismatch is resolved.

Stateroot of current block will still be distributed through P2P.
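Here is a sketch of the backup-side decision. It assumes a hypothetical `PrepareRequest` that carries the speaker's state root for H-1. A backup that computes a different root refuses to endorse the proposal and asks for a view change. This is the liveness impact described above, not a safety violation. The message and enum names are illustrative, not the actual dBFT types.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    PREPARE_RESPONSE = "prepare_response"
    CHANGE_VIEW = "change_view"


@dataclass(frozen=True)
class PrepareRequest:
    block_index: int        # H, the block being proposed
    prev_state_root: bytes  # hypothetical: speaker's state root after H-1


def backup_action(req: PrepareRequest, my_root_after_prev: bytes) -> Action:
    """Endorse the proposal only if the speaker's H-1 state root matches ours."""
    if req.prev_state_root == my_root_after_prev:
        return Action.PREPARE_RESPONSE
    # Mismatch: do not sign; request a new view instead of accepting a block
    # built on a state this node disagrees with.
    return Action.CHANGE_VIEW
```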

shargon commented 4 years ago

What's the point of making the stateroot optional but forcing the node to use an MPT?

roman-khimov commented 4 years ago

BTW, there are a number of pre-2.12.0 and even pre-2.11.0 nodes on Neo 2 mainnet at the moment; all of them have broken state, but they don't know it. With an in-block state root hash they would know immediately, although it's Neo 2 and it can't really have it. But now imagine Neo 3 needing to roll out anything comparable to the recent 2.12 changes. Without in-block state data, the situation would be the same as with Neo 2 now (lots of outdated nodes with obviously wrong state). With in-block state data, old nodes would just stop working because of the state mismatch, forcing their operators to upgrade.

roman-khimov commented 4 years ago

Speaking of broken state, nspcc-dev/neo-go#1456 and #1989 are a nice illustration for Neo 3: the problem occurred at height 127240, but it was only noticed now, around block 278513. Had we had an in-block state root hash, this mismatch could've been detected (and fixed) a month ago.

roman-khimov commented 4 years ago

It looks like we have two main options now, so I'd like to summarize all known data about them for a more productive discussion. Option A is to use designated state validator nodes that would produce a state root chain similar to what we have on Neo 2. Option B is to add the state root hash for block H - 1 to the block H header. Below, CN means consensus node and SV means state validator.

Let's first briefly outline the problem space. We have a number of problems related to this topic:

- network synchrony:
  - consensus synchrony: ability to detect a CN with the wrong state.
  - ordinary node synchrony: ability for a node to detect whether it's synchronized with the network.
- cross-chain state proofs: ability to reference this state in other chains and get proofs of KV pair existence in a particular state of the chain.
- compliance requirements for audit trail: ability to trace any state change to the specific block/transaction that made it.
- bug fixing: ability to fix incorrect state caused by a node implementation bug.

The most important thing here is to agree that we have these problems. Now, if we try to apply the different state solutions to them, we get the following table:

Problem	State validators	State root in the header
Consensus synchrony	Out of scope	Solves the problem
Node synchrony	Weak solution relying on additional P2P exchange	Solves the problem
Cross-chain proofs	Solves the problem	Solves the problem
Audit/compliance	Partial, separate chain is not acceptable in some enterprise scenarios	Solves the problem
Bug fixing	Partial, state versioning alone is not enough	Requires additional mechanisms

Bug fixing is a bit special to me in that both solutions, while related to it, don't actually solve it (we can discuss that separately). Now let's look at aspects of these solutions that are not strictly related to the main problems outlined above.

Complexity of implementation

In general, simpler solutions are preferable as they're usually more robust.

State validators

Implementing SVs requires:

- designated role
- SV consensus protocol
- P2P state exchange messages

Part of this work is already done (P2P); part of it still requires some attention (roles and the SV module with its consensus protocol).

State root in the header

Implementation requires:

- adding one field to PrepareRequest messages
- adding one field to blocks

Given all the components we have already, an initial implementation can be done in half a day (that's what we actually did this Monday in nspcc-dev/neo-go#1500).

Performance and size

We don't want any of this technology to seriously affect our (good) performance characteristics and we don't want to waste a lot of storage space for it.

State validators

Zero influence on CN performance.

Each node synchronizing state from SVs will require an additional 69 bytes per block for state root data (as defined for Neo 2) and 253 bytes for a "5 out of 7" witness.

State root in the header

Close to zero influence on CN performance, details below.

Each node will require an additional 32 bytes per block to store the additional header field.
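These per-block figures can be checked with a little arithmetic. The 69 bytes are consistent with a Neo 2 state root made of a 1-byte version, a 4-byte index and two 32-byte hashes (the previous root's hash and the root itself). That layout is my reading of Neo 2 and is not stated in this thread. The 253-byte witness size is taken from the text as given. The 15-second block time is an assumption used only to scale the numbers to a year.

```python
VERSION, INDEX, HASH = 1, 4, 32  # field sizes in bytes (assumed Neo 2 layout)


def sv_bytes_per_block(witness: int = 253) -> int:
    """State root message plus its multisig witness, per block."""
    state_root = VERSION + INDEX + 2 * HASH  # version + index + prehash + root
    return state_root + witness


def header_bytes_per_block() -> int:
    """One extra 32-byte hash field in every header."""
    return HASH


def yearly_overhead(bytes_per_block: int, block_seconds: int = 15) -> int:
    blocks_per_year = 365 * 24 * 3600 // block_seconds
    return bytes_per_block * blocks_per_year


print(sv_bytes_per_block())                       # 322
print(yearly_overhead(sv_bytes_per_block()))      # 676972800 (about 677 MB)
print(yearly_overhead(header_bytes_per_block()))  # 67276800 (about 67 MB)
```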

TPS Metrics

There are some concerns about the effect of in-header state roots on performance, so we measured our prototype implementation from nspcc-dev/neo-go#1500. We have MPT implemented in the neo-go master branch, and it's calculated for every block even without any state root data exchange. To fully satisfy our curiosity, we also made one more branch that disables MPT calculation completely. In the end, we compared three node versions using neo-bench in single-node and four-node scenarios under a load of 10 worker threads:

- default neo-go master (with MPT, but without state roots)
- node without any MPT calculations at all (https://github.com/nspcc-dev/neo-go/tree/noroot)
- node with in-header state root data (nspcc-dev/neo-go#1500)

The results measured on another random laptop (using LevelDB) are:

Node	Single node TPS	Four nodes TPS
Default	10012	1042
No-MPT	10603	995
In-header state	10622	1089

Basically, there is no statistically significant difference between these results. Several subsequent runs of neo-bench can easily differ by more than the values in this table do (especially for four nodes).
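To put the table in proportion, the relative differences against the default node can be computed directly from the measured numbers. They all stay within about 6%, which supports the point that the spread is comparable to run-to-run noise. This is only arithmetic on the reported figures; no new measurements are involved.

```python
results = {
    # node: (single-node TPS, four-node TPS), from the table above
    "Default": (10012, 1042),
    "No-MPT": (10603, 995),
    "In-header state": (10622, 1089),
}


def relative_to_default(
    results: dict[str, tuple[int, int]],
) -> dict[str, tuple[float, float]]:
    """Percentage change of each node's TPS relative to the default node."""
    base_single, base_four = results["Default"]
    return {
        name: (
            (single - base_single) / base_single * 100,
            (four - base_four) / base_four * 100,
        )
        for name, (single, four) in results.items()
    }


for name, (single, four) in relative_to_default(results).items():
    print(f"{name:16} single {single:+.1f}%  four {four:+.1f}%")
```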

Failure modes

If we try an FMEA-style analysis of these approaches, we can note some differences too.

State validators

Architectural separation of SV nodes introduces the following additional failure modes:

- state difference between CNs and SVs (severity 10, occurrence 2, detection 8): it's not likely to happen, but it can, and if it does the consequences would be bad. It's also hard to mitigate in any way: even though CNs can compare their state with SV-signed state roots, they can't depend on SVs providing this data in a timely fashion.
- absence of signed state roots (severity 4, occurrence 4, detection 3): SVs can fail to produce new state root data for reasons like poor network connectivity or some nodes going offline, which would impact anything depending on that data, including the network's cross-chain functionality.
- P2P state root exchange not happening (severity 4, occurrence 6, detection 5): we've seen that on Neo 2 already, and it also affects anything associated with state root data.
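Classic FMEA usually combines the three scores into a risk priority number, RPN = severity × occurrence × detection. The comment does not compute RPNs, so the following only illustrates that standard convention applied to the scores listed for the SV approach. It is not part of the original analysis.

```python
from typing import NamedTuple


class FailureMode(NamedTuple):
    name: str
    severity: int
    occurrence: int
    detection: int

    @property
    def rpn(self) -> int:
        """Risk priority number: the usual FMEA product of the three scores."""
        return self.severity * self.occurrence * self.detection


SV_FAILURE_MODES = [
    FailureMode("state difference between CNs and SVs", 10, 2, 8),
    FailureMode("absence of signed state roots", 4, 4, 3),
    FailureMode("P2P state root exchange not happening", 4, 6, 5),
]

# Highest-risk failure modes first.
for fm in sorted(SV_FAILURE_MODES, key=lambda f: f.rpn, reverse=True):
    print(f"{fm.rpn:4}  {fm.name}")
```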

State root in the header

Implementing state root in the header ties it to the consensus process. The CNs' state is the state of the network, and there are no state roots only if there are no blocks, which is a complete network failure scenario not specific to state roots. The same goes for distribution: state roots are distributed with blocks/headers. No new failure modes are introduced.

Conclusions

I won't give any for now; this is just a PrepareRequest with the state root of the discussion. Cast your PrepareResponses below and get ready to Commit soon.

igormcoelho commented 4 years ago

Truth is, users are able to defend themselves "easily" (obviously they depend on wallet developers for that), as demonstrated in the NEP proposal https://github.com/neo-project/proposals/issues/97

On Neo2, UTXO prevents double spending, but the same holds for Neo3, since this kind of filtering/validation occurs at the same phase: transaction verification. As long as wallets attach assertions for "things that matter to them", including balance proofs on NEP-5, it is impossible to reproduce that transaction with broken funds, with or without MPT. This is a very important clarification, since from my perspective these "valuable assets" represent the core of the blockchain economy.
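Here is a minimal sketch of the idea. It assumes a hypothetical transfer that carries the sender's expected balance as an assertion, which is checked during verification. This is not the mechanism from the linked NEP. It only illustrates why such an assertion stops the transaction from being applied against a state with different funds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    sender: str
    receiver: str
    amount: int
    expected_sender_balance: int  # hypothetical assertion attached by the wallet


def verify(tx: Transfer, balances: dict[str, int]) -> bool:
    """Reject the transaction unless the state matches what the wallet saw."""
    current = balances.get(tx.sender, 0)
    return current == tx.expected_sender_balance and current >= tx.amount


def apply_transfer(tx: Transfer, balances: dict[str, int]) -> dict[str, int]:
    """Return the new balances, or raise if the assertion does not hold."""
    if not verify(tx, balances):
        raise ValueError("assertion failed: sender balance differs from expected")
    new = dict(balances)
    new[tx.sender] -= tx.amount
    new[tx.receiver] = new.get(tx.receiver, 0) + tx.amount
    return new
```

The sketch also shows the cost: after the first transfer the sender's balance changes, so a second transfer signed against the old balance fails. This is the performance and parallelism drawback raised in the following paragraph.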

The drawbacks, as discussed, are that this NEP proposal harms performance and parallel operations on the same accounts, which is why we need some better solution, like MPT or another technology of the kind. In this sense, I agree with @erikzhang's vision: storage proof verification is a service that doesn't need to be embedded in consensus, so an RPC call for verification is enough for chain (or cross-chain) operation.

Implementation mistakes will always occur, even with a perfect specification (which is obviously impossible to write). Some external library can cause an issue, or some regression anywhere else in the computing world, so I tend to disagree with carving our own silly mistakes into eternity. Since PBFT and before, it has been well known that a consensus system will only be safe if it is built by several different people in several different languages. If we have that, some basic consensus state verification during dBFT phases (as pointed out by @vncoelho) is more than enough.

But the difficulty is how the state validators send the signatures to the seed nodes.

On this point, @erikzhang, we have no other option: we must send data via P2P. Otherwise, we risk dropping some VERY interesting security aspects of Neo, including no direct connections between CNs and reliance on the P2P network for message distribution (neutral actors).

So I believe that CNs should consume their own state verification service (to avoid frequent external calls), relative to the previous block (H-1), using some changeable technology in CN messages (starting with MPT, but who knows what comes later...). These same messages are broadcast via P2P, so every "smart" node can relay them with blocks and maintain some verifiable state in real time (without needing to put it in headers).

A question that remains unanswered is: how do we deal with broken verifications? Is it acceptable to have a transaction in the blockchain that fails verification? I don't think so. If this is true, and clients are powerful enough to push such assertions (as described in the very simple NEP above), then bug-fixing capabilities will ALWAYS become very limited as the blockchain system is used and naturally ages (as happened to Neo2, with many evolving tx formats, etc.). If this is also true, then we can also assume some periodic recycling, which already puts some Neo4 on the radar (for a not-so-distant future).

P.S.: I forgot to mention another interesting point raised by Wang YongQiang: we cannot have any blockchain security, even with complete MPT headers carved into blocks, if we cannot guarantee some real-time sync assumptions. And if we do have real-time sync, we can just use the RPC service, as mentioned by @erikzhang.
