State root checkpointing

lrettig commented 4 years ago

This question has come up a few times but I don't think we've reached consensus on it yet so I'm opening an issue to discuss it.

In Ethereum and many similar blockchains, each block includes a single "state root," which is the root of a Merkle tree (or trie) that contains all account state after that block is processed. This means that if a node runs all of the transactions in a block through the STF and arrives at a different state, it will immediately reject that block. It's worth pointing out that this isn't purely academic or theoretical, as it's happened in Ethereum several times - often due to a bug in one implementation of the VM.

We don't have something like this in the current Spacemesh design and it's an issue. For sake of argument, imagine that there are multiple Spacemesh implementations and that a bug in one of them (inside or outside the VM) causes some subset of nodes to incorrectly calculate the balance after a transaction. Every node will still think it's in sync (because it has the same layer, block, and tx data), the network will not fork, and there will be no indication of any issue here unless and until the accountholder in question tries to "double spend" some portion of those funds that are in question (i.e., that some nodes think they have, others think they don't have). This could take a long time, or it may never happen.

I propose introducing a frequent "state root checkpoint" so that issues of this sort are immediately detected. (How to deal with them once they're detected is interesting but it's a separate question that has more to do with social governance than with research so I won't go into it here.)

Here are three proposals for how we might achieve this, in order from least complex and least desirable to most complex and most desirable:

Extend the ATX to include a state root checkpoint - i.e., the ATX indicates a specific layer (probably the end of the previous epoch) and the node's view of the state root after including all transactions in that layer. This would give us a once-per-epoch checkpoint. Nodes that detect ATXs with state roots that they consider invalid could fork away from those miners, or quit entirely - both of which are better than the status quo behavior IMHO.
Extend each block to include a state root checkpoint, which indicates the node's view of the state root as of the previous layer. A node could reject blocks (as syntactically invalid) if they disagree with the node's observed state root.
Similar to (2.) but include two state root checkpoints in each block. The first, as in (2.), is the state root as of the previous layer. The second is the state root after applying the transactions in this block as if it were the only block in the current layer (i.e., directly on top of the state as of the previous layer). This would most closely mimic the behavior in Ethereum: a node could independently verify that applying the transactions in each block produces the expected result, and would provide block-level granularity when something goes wrong, i.e, we could immediately detect the offending block.

Keen to hear thoughts. CC @noamnelke, @tal-m, @avive, @YaronWittenstein

noamnelke commented 4 years ago

I think 1 is a pretty good idea!

2 and 3, however, are not feasible. Mainly because there’s not necessarily consensus about the previous layer. Also, for 3 - performing the state transition function is expensive, so executing it and most likely throwing away the result is very wasteful.

For 1 the same problem could apply but we can take a large margin and point to a far enough layer where consensus is expected to have been reached.

I also like this solution because it’s cleaner to eliminate a miner before the epoch started and the number of blocks won’t be affected.

lrettig commented 4 years ago

2 and 3, however, are not feasible. Mainly because there’s not necessarily consensus about the previous layer.

After the Hare finishes, there should be consensus, shouldn't there? If the Hare fails, or self-healing kicks in, I agree, that's another situation. I'd still rather see (2) or (3) with a fallback to (1) if, e.g., the Hare fails.

noamnelke commented 4 years ago

The hare is not guaranteed to finish (at all) and definitely not in two rounds (the time until the next layer's blocks should be published). Also, the hare is probably not enough. What happens if there's a re-org? Do all the blocks suddenly become invalid?

This makes it impossible to require blocks to include the previous layer's hash. We could still have blocks point to older layers, but if there's already a time period where we're not sure - the checkpoint might as well be once per epoch, imo.

We'll have to leave a really long safety distance when including layer hashes, since even the Tortoise isn't final. There could be a self-healing event for days back, or more.

Alternatively / additionally, we can think of a mechanism for ignoring these layer hashes in case of a self-healing event that would likely result in a long re-org, since a self-healing event is detectable, iiuc.

avive commented 4 years ago

@tal-m @iddo333 can you guys please chime in on this?

avive commented 4 years ago

lrettig commented 4 years ago

Alternatively / additionally, we can think of a mechanism for ignoring these layer hashes in case of a self-healing event that would likely result in a long re-org, since a self-healing event is detectable, iiuc.

This is kind of what I meant by

I'd still rather see (2) or (3) with a fallback to (1)

Maybe we could achieve the best of both worlds.

tal-m commented 4 years ago

There are some substantial differences between Ethereum and Spacemesh that are very relevant to checkpointing.

Validity of blocks cannot depend on state

First, by definition, syntactic validity of blocks cannot depend on your current state. This is because your state is a function of information that is not contained in the blocks (e.g., votes of blocks in later layers), so it's not a syntactic property of the block. (A core requirement for our consensus protocol is that honest parties who receive a block agree on its syntactic validity regardless of whether or not they received the exact same set of blocks.)

The contextual validity of (old) blocks is determined by the "majority vote" (more or less). This is what guarantees irreversibility of history. If we make old blocks invalid when they conflict with what a node believes is the "correct" state, this would increase the damage of VM bugs --- any VM inconsistency would now cause a fork.

On the other hand:

Spacemesh doesn't need validity to depend on state in order to prevent forks

The Spacemesh protocol doesn't invalidate blocks based the contents of included transactions. That is, a block can include double-spending transactions and still be valid. (If this happens, the VM will execute both in order of their appearance in the ledger, and of course the second one won't manage to spend more money than there is in the account).

The consensus layer can be completely separate from the VM: if there's a bug in the VM that is later fixed, the contents of the ledger --- as a sequence of transactions --- will not change (and will remain in consensus), but the nodes with the previously bad VM will rewrite their history --- as a sequence of states --- once they switch to the good VM. This happens instantly (as fast as the fixed VM can execute, at any rate) and does not require self healing.

Checkpointing to detect VM bugs

Of course, a consensus failure at the VM level is still a failure, and it's important to detect this, so state checkpointing can still be useful. For this use, adding a hash to each block or each ATX pointing to an "old enough" layer would work.

Dealing with detected failures

We've already said that when we detect a failure, we can't invalidate old blocks. What can we do instead?

Alert the user: We already have a "confidence" for transactions; if we detect that our state differs from the majority of previous blocks, we can reduce confidence for everything after the state split.
Punish the guilty? While we can't invalidate old blocks, we can vote against new blocks from a smesher that appears to be misbehaving. We have to be very careful with this. Any such punishment must satisfy two requirements (and it isn't trivial to do this):
- Honest users must always agree on the parties to be punished (otherwise we can end up in self-healing even if all assumptions are satisfied)
- Honest users must never be punished

Things get even more complicated if rational users prefer to rely on previously computed states rather than compute their own (in this case, we could have a "fork" between the real state and the state promoted by the majority of users).

Checkpointing for Lite Clients

Lite clients have stronger requirements from state checkpoints: they want to verify that the state at the current time is X without running a full node (in particular, without reading the entire mesh) and without having to blindly trust a server.

It's not clear that simply adding a hash with the current state root will help in this case; this requires further research. (see #46)

avive commented 4 years ago

@tal-m - thank you for putting this useful comments here. I'm glad to see that we start to address this issue and have an outline of a solution. I think that summary of the solution direction that I like can be summarized as:

Of course, a consensus failure at the VM level is still a failure, and it's important to detect this, so state checkpointing can still be useful. For this use, adding a hash to each block or each ATX pointing to an "old enough" layer would work.

Alert the user: We already have a "confidence" for transactions; if we detect that our state differs from the majority of previous blocks, we can reduce confidence for everything after the state split.

We need to formalize it so it can be developed. I think it is an important feature platform-wise. @noamnelke

avive commented 4 years ago

We also need to think about how people who run full nodes can easily choose to try to compute the state again from a layer that they get a warning about their state possibly being invalid. Otherwise they have no way to get out of the bad state that their node might be in due to a runtime bug.

avive commented 4 years ago

Let me try to sum up this issue from a product perspective:

A Spacemesh full node should be able to automatically recover from a condition where it has computed a different global state for a layer from other nodes who computed the correct global state for that layer. This happens when there's an execution bug in one or more transaction (or rewards) processing performed by the full node.

Full node human operations should not need to do anything in order to recover from this condition. The full node should handle this automatically.

antonlerner commented 4 years ago

Some conclusions

So if I understand correctly, checkpoints is something we want to have and include in ATX / blocks for far enough layers. If we agree on this the next question should be how to implement such checkpoints and at what frequency

current solutions

currently, we have a merkle trie based DB that performs a copy on write every time the state changes. I then updates the global state root in a complexity of O(log(n)) this allows us to store a state root per layer and revert to it in O(1) after which we can play transactions to advance the state root. The storage we take for such data structure is 2N where N is the amount of changes we perform on the global state because it is an append only data structure.

solution requirements

Now that we know what exists, we can analyse any proposed solutions by the same parameters and compare

Do we require state snapshot, if so, how many snapshots do we want to store
Should we optimise current layout of state in storage? eg. replace merkle patricia trie with another (more efficient) data structure, and what should be the runtime, memory and storace specification of such data structure in terms of
- read (runtime)
- write (storage, runtime)
- revert (runtime)
- creating checkpoint (runtime, storage)

The main question I have is, is what we have not good enough? If so, lets get the spect for what we want to have and find the appropriate way to implement LMK what you think guys

@tal-m @iddo333 @lrettig

noamnelke commented 4 years ago

@avive

A Spacemesh full node should be able to automatically recover from a condition where it has computed a different global state for a layer from other nodes who computed the correct global state for that layer. This happens when there's an execution bug in one or more transaction (or rewards) processing performed by the full node.

You're suggesting that if a node reaches a different state than other nodes it should assume that it made a mistake. This is the complete opposite of how trustless systems work. If an Ethereum node sees a block with a different state root than what it calculated itself - it will consider the block invalid, not its own calculation.

There's no way that I can think of to "recover" if a node concludes that it made a mistake. But you can also never know for sure that you're running buggy software. There's no way to know that without trusting your peers, and if we do - we're not running a trustless consensus protocol and then we can get rid of all the costly proofs and just ask our 8 peers what the state is - much simpler.

avive commented 4 years ago

I think that the current consensus from research on this is issue that there will be no checkpoints and the solutions is a full validators protocol role implemented by full nodes to reach distributed consensus on the canonical state and introduced as an update post genesis.

lrettig commented 4 years ago

There's no way that I can think of to "recover" if a node concludes that it made a mistake

I'm not sure that's completely true. Isn't this sort of what self-healing does? If a node sees that it voted a certain way (with respect to, say, the validity of a past block), but a majority of other nodes voted another way, isn't it already "recovering" from this faulty judgement on its part?

I've also seen self-healing mechanisms that include the implementation of the node software: there are three full node implementations, and from the perspective of the protocol/consensus mechanism, if 2/3 vote together, the third is considered to be faulty.

In practice what has happened in the past in Ethereum is that one implementation may be affected by a bug that causes it to completely stop producing blocks. You can't automatically recover in this case, but recovery can happen at the social layer.

lrettig commented 4 years ago

Upon further reflection, I think there is a fundamental difference between a node "recovering" from a "faulty" judgement with respect to something subjective, e.g., a block's contextual validity (arrival time), versus from something objective such as the rules of the protocol. @noamnelke is of course right that there is no way to "recover" if you think you're following the protocol (due to a bug, or not, it makes no difference) and that others are not. This will by definition always result in a consensus fork.

lrettig commented 4 years ago

From https://community.spacemesh.io/t/state-requirements-representations-and-rewinding/77/38:

Re: @tal-m: To clarify: as Iddo says, req 7 is for lite client support. You don’t need cryptographic proofs that your own full node is running correctly — you know what code you’re running (and if you don’t, why would you trust your client code more?).

I don't agree that this is only for lite clients. Full nodes have bugs, too, and if there's a way to immediately detect divergence in state, it's much easier to address—rather than days, weeks, or epochs later. You absolutely do need proof that your own full node is running correctly, if there are other "more correct" implementations in the wild and there's a chance that the state you think you've been looking at turns out to have been wrong (for some fuzzy, social definition of "wrong").

Concrete example: while we obviously do not have multiple, independent full node implementations, and won't for a while, as @iddo333 likes to remind us, ambitious network participants may decide to tweak their (open source) node software to be more efficient. Yes, we'll do everything we can to use good software engineering practices and test every edge case, but there will inevitably be some that we miss and bugs will arise. If you've tweaked the canonical implementation of the software yourself, you have no reason to trust the software you're running more than the software the rest of the network is running.

In any case, I think we're all in agreement that there's not much we can do about this today. For one thing, it doesn't matter that much unless/until we have multiple supported implementations. For another, I see Iddo's point that the protocol cannot require state checkpointing, and that we need to maintain separation between consensus and VM. In practice good software engineering, a robust test suite, and having canonical block explorers that users can compare state against get us most of the way there.

spacemeshos / research