Enforce SBliFF finality

Goal:

detect forks, prune branches and stop adding blocks to known forks

simplifying assumptions:

no rollback of specific blocks considered: if we're on a fork, we need to resync from a peer or rescue from a valid snapshots in our own db

requires

https://github.com/integritee-network/worker/pull/1586

status quo

feed incoming block candidates into EnclaveSidechainBlockImportQueue = ImportQueue<SignedSidechainBlock> FCFS
in the QueueWorker, pick the most likely matching candidate and attempt to import, ignorant of possibly-known finality on L1
if yield_next_slot -> propose a new block

RPC

can ask peer for a specific canonical series of blocks between two hashes sidechain_fetchBlocksFromPeer (cannot request blocks on branches we don't know they exist)

Proposed Implementation

replace ImportQueue<SignedSidechainBlock> with ForkTree<Hash, SidechainBlockNumber, SignedSidechainBlock>
upon incoming block candidate from peers, call ForkTree::import() (async)
upon enclave_runtime::execute_trusted_calls_internal():
1. call ForkTree::finalize_with_ancestors(latest_sidechain_block_confirmation)
2. refactor QueueWorker + PeerBlockSync logic to do:
  1. If latest_sidechain_block_confirmation not in our ForkTree, populate ForkTree from peer (while keeping all nodes we know already)
  2. ForkTree::prune() (possibly pruning with delay of a few block finalizations to service peers if needed)
  3. pick the child of the longest chain in ForkTree which is descendant of latest_sidechain_block_confirmation using ForkTree::rebalance()
  4. if no valid child, purge state and re-provision snapshot from peer

Initial behavior

primary validateer needs to assume block 1 to be finalized implicitly
secondary validateer
1. requests snapshot from peer
2. populate ForkTree from peer(s) (build tree ourselves, just request blocks in no specific order)
3. verify we know latest_sidechain_block_confirmation or retry 2. other peer

RPC changes

new: sidechain_fetchAvailableBlockHashes: returns all block hashes in the fork tree (which is regularly pruned upon finalization)
refactor sidechain_fetchBlocksFromPeer: request set of blocks from peer specified by hash without assuming canonical range

Argumentation

inserting new blocks into ForkTree happens async and doesn't cost block production time (in contrast to building the ForkTree from ImportQueue synchronously upon execute_trusted_calls() )
this implicitly detects forks and the worker will never build a new block on top of a known fork branch

Alright, so far I agree with the status quo, and the general implementation, I have some low-level questions:

I don't really understand why prune is necessary after we finalize nodes. Shouldn't the re-rooted tree be pruned already?
I think we need to add that we must re-build our state (re-apply sidechain blocks), if the child of the longest chain changes after rebalancing. (However, the fork-tree is rebalanced automatically upon block import, so I am not sure if rebalancing has to be done at this step, but rebuilding the 'best' state is still needed IMO).
We should also mark in our db, what state is finalized, or we should only persist finalized state anyhow, probably.
under what circumstances can there be no valid child? When a node has been finalized that is not in our fork tree?
Why does finalizing nodes in the fork tree have to be in the execute_trusted_calls part? I think it is unrelated, and everything could be done when we realize that there is a new finalized sidechain block. However, it could be that your proposal is the lowest hanging fruit here. I need some more time to evaluate that, in the long run though, I definitely argue that execute trusted calls, just needs to know the best state, and should not care about forks.

I don't really understand why prune is necessary after we finalize nodes. Shouldn't the re-rooted tree be pruned already?

if it's like that, ok. Looking at the pub fn of ForkTree I expected this to be explicit. And I think explicit may be better because we may want to keep the blocks around a little longer for the benefit of peers syncing up

I think we need to add that we must re-build our state (re-apply sidechain blocks), if the child of the longest chain changes after rebalancing. (However, the fork-tree is rebalanced automatically upon block import, so I am not sure if rebalancing has to be done at this step, but rebuilding the 'best' state is still needed IMO).

we can't roll back. If rebalancing changes what we consider to be the longest chain, we need to reset on a snapshot and reapply blocks

We should also mark in our db, what state is finalized, or we should only persist finalized state anyhow, probably.

We should only snapshot finalized state

Persistance I'm not yet sure about. For simplicity, I suggest to snapshot every directly finalized block. Everything else (ForkTree and state after non-finalized blocks) might be kept in-mem

under what circumstances can there be no valid child? When a node has been finalized that is not in our fork tree?

that we may be able to sync from a peer. no valid child is the situation when - even after sync from peers - we have no finalized block or lineal descendant of such

Why does finalizing nodes in the fork tree have to be in the execute_trusted_calls part? I think it is unrelated, and everything could be done when we realize that there is a new finalized sidechain block. However, it could be that your proposal is the lowest hanging fruit here. I need some more time to evaluate that, in the long run though, I definitely argue that execute trusted calls, just needs to know the best state, and should not care about forks.

My design choice is pragmatism. I saw how it can fit there and I see no reason why it wouldn't be a good idea. It's even fit for the (potential) future case of more than one shard per validateer service

So the biggest chunks in terms of implementation efforts that can be estimated should be:

Replacing the ImportQueue with the ForkTree struct
Implement the RPCs according to above spec
Implement Finalize(latest_sidechain_block_confirmation):
- prune fork tree
- rebuild state if finalized sidechain block wasn't on current branch
- fetch missing blocks via RPC introduced in 2. if the finalized block is not even in the fork tree
Implement 'Initial behavior' from above.
Integration tests as a separate task: https://github.com/integritee-network/worker/issues/1592

integritee-network / worker