Finalize blocks with a roll-up based strategy

clangenb commented 2 years ago

A while ago, we identified an issue that the parentchain finalizes a superset of sidechain blocks that the validateers deem valid, see https://github.com/integritee-network/pallets/issues/58. Initially we thought this is easily solvable by extracting the verification logic of the validateer and simply use it in the pallet too. However, it turns out that this introduces technical complexity, which we tried to avoid in the parentchain. Hence, we should think about roll-ups rather sooner than later. This is further motivated because our trust model will simplify roll-ups greatly.

Remember: Trust Model & Finalization

Our trust model assumes that there are no invalid/forged blocks. It only allows network-based attacks, i.e., blocks might arrive too late at certain validateers so that they have already started proposal for the block of slot n+x without having imported the block for slot n, where x >=1. This will introduce a fork. The parentchain's job is to resolve that fork by picking one of the branches. There are several strategies for picking a branch. It should be emphasized that there is no correct branch usually, but there are different strategies that might choose different branches. The most common strategies are first-comes-first-serves (i.e., the parentchain picks the first block with number n that it sees), or the longest-chain strategy.

As we produce many sidechain blocks per parentchain block (with 300ms ~ 50 blocks), we can think of patterns that do not send every block to the parentchain. Instead, we send a proof that entails the info that a subset of all validateers have agreed on certain sidechain blocks, aka a roll-up. Based upon the no-malicious-block-assumption (aka our workers are not byzantine), I theorize that we don't even need a classical roll-up in our trust model. It suffices that we prove that a fraction of validateers have agreed on a certain block.

Potential Solution

The most naive way to do this might simply be to sign every nth block with all validateers, and send it to the chain. The chain will finalize a block if it is signed by a threshold of validateers (majority, or supermajority?). However, can we prove in case of a partitioned network that the finalization, will ever return? We might achieve this by a threshold which decays exponentially the longer we are missing finalization until, ultimately, a single validateer is enough to finalize. This will guarantee liveness and eventual finalization of the chain. Hence, the expected latency of finalization is proportional to the fraction of validateers agreeing on a block. This intuition seems right for the problem we try to solve, and might hint that this is a sound approach.

Implementation

This would change the paradigm: Currently, we send a sidechain block to the parentchain after it was proposed, instead of after being successfully imported. This was the root cause of https://github.com/integritee-network/pallets/issues/58. So effectively, we would implement that every nth block, will be sent to the parentchain after a successful import by every validateer. The parentchain imports them and only verifies the signature, that the signer is part of the authority set, and puts the block into the storage:

NMap PendingFinalization<key1: shard: key2: SidechainBlockNumber, key3: SidechainBlockHash, value: SignatureCount>

As soon as SignatureCount > ValidateerCount*(1-some_fn(SidechainBlockNumber, LastFinalizedBlockNumber)), it will mark that block as finalized, finalizing all previous blocks too.

To-Do: work out some more implementation details.

Consequences

I argue that the computational complexity will only be marginally bigger than the current implementation of confirm_sidechain_block, yet we only need a fraction of the currently needed dispatches. Let's assume we have 5 validateers and 300 ms block time. We only need to trigger finalization once per parentchain block, more will have no benefit as finalization of block n, will finalize of blocks before n. Also, in the happy case, we have the same average expected finalization latency of 0.5 parentchain block production time, which is the lowest-possible value as long as we rely on the parentchain for finality.

Comparison: Number of tx's per parentchain block per sidechain:

Before: 1 for every produced sidechain block -> 50

After: 1 for every registered validateer -> 5

With my assumption that the complexity of the dispatchable is comparable, this would lead to an increase in throughput of a factor of 10.

Summary

10x TPS increase because many more sidechains can be processed
does not increase parentchain verification complexity
changes in worker are very straight-forward, and might even simplify the code. The new approach does not need more worker<>worker communication.

To-Do:

How to handle different sidechain block production times
There might be a more effective storage model for the pallet
High-level pseudocode of the pallet logic.
Verify soundness of the algorithm for some scenarios.

Tasks

[x] integritee-network/worker#922
[x] integritee-network/worker#1003

brenzi commented 2 years ago

Nice argumentation. I think your suggestion would be easy to implement and effective. However, it might not be optimal in terms of finalization latency because finalizing strictly every n-th block introduces a jitter of one entire parentchain block if n is equal to pc blocktime/sc blocktime. It could frequently happen that one pc block delivers no finalization because not all confirmations were included.

This could be solved by a different approach: Whenever a pc block is imported, the next sc block will be marked as a finality candidate and every validateer confirms its import with a pc xt. This way, there should be strictly one scb finalized for every pcb...unless blocks are full

brenzi commented 2 years ago

If we do not need to assume byzantine validateers, a simple majority of validateers should be sufficient AFAIK

clangenb commented 2 years ago

Note: This strategy and its implementation of deciding the sidechain block finality candidate has been moved to a separate issue: integritee-network/worker#1003

Ok, I see the point with jitter, and I see good potential in using the parentchain block as input for choosing the candidate. However, choosing the next sc block as candidate, will introduce an average expected parentchain-block inclusion latency of 1.5 pc blocks, which is the worst you can get with this approach. We should rather choose the next sc block candidate based on sidechain slot duration and parentchain block production time so that we choose a sc block candidate in the future, which is very likely to make it into the parentchain block, i.e.:

let next_parentchain_block_timestamp = somehow_get_timestamp_from_last_block() + parentchain_block_production_time;
let remaining_time = next_parentchain_block_timestamp - now();

// skip this slot if we don't have enough time.
let next_target_parentchain_block_timestamp  = if remaining_time > sidechain_block_production_time / 5 {
    next_parentchain_block_time_stamp
} else {
    next_parentchain_block_timestamp + parentchain_production_time - sidechain_block_production_time / 5;
}

let remaining_sidechain_blocks = (next_target_parentchain_block_timestamp - now) / sidechain_block_production_time;

let scb_finality_candidate = scb_current + 
    floor(
         remaining_sidechain_blocks * security_factor
    );

Where the security_factor < 1 can be tuned to adjust the probability that thescb_finality_candidate does indeed make it into the next parentchain block.

This will reduce the expected parentchain-block inclusion latency to around 0.5 parentchain block production time depending on the exact value of the security factor.

Edit: Corrected above; it should not be average expected finality latency, but average expected parentchain-block-inclusion latency.

clangenb commented 2 years ago

@brenzi We have discussed that we might only need one worker to submit a sidechain_block_finality candidate. However, I now think this is attackable. If a worker sends its block to the chain, but the malicious operator blocks the gossiping to fellow validateers, and goes offline afterwards, the sidechain will stall. So either:

We store enough information onchain to be able to recreate the sidechain state (seems very unlikely to me that we find such a solution)
We indeed need to have multiple validateers signing the same block, such that enough validateers have access to the state.

brenzi commented 9 months ago

"premature optimization is the root of all evil" (Knuth) before we care about latency and maximal throughput, we should implement the minimal solution that is functional and safe and guarantees lifeness.

I therefore suggest https://github.com/integritee-network/pallets/issues/242

integritee-network / worker