New Tortoise Vote-Encoding Scheme

barakshani commented 4 years ago

In the new protocol, blocks should vote on a much larger set of blocks. We should:

Decide how far back do we vote.
Construct an efficient encoding scheme for voting.

tal-m commented 4 years ago

The new encoding scheme leads to much more efficient self-healing; this supersedes #10 for now.

Overview

We can think of each block as containing an encoding of votes for blocks in all previous layers. Each vote is a value in ${+1,0,-1}$ , corresponding to considering the block valid, being neutral, or considering the block invalid. The encoding scheme takes a vector of values and outputs a string that encodes it (the encoding can depend on all previous blocks that were received). The corresponding decoder takes the string and the previously-received blocks, and outputs the vector of votes.

In our current scheme, the encoding is very cheap (we just record the "view" of each block) but decoding is expensive ($\Omega(n^3)$ to completely decode a mesh with $n$ blocks).

We suggest moving to a new, more efficient encoding scheme, in which decoding is only $O(n^2)$, and (in the optimistic case) block sizes are also smaller.

Suggested Scheme: Simple

The encoded vector consists of a pointer to a "base" block, along with a list of "exceptions". An exception is a tuple $(X,v_X)$ such that the vote vector at position $X$ contains the vote $V_x$, but the base block's vote for $X$ is different. Note that if $B$ is in layer $i$, it votes neutral for all blocks in layers greater than $i$. Thus, since the base block is always in an earlier layer (though not necessarily the one immediately previous), the exception list will almost always contain votes for the most recent layers (the exception is when the Hare protocol has not terminated yet for those layers).

The decoder is recursive: to decode the votes of block $B$:

Decode the votes of the base block $B'$. Let $V'$ be the output.
Initialize the output vector $V\gets V'$.
For every value (X,v_X) in the exception list, update $V[X]=v_x$.

Using dynamic programming, we can decode all the blocks in $O(n^2)$ memory and time: run the above algorithm for each block in sequence, but store the decoded vectors for all blocks once they are computed. Thus, step (1) takes constant time, so for every new block we use $O(n)$ time.

Potential optimization: sliding window

Just like the current Tortoise algorithm, we can keep only a sliding window of the votes for the last $k$ blocks rather than the full history. We set $k$ to be large enough that anything older will be in consensus and irreversible w.h.p.

We can also periodically rerun the algorithm with new data to ensure that we didn't miss anything due to the sliding window (see Verifying Tortoise below for efficient verification)

Suggested Scheme: DoS-prevention

The suggested scheme above encodes all votes perfectly (so our security proofs work without modification). However, the adversary can DoS the system by including arbitrarily-long exception lists. In order to prevent this, we'd like to limit the size of the exception list. If we do so, honest parties might not be able to encode their actual vote vectors (e.g., if every previous block in a party's view disagrees with that party on more than the exception-list limit).

We suggest the following workaround:

Limit the size of the exception list (we need to figure out what this limit should be).
If a party cannot encode its vote vector exactly (because it requires too many exceptions), it will do the following:
- Find a base block with the longest matching prefix
- Use its exception list to declare exceptions starting from the earliest one.
- Declare the first index at which the resulting vote vector has an "error" (i.e., the decoded vector does not match),
- Declare the requested size of the exception list that would allow exact encoding.
When a block declares that its encoding has errors, the decoded vote vector will be considered neutral for all positions after the first error.
A block will be allowed to created an extended exception list of length $L$, if it points to "enough" blocks (at least in the previous two clusters (i.e., previous 1600 blocks), such that the median of the requested exception list size is more than $L$ (a block that doesn't request a longer list is considered to request a list of the "regular" maximum length).

Analysis Ideas

If all honest blocks in a two-cluster sequence need a larger exception list (we use two clusters because we need to get closer to the 2/3 honest majority w.h.p), then the median will be larger, so the next set of blocks will be able to create larger lists. However, even if all malicious blocks request larger lists, the median won't change.

On the other hand, if there is an honest block that doesn't need a larger list, we're "ok" with other honest blocks using that as their base even with errors, since in our security analysis we don't care which honest block produced a block, just that it did.

This requires some more careful analysis

Suggested Implementation Guidelines

The scheme can be implemented in a very modular way: make sure the following are separate functions:

Choosing the base block
Computing the allowable exception list length
Choosing which exceptions to include in the list.

This will allow us to initially implement a simplified version (e.g., one that doesn't limit the length of the list at all), and later update the algorithm to add DoS-prevention features.

Verifying Tortoise

The "Verifying Tortoise" is an optimized algorithm that, given a proposed vote vector, can verify whether this would be the output of the full Tortoise algorithm much more efficiently than actually running the full Tortoise.

The following $O(n)$ algorithm will work as long as all assumptions were satisfied since genesis (this is basically the same condition that we require for the current "optimized Tortoise):

Mark the genesis block as "good"
Go over all blocks, in order.
- For block $i$, mark it "good" if (1) its base block is marked good and (2) its exception list consists only of blocks that appear after the base block and are consistent with the input vote vector.
Count the votes for the input vote vector by summing the weight of the good blocks (of course, each good block's weight only counts for the blocks preceding it).
Declare the vote vector "verified" up to position $k$ if the total weight exceeds the confidence threshold in all positions up to $k$.

We need more research to see if we can extend this algorithm (or come up with another one) that will work even if assumptions were not satisfied for some periods in history.

avive commented 4 years ago

Fast Tortoise verification

lrettig commented 3 years ago

@tal-m: re: this part:

Go over all blocks, in order. For block $i$, mark it "good" if (1) its base block is marked good and (2) its exception list consists only of blocks that appear after the base block and are consistent with the input vote vector.

Can a base block's exception list contain blocks in the same layer as the base block itself, or must they all be in strictly newer layers?

tal-m commented 3 years ago

ase block is marked good and (2) its exception list consists only of blocks that appear after the base block and are consistent with the input vote vector.

Can a base block's exception list contain blocks in the same layer as the base block itself, or must they all be in strictly newer layers?

A block never votes for blocks in the same layer, only for previous (strictly older) layers. The reason is that honest blocks in a layer are all published at the same time, so shouldn't know about each other until after they are published.

lrettig commented 3 years ago

Sorry, let me rephrase my question: can a block's exception list contain exceptions in the same layer as the base block? (The base block itself is, of course, in an earlier layer than the block being created.) This is unclear from the wording I quoted: it sounds as if exceptions can only be added in layers newer than the base block's layer, but not in its own layer.

tal-m commented 3 years ago

Sorry, I understand what you're asking now.

The exception list can include blocks from any layer before the block being generated, including the base-block layer. For example, suppose the base block B is in layer 10, and the new block N is in layer 15. The exception list in block N will contain:

Any blocks in layers 1...9 for which block B votes differently than block N
Any blocks in layers 10...14 that block N isn't voting against (i.e., it's voting for them or explicitly neutral).

The first rule is straightforward --- this is where the term "exception list" comes from. The second rule ensures that a block always votes against all blocks in previous layers that aren't in its view. If we didn't have this rule, a block that was published extremely late (e.g., a block claiming to be in layer 1 when the current layer is 1000) wouldn't have an overwhelming vote margin against it.

Note that block B, like every block in layer 10, is implicitly neutral about all blocks at layer 10 and up, while block N is implicitly neutral about blocks in layers 15 and up.

lrettig commented 3 years ago

Any blocks in layers 1...9 for which block B votes differently than block N

If new block N is in layer 1000 and base block B is in layer 995, how far back do we look to add exceptions when constructing block N?

tal-m commented 3 years ago

We look all the way back to genesis. Of course, we should choose a base block that agrees with us as much as possible.

In the extremely unlikely case that we can't find a base block that agrees with us on almost everything, we add blocks to the exception list starting at the oldest block on which we disagree, and stopping when we reach the maximum exception list size.

lrettig commented 3 years ago

Wouldn't looking back all the way to genesis every time a base block is selected (i.e., every time a new block is constructed) take too long? Don't we want to use the "sliding window" optimization described above?

Potential optimization: sliding window Just like the current Tortoise algorithm, we can keep only a sliding window of the votes for the last $k$ blocks rather than the full history. We set $k$ to be large enough that anything older will be in consensus and irreversible w.h.p.

tal-m commented 3 years ago

I didn't mean the algorithm should actually look back to genesis; We can optimize this (see below). But the exception list is allowed to contain exceptions arbitrarily far in the past.

In practice, the algorithm should work something like this:

For every block in the sliding window, we store:

its votes about every previous block in the sliding window a
The first layer at which the block disagrees with our opinion (this can be outside the sliding window).

When we search for a base block, I think the following heuristic would work:

sort blocks by the first disagreement layer, then by the block's layer.
Pick the best base block according to this sort order.
- Break ties arbitrarily (easy) or by computing the size of the exception list and choosing the shortest one (a bit better in terms of communication/storage complexity, but probably not worth it to implement)
- For example, if we have blocks A, B, C such that:
A is in layer 1000, B in layer 1005 and C in layer 1010, and
A agrees with us up to layer 990, B agrees with us up to layer 990 and C agrees with us up to layer 900, the sorting order would be B,A,C (since B and A both have a first disagreement at layer 991 while C has one at layer 901, and the tie between B and A is broken because B is more recent).

With this heuristic, you'd only have to look beyond the sliding window if we disagree with every recent block about some block beyond the sliding window (which is a sign the sliding window isn't really working).

Does this sound reasonable?

lrettig commented 3 years ago

How should we parameterize the size of the sliding window, especially compared to hdist? Can we use hdist? Should it be larger but the same order of magnitude? Or something else entirely?

lrettig commented 3 years ago

The exception list can include blocks from any layer before the block being generated, including the base-block layer. For example, suppose the base block B is in layer 10, and the new block N is in layer 15. The exception list in block N will contain:

Any blocks in layers 1...9 for which block B votes differently than block N

Any blocks in layers 10...14 that block N isn't voting against (i.e., it's voting for them or explicitly neutral).

According to the algorithm specified above:

Go over all blocks, in order. For block $i$, mark it "good" if (1) its base block is marked good and (2) its exception list consists only of blocks that appear after the base block and are consistent with the input vote vector.

block N would not be marked "good" if it contained exceptions for any blocks in layers 1...9 since (2) would be violated, and block N itself would never become a base block candidate, right?

Can a block with an exception in the same layer as its base block be marked "good"? Or must exceptions strictly be in layers later than the base block layer?

tal-m commented 3 years ago

IIRC, marking blocks as "good" only affects the verifying tortoise; the idea is that when our assumptions are satisfied, all honest blocks vote exactly the same, so all of them will be good, and it's enough to count only good blocks.

Specifically in answer to your questions:

block N would not be marked "good" if it contained exceptions for any blocks in layers 1...9 since (2) would be violated,

That's correct

block N itself would never become a base block candidate,

That's true if we're in the verifying tortoise regime. If we're self-healing, it's possible that no block is marked good, in which case we'll have to choose a base block that's the "least bad".

Can a block with an exception in the same layer as its base block be marked "good"?

Yes, (2) should say "in the same layer or after the base block". Otherwise no block would be marked good (except in the very unusual case where it votes against every block in the layer of its base block).

spacemeshos / research