stacks-network / stacks-core

The Stacks blockchain implementation
https://docs.stacks.co
GNU General Public License v3.0

Absent PoX anchors should be "forkable" #1805

Closed kantai closed 1 year ago

kantai commented 3 years ago

There are a few cases where Stacks blocks should be processable in multiple PoX forks. For example, if a reward cycle begins where nobody Stacked, then all of the block commits of that reward cycle would be burns, regardless of whether or not the anchor block has already been processed by the node. However, the node cannot know that this is the case until it has processed that anchor block. This would result in a set of sortitions that would select the same sortition winners in two different PoX forks. It is vitally important that those sortition winners be valid in both PoX forks. If they are not, then any missing PoX anchor block essentially stalls the blockchain (because the discovery of that anchor block would always invalidate all subsequent blocks).

There are a handful of implementation details and Clarity features that prevent blocks from being valid in multiple PoX forks:

  1. The write_consensus_bytes implementation for back pointers uses the referred-to MARFTrieID as the consensus bytes for the back pointer. This could instead use the root hash of the referred-to trie. See: https://github.com/blockstack/stacks-blockchain/blob/cb75c8f9a555aff85eafc068909d9d573941775d/src/chainstate/stacks/index/node.rs#L311

  2. The BLOCK_HEIGHT_TO_HASH_MAPPING_KEY and BLOCK_HASH_TO_HEIGHT_MAPPING_KEY key/values stored in each MARF trie. These store the parent "MarfTrieId" in each trie. Because the MarfTrieId is a function of the consensus hash and the Stacks block hash, this mixes the PoX fork identifier into the root hash. These key/value pairs are necessary for a bunch of implementation reasons-- they are how we evaluate (at-block ...), compute the ancestor hash skip list, among other things. However, these probably do not need to be committed to. If we could find a way to track these without storing them in the Stacks block commitment MARF, that would be very helpful.

  3. Right now: exposing the index-block-hash in Clarity. Eventually: exposing other PoX reward cycle information.
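To make the concern in point (2) concrete, here is a minimal Rust sketch. The names and the hash function are illustrative stand-ins (std's `DefaultHasher` in place of the real SHA512/256-based derivation), not the actual stacks-core implementation:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Illustrative stand-in for the MarfTrieId derivation: a function of the
/// (PoX-fork-specific) consensus hash and the Stacks block hash.
fn marf_trie_id(consensus_hash: &str, stacks_block_hash: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    consensus_hash.hash(&mut hasher);
    stacks_block_hash.hash(&mut hasher);
    hasher.finish()
}
```

Because the consensus hash differs between two PoX forks, the same Stacks block yields two different trie identifiers; storing that identifier under BLOCK_HEIGHT_TO_HASH_MAPPING_KEY is what mixes the PoX view into the committed root hash.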

jcnelson commented 3 years ago

Regarding (1), I think that, by itself, this change should be fine. But, please see (2) and (3) below, since they may influence (1).

Regarding (2), could we do the following?

  1. Change the mapping to store the parent trie root hash instead of the MarfTrieId? Then, we can still make use of the BLOCK_HEIGHT_TO_HASH_MAPPING_KEY and BLOCK_HASH_TO_HEIGHT_MAPPING_KEY key/value pairs; it's just that now they would not be dependent on any PoX state.

  2. Have the MARF commit to all other non-PoX information that would have been committed to by the Stacks index block hash (which is what we currently use for the MarfTrieId)? A trivial way to do this would be to insert one extra key/value pair into the MARF that mapped the string "__BLOCK_HEADER_INFO" to the canonical serialization of the Stacks block header, with a state root hash of all 0's.

Regarding (3), I'm on the fence about exposing PoX state to Clarity. On the one hand, I don't want to hide any useful information from developers. But on the other hand, I don't know if the existence of PoX state in the VM can cause a block that would otherwise be valid in multiple PoX forks to now be invalid. For example, if we expose the consensus hash to the Clarity VM, would this re-introduce the problem, because now the VM could commit state to the block's trie that depends on the PoX view?

kantai commented 3 years ago

> Change the mapping to store the parent trie root hash instead of the MarfTrieId? Then, we can still make use of the BLOCK_HEIGHT_TO_HASH_MAPPING_KEY and BLOCK_HASH_TO_HEIGHT_MAPPING_KEY key/value pairs; it's just that now they would not be dependent on any PoX state.

Maybe -- the problem is that we use those mappings to get pointers to the trie. If we then have a way to go from trie root hash to marf-trie-id, then we should be able to do this.

> Have the MARF commit to all other non-PoX information that would have been committed to by the Stacks index block hash (which is what we currently use for the MarfTrieId)? A trivial way to do this would be to insert one extra key/value pair into the MARF that mapped the string "__BLOCK_HEADER_INFO" to the canonical serialization of the Stacks block header, with a state root hash of all 0's.

Yep, that should be doable.
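The "__BLOCK_HEADER_INFO" idea above can be sketched as follows. This is a toy model with hypothetical field names and serialization (the real header has more fields); the point is that the state root is forced to zero before serializing, so the committed bytes cannot depend on the trie's own root hash:

```rust
#[derive(Clone)]
struct StacksBlockHeaderInfo {
    state_root: [u8; 32],
    block_hash: [u8; 32],
    // ... remaining header fields elided
}

/// Value stored under the hypothetical "__BLOCK_HEADER_INFO" key: the
/// canonical serialization of the header, with the state root zeroed out
/// to break the circular dependency on the trie's own root hash.
fn block_header_info_value(header: &StacksBlockHeaderInfo) -> Vec<u8> {
    let mut zeroed = header.clone();
    zeroed.state_root = [0u8; 32];
    let mut bytes = Vec::with_capacity(64);
    bytes.extend_from_slice(&zeroed.state_root);
    bytes.extend_from_slice(&zeroed.block_hash);
    bytes
}
```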

> Regarding (3), I'm on the fence about exposing PoX state to Clarity. On the one hand, I don't want to hide any useful information from developers. But on the other hand, I don't know if the existence of PoX state in the VM can cause a block that would otherwise be valid in multiple PoX forks to now be invalid. For example, if we expose the consensus hash to the Clarity VM, would this re-introduce the problem?

Yep, if we exposed the consensus hash, it would re-introduce the problem -- I could write a contract that just exposed one public function for issuing a "consensus-binding" transaction, storing the most recent consensus hash into a data-var. At the start of every block, I issue that transaction. In the event of a PoX fork, any block that included that transaction would now be invalid.
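The attack described here can be illustrated with a toy model (Rust standing in for the Clarity contract and its data-var; all names are hypothetical): if the VM can read the consensus hash, replaying the same block under a different PoX fork produces different VM state, and therefore a different state root:

```rust
/// Toy VM state: one data-var written by the "consensus-binding" transaction.
#[derive(Default)]
struct VmState {
    bound_consensus_hash: Option<String>,
}

/// The attack transaction: store whatever consensus hash the VM exposes.
fn apply_consensus_binding_tx(state: &mut VmState, exposed_consensus_hash: &str) {
    state.bound_consensus_hash = Some(exposed_consensus_hash.to_string());
}

/// Replay one block containing the binding transaction under a given PoX
/// fork, returning a stand-in for the resulting state root.
fn replay_block(exposed_consensus_hash: &str) -> String {
    let mut state = VmState::default();
    apply_consensus_binding_tx(&mut state, exposed_consensus_hash);
    format!("state-root({})", state.bound_consensus_hash.unwrap())
}
```

Since the two forks expose different consensus hashes, the block's resulting state diverges, so the block cannot be valid in both forks.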

jcnelson commented 3 years ago

> Maybe -- the problem is that we use those mappings to get pointers to the trie. If we then have a way to go from trie root hash to marf-trie-id, then we should be able to do this.

I think this could just be a sqlite table that doesn't get committed to by the trie root hash. All this table would do is map the (PoX-less) global trie identifier back onto the (PoX-specific) local trie identifier. But, I'm leery about any scheme that does not force the trie to commit to the values of BLOCK_HEIGHT_TO_HASH_MAPPING_KEY and BLOCK_HASH_TO_HEIGHT_MAPPING_KEY, since I think (a) light clients are going to need to authenticate this information using only block headers (and this is trivial if these keys are just committed to already by the block headers' state root hashes), and (b) as a general design principle, I think Clarity should only ever read state that is either authenticated by the state root, or is deterministically derived in a PoX-free manner from authenticated state (such as contract metadata).

> Yep, if we exposed the consensus hash, it would re-introduce the problem -- I could write a contract that just exposed one public function for issuing a "consensus-binding" transaction, storing the most recent consensus hash into a data-var. At the start of every block, I issue that transaction. In the event of a PoX fork, any block that included that transaction would now be invalid.

Sounds good. I'll go ahead and remove the consensus-hash global variable I introduced in my PoX lockup PR.

jcnelson commented 3 years ago

@kantai I'm still a little foggy about the precise reason why PoX state is not allowed to be represented in the Clarity VM. We don't want to expose the consensus hash, because that could render a block that needs to be valid in two PoX forks invalid in at least one of them. Is the fundamental reason for this because it allows the same block to evaluate to two different state-roots due to the transactions having a PoX-specific interpretation? I'm asking because I'm trying to reason about how lazy Stacks token unlocks in PoX will affect block evaluation. I think this concern doesn't apply here, because PoX lock state is the same in all PoX forks (and thus a block that tries to lock or unlock tokens will have the same interpretation in all PoX forks). Is that reasoning correct?

kantai commented 3 years ago

> Is the fundamental reason for this because it allows the same block to evaluate to two different state-roots due to the transactions having a PoX-specific interpretation?

Yep, exactly.

> I'm asking because I'm trying to reason about how lazy Stacks token unlocks in PoX will affect block evaluation. I think this concern doesn't apply here, because PoX lock state is the same in all PoX forks (and thus a block that tries to lock or unlock tokens will have the same interpretation in all PoX forks). Is that reasoning correct?

Yeah, that's correct.

diwakergupta commented 3 years ago

Note from discussion on 8/17: doesn't necessarily block PoX on Krypton, but would be a mainnet blocker.

jcnelson commented 3 years ago

@kantai Do you know the status of this issue?

kantai commented 3 years ago

Yep, this is still an open issue.

kantai commented 3 years ago

Circling back on this, I think there are two potential ways to address this issue:

1. Continue to uniquely identify Stacks block tries with a function of BlockHeaderHash and ConsensusHash.

Most of the code outside of the MARF implementation would continue to use StacksBlockId struct as currently implemented to index into the MARF. However, the MARF would need to use a different identifier internally. This would impact the following:

  1. The root hash skip-list calculation. This would need to use the BlockHeaderHash instead.
  2. Back-pointer consensus bytes in MARF proofs. Again, rather than the StacksBlockId, it would use BlockHeaderHash.
  3. Back-pointer traversal. Currently, MARF tries store a key/value entry in each block that resolves block height to a trie identifier. This would need to either use the BlockHeaderHash to perform the traversal (by implementing some method of getting a StacksBlockId from a BlockHeaderHash while indexed into a MARF "fork"), or resurrect something like the fork table struct. This would be far and away the most painful part of this change.

2. Revert to uniquely identify Stacks block tries with a function of BlockHeaderHash and BurnchainHeaderHash.

At the surface, this seems like it's not possible (or if it is, it's surprising). But, I believe that this should be possible, even in the presence of PoX forks. This is because a winning BlockHeaderHash/BurnchainHeaderHash will always refer to a single possible MARF trie even if a PoX fork occurs; however, the block data would need to be revalidated (because one of its parents may not have been elected). Doing this revalidation would likely require dropping the invalidated tries (or otherwise marking them as invalid) during a PoX reorg.

From my best understanding at the moment, the difference between these two approaches is roughly that (1) implicitly enforces the revalidation of blocks by continuing to externally index by ConsensusHash/BlockHeaderHash and internally resolving backpointers with a ConsensusHash/BlockHeaderHash lookup data structure, whereas (2) does not require the same indirection because it would be explicitly dropping Stacks block data associated with a PoX reorg.

I'm currently more inclined towards option (2), because I believe it would lead to less indirection in the normal/expected path, wouldn't require introducing a new data structure (which is a source of some implementation uncertainty), and it cleans out data that we should be dropping anyways. I'd love to hear thoughts from others on these options, though, and also if you can think of any other strategies.

jcnelson commented 3 years ago

Popping up a level, I want to reconsider this:

> For example, if a reward cycle begins where nobody Stacked, then all of the block commits of that reward cycle would be burns, regardless of whether or not the anchor block has already been processed by the node. However, the node cannot know that this is the case until it has processed that anchor block. This would result in a set of sortitions that would select the same sortition winners in two different PoX forks. It is vitally important that those sortition winners be valid in both PoX forks.

In each reward cycle, there are effectively two sets of forks: the forks created by miners who have the anchor block (call these "PoX-descendant forks"), and the forks created by miners who do not (call these "nephew forks"). Right now, the danger is that a miner can mine a hidden anchor block, build on it privately, and release the anchor block at an arbitrarily far point in the future. Today, this would invalidate all other forks produced -- i.e. all nephew forks created by miners who burned because they believed the anchor block did not exist would get blown away. This is obviously bad -- it's not prohibitively expensive to pull this attack off, and it can be used by a financially-motivated attacker to blackmail users, miners, and exchanges later (i.e. "pay me or I invalidate the last year's worth of transactions"). The payout for the attacker would likely increase with time, so it's worth it for the attacker to be patient.

Per SIP-007, when a set of miners doesn't receive the anchor block, it recovers by burning BTC instead of transferring it -- these miners build one or more "nephew forks" that burn BTC during this reward cycle, since they descend from an uncle of the anchor block. What we do not want to have happen is for the nephew fork(s) produced by these nodes to get invalidated by a later-arriving anchor block if they are sufficiently supported by the network. That is, the network ought to continue to treat these nephew forks as valid if more BTC was burned for nephew forks than was transferred for forks descending from the hidden anchor block. This way, a PoX-descendant fork set that descends from a hidden anchor block will only be considered as valid if it also has majority miner support -- i.e. it represents more cumulative BTC transferred than the nephew fork set represents cumulative BTC burned. This additional constraint makes pulling off a hidden anchor block reorg attack at least as expensive as a 51% attack -- if a hidden anchor block attack plus a deep reorg attack did take place, then attacker would already have the power to reorg the chain however (s)he saw fit anyway.
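The "more burned than transferred" comparison above can be sketched as follows. The types and the tie-breaking behavior are assumptions for illustration, not consensus code:

```rust
#[derive(Clone, Copy, PartialEq)]
enum CommitKind {
    NephewBurn,  // PoB block-commit on a nephew fork
    PoxTransfer, // PoX block-commit on a fork descending from the anchor block
}

struct BlockCommit {
    kind: CommitKind,
    btc: u64,
}

/// Decide whether the nephew fork set's sortitions are treated as valid,
/// given all block-commits in the reward cycle up to and including
/// sortition S. (Ties fall to the PoX-descendant set in this sketch; the
/// real rule would have to pick one side explicitly.)
fn nephew_forks_win(commits_up_to_s: &[BlockCommit]) -> bool {
    let burned: u64 = commits_up_to_s
        .iter()
        .filter(|c| c.kind == CommitKind::NephewBurn)
        .map(|c| c.btc)
        .sum();
    let transferred: u64 = commits_up_to_s
        .iter()
        .filter(|c| c.kind == CommitKind::PoxTransfer)
        .map(|c| c.btc)
        .sum();
    burned > transferred
}
```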

We already store enough information in the sortition DB to do this. We would need to make the following changes:

Point (3) is required because it means that the only way a malicious miner who mines hidden anchor blocks can keep stalling the chain is to also transfer more BTC in a descendant PoX fork than all nephew forks combined during reward cycle N. This is tantamount to forcing this miner to mount a 51% attack. Moreover, if the attacker loses and a nephew fork out-burns it during reward cycle N, then the attacker must re-start the attack at reward cycle N+1 since its anchor block will not be considered by any node even if it gets released.

Per point (2), note that this means that if the PoX-descendant fork-set and the nephew fork-set have roughly equal mining power, then the chain will get reorged intermittently back to the start of N. I consider this unlikely -- network participants can, through user-burn-supports, quickly step in and help honest miners overcome a hidden anchor block attacker. Also, because all sortitions are public knowledge, every network participant -- including wallets and exchanges -- can see this attack happening, and take proactive security measures like increasing the number of required confirmations or temporarily halting trading. So, the attacker would not have much to gain from triggering these intermittent re-orgs.

Due to point (3), the ability to re-org a reward cycle's forks goes away if honest miners are able to produce a nephew fork-set with more BTC burnt (in which case, the best chain is the longest chain in the nephew fork-set). Even if an attacker is able to repeatedly mine hidden anchor blocks for reward cycles N up to N + k, the remaining miners can recover once the attack subsides by building a fork that descends from the best chain tip at the start of N in reward cycle N + k + 1. Note that all miners and nodes that don't have the anchor block for N will have always considered this chain tip to be the best chain tip for the past k + 1 cycles, since they will not have been able to process any PoX-descendant forks. If the attacker releases the anchor block at all, then it can only reorg the chain up to k + 1 reward cycles (but at the cost of having to constantly out-transfer all nephew-miners).

jcnelson commented 3 years ago

(oops, fat-fingered the "comment" button).

Anyway, the last thing I'll say is that another advantage of making the network commit to a nephew fork versus a PoX-descendant fork by measuring burn/transfer output is that it would be a far less disruptive change than trying to make the MARF block commitments not depend on a PoX fork. I also think that the implementation would be somewhat straightforward and wouldn't require any new data structures to be invented.

kantai commented 3 years ago

> 1. When processing sortition S in reward cycle N, a node considers how much BTC was burned in all nephew forks in this reward cycle (i.e. sum of all BTC burned on nephew block-commits in N up to and including S) versus how much BTC was transferred in all PoX-descendant forks (i.e. sum of all BTC transferred in PoX-descendant block-commits in N up to and including S). If more BTC has been burned than transferred, it only considers valid the sortitions from the nephew block-commits up to and including S. If more BTC has been transferred, it only considers valid the sortitions from just the PoX-descendant block-commits.
>
> 2. Once reward cycle N+1 begins, the miners will have decided whether or not reward cycle N has a valid anchor block. If the nephew fork set represents more BTC burnt than the PoX-descendant fork set represents BTC transferred, then it doesn't matter if the anchor block arrives -- it will be considered invalid, and no forks may be built on it. The PoX bit for this reward cycle is set to 1, and no anchor block is selected. If on the other hand more BTC has been transferred, then the nodes decide that there was an anchor block in reward cycle N and all nephew forks built on during N are invalidated. The PoX bit in reward cycle N is set to 1 for nodes that have the anchor block, and 0 for nodes that do not. The cumulative weight measurement for nephew forks versus PoX-descendant forks resets at the start of N + 1.

I agree that (2) makes this attack more difficult/unlikely, because it would mean that a malicious miner would need to secretly mine a PoX anchor block (non-trivial, need to win a super-majority of blocks during the prepare phase), and then also provide a majority of the mining power during the subsequent reward cycle. However, I don't think that (3) addresses the "long-range attack" problem for the following case:

  1. Malicious miner gets a PoX anchor block A confirmed for reward cycle N which is hidden to the honest network. Requires winning a super-majority of blocks during Prepare.
  2. Malicious miner mines a majority of commits during the corresponding reward cycle N.

Honest miners would like to disqualify A from consideration after reward cycle N has passed, but they cannot. The network is now in as tenuous of a position as before: honest miners would like to continue making progress, but revealing A could disqualify their fork in all cases.

jcnelson commented 3 years ago

> Honest miners would like to disqualify A from consideration after reward cycle N has passed, but they cannot. The network is now in as tenuous of a position as before: honest miners would like to continue making progress, but revealing A could disqualify their fork in all cases.

If malicious miner M can produce private anchor block A, and consistently control a majority of the mining power such that the best known fork is M's private fork built on A, then there's not much the rest of the network can do. This is true even without PoX. An analogous attack on Bitcoin would be for M to produce the heaviest Bitcoin fork, but only release the block headers (thereby proving that the fork exists, but without allowing anyone to build on it except for empty blocks).

How would honest miners disqualify A from consideration if the best known fork descending from A represented more mining power than any other fork? Isn't this equivalent to saying that any minority nephew fork can overpower a majority PoX-descendant fork if enough of a minority simply decides that A doesn't exist? We could conceivably pick a threshold at which a minority nephew fork disables PoX. For example, we could say that if any nephew fork gets 25% or more mining power, then all forks built off of A are discarded. But, this would just make the system vulnerable to a 26% attacker mining a hidden nephew fork (i.e. we see the nephew fork sortitions, but not the blocks).

I'm not sure we can do better than what I've described -- if a malicious miner can consistently out-mine everyone else to build a private fork off of A, then the chain is already lost.

kantai commented 3 years ago

> If malicious miner M can produce private anchor block A, and consistently control a majority of the mining power such that the best known fork is M's private fork built on A, then there's not much the rest of the network can do. This is true even without PoX. An analogous attack on Bitcoin would be for M to produce the heaviest Bitcoin fork, but only release the block headers (thereby proving that the fork exists, but without allowing anyone to build on it except for empty blocks).

The point is that the miner M only needs to do this for a window of time to permanently disrupt the chain -- they need to win (and hide) an anchor block A, and then have the majority of burns in that block's associated reward cycle. After doing so, without ever revealing the anchor block, the chain is permanently in a state that it could be forked simply by revealing the block A.

jcnelson commented 3 years ago

> However, I don't think that (3) addresses the "long-range attack" problem for the following case:

I'm not sure I communicated this well enough in my point (2). Specifically, I'm saying that nodes don't consider a sortition's BTC burned/xfered in a fork's cumulative BTC weight until they have the Stacks block for it. So, if M mines a private fork, the rest of the network still considers the nephew fork to be the best fork, since it's the best fork for which they have data. Another node won't count M's sortitions towards M's fork until it has M's fork's block data.
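The rule described here can be sketched as follows (hypothetical types): a sortition contributes to a fork's cumulative BTC weight only once the node actually has the corresponding Stacks block data:

```rust
struct SortitionEntry {
    btc_committed: u64,
    stacks_block_downloaded: bool,
}

/// Cumulative BTC weight of a fork from this node's point of view: hidden
/// blocks contribute nothing until their data actually arrives.
fn observed_fork_weight(fork: &[SortitionEntry]) -> u64 {
    fork.iter()
        .filter(|s| s.stacks_block_downloaded)
        .map(|s| s.btc_committed)
        .sum()
}
```

Under this rule, M's private fork has zero observed weight for every other node, no matter how much BTC its sortitions committed, so the nephew fork remains the best known fork until M releases its block data.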

That said, now suppose malicious miner M produces a private anchor block A in reward cycle N, and then produces the heaviest fork in reward cycle N. This fork is private for now -- only M sees the block contents, and it descends from A.

Suppose that M ceases the attack in reward cycle N+1. Then, the rest of the network spends reward cycle N+1 building a nephew fork "in parallel" to M's private fork. Then, one of the following can happen:

I think this version of (2) means that in all scenarios, in order for M to reorg the chain, it needs to consistently control the majority of the mining power. This is what we want -- M can't reorg the chain unless M has majority mining power.

kantai commented 3 years ago

Ah -- I think I see.

So in this case, when we consider whether or not to reorg a "PoX fork", we consider the burn weight of the two PoX forks. I think this would mean that we cannot invalidate PoX forks as we currently do-- a PoX fork may be invalid when considered at block height N, but it is valid at block height N+k (because it has overtaken the other fork in burn weight).

jcnelson commented 3 years ago

The fundamental problem here is that there exists a way for an attacker to do O(1) work (i.e. to produce a private anchor block), and in doing so, can force the network to do Theta(n) work later (i.e. by releasing it). What we want is a mining protocol whereby a would-be attacker must always do Theta(n) work in order to compel the network to do Theta(n) work (or in other words, the attacker must do O(1) work per block in the attacker fork -- each block the network processes is paid for by sufficient mining power).

Without PoX, the Stacks blockchain already has this property. Like all forkable blockchains, an attacker that wants to force the network to re-process Theta(n) blocks must first mine a private fork of Theta(n) blocks. But with PoX, the attacker can mine a single hidden anchor block and release it arbitrarily far into the future, triggering a deep reorg of Theta(n) blocks.

How can we stop this from happening in PoX? We stop a similar problem in the Stacks blockchain by permitting forks: just because a block is missing and released late doesn't cause the subsequent chain to be reorged. Instead, miners who don't have a missing block simply mine on a different fork. The attacker can mine its own fork if it wants, but it will have to do Theta(n) work to produce a fork of Theta(n) blocks. This means that the only way a late block can later become part of the canonical chain history is if the majority of the network mines a fork that includes it.

In PoX, in order to stop a late anchor block from disrupting the Stacks chain in a similar way, the Stacks chain needs to maintain a fork history of anchor block decisions. An anchor block decision is a collective decision made by miners during a reward cycle to classify an anchor block as being in one of two states: the anchor block was either known to the majority of miners during the reward cycle, or the anchor block was unknown to the majority of miners during the reward cycle (this obviously does not apply to reward cycles without anchor blocks). Miners affirm past anchor block decisions by attaching the current reward cycle to a parent reward cycle, and in doing so, affirm the decisions made in all ancestral anchor blocks.

Reward cycle forks now represent a history of decisions made on the states of each ancestor's anchor block. The canonical reward cycle chain is the longest reward cycle chain, and the canonical Stacks chain is the longest Stacks chain that resides on the canonical reward cycle fork.
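The selection rule above can be sketched as follows (hypothetical types; "length" stands in for whatever weight metric is actually used):

```rust
struct StacksChainTip {
    reward_cycle_fork_id: usize,
    stacks_chain_length: u64,
}

/// Canonical reward-cycle fork: the longest chain of reward cycles.
/// Canonical Stacks chain: the longest Stacks chain on that fork.
fn canonical_tip<'a>(
    reward_cycle_forks: &[Vec<u32>], // each fork: reward cycle numbers from genesis
    tips: &'a [StacksChainTip],
) -> Option<&'a StacksChainTip> {
    let canonical_rc_fork = reward_cycle_forks
        .iter()
        .enumerate()
        .max_by_key(|(_, fork)| fork.len())
        .map(|(id, _)| id)?;
    tips.iter()
        .filter(|t| t.reward_cycle_fork_id == canonical_rc_fork)
        .max_by_key(|t| t.stacks_chain_length)
}
```

Note that a Stacks chain tip on a non-canonical reward-cycle fork is never chosen, even if it is longer than every tip on the canonical fork; this is the property that neutralizes a late anchor block.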

Breaking this down, I think the system needs to track three sets of forks:

Reward Cycle Forks

Reward cycles and prepare phases would continue to operate at fixed intervals of burnchain blocks in which PoX (or PoB) happen, just as they do today. But now, miners will create a fork history of reward cycles. The Nth reward cycle is no longer required to descend from reward cycle N - 1, but may descend from any reward cycle from 0 up to N - 1. Through a majority vote, miners in reward cycle N decide which reward cycle will be its parent (see below). This decision-making already happens implicitly; we just need to track it so we can enforce the best-Stacks-fork rule above.

Reward cycle forks are central to preventing a late anchor block from triggering more than O(1) processing work. A reward cycle's sortitions and blocks will never be re-processed if an anchor block was chosen but miners decided it did not arrive in time. But at the same time, the network must be able to recover if the majority of the mining power decides that an anchor block did arrive, even though it remains private (or gets lost). Creating a fork history of the anchor block decisions achieves this.

Because the same burnchain blocks can encode multiple reward cycles on different reward cycle forks, it is important to only process an anchor block if the majority of the network currently believes it exists. This is necessary in order to prevent an attacker from being able to get nodes to re-process sortitions on minority reward cycle forks. To achieve this, nodes would need to adhere to the following strategies when considering an anchor block:

Breaking this down, we can implement these two strategies by doing the following:

Choosing the Parent Reward Cycle

To create a fork history of reward cycles, we need to determine which reward cycle is the parent reward cycle of reward cycle N. To do so, the node examines each block mined in reward cycle N, and determines its highest ancestor that is not in reward cycle N (i.e. via the last_ancestor_of_block_before() algorithm in SIP-007). The reward cycle of this highest ancestor is some reward cycle N - k, and the act of mining a descendant of this block is a vote for reward cycle N - k to be the parent of reward cycle N.

Simply by mining blocks, miners vote on what the parent reward cycle should be. If a majority of blocks in reward cycle N all descend from highest ancestors in the same reward cycle N - k, then the parent of N is N - k. If no majority can be found, then reward cycle N's parent is reward cycle 0.
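This vote count can be sketched as follows, assuming "majority" means a strict majority of the blocks mined in cycle N, with the default-to-cycle-0 fallback described above:

```rust
use std::collections::HashMap;

/// Each element is, for one block mined in reward cycle N, the reward cycle
/// of that block's highest ancestor outside N. A strict majority picks the
/// parent reward cycle; if no majority exists, the parent defaults to 0.
fn parent_reward_cycle(ancestor_cycles: &[u32]) -> u32 {
    let mut tally: HashMap<u32, usize> = HashMap::new();
    for &rc in ancestor_cycles {
        *tally.entry(rc).or_insert(0) += 1;
    }
    tally
        .into_iter()
        .find(|&(_, count)| count * 2 > ancestor_cycles.len())
        .map(|(rc, _)| rc)
        .unwrap_or(0)
}
```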

Deciding the Fate of a Late Anchor Block

In addition to voting on which reward cycle should be the parent of N, miners vote to confirm whether or not the network had the anchor block for N while N was being mined. This already happens implicitly -- if a miner does not have the anchor block for N, it can still mine through PoB during N by building a "nephew chain" that descends from an uncle of the anchor block. In doing so, miners are implicitly casting votes on whether or not N's anchor block existed at the time.

We can track this vote to prevent a late anchor block from forcing a deep chain reorg. Recall that one of three things can happen when processing reward cycle N:

If the reward cycle has an anchor block chosen, miners have to decide during the reward cycle whether or not it falls under cases (2) or (3) above. If it falls under case (3), then miners send PoB block-commits with an additional bit set in them to indicate that the reason the miner is sending a PoB block-commit is because it doesn't have this reward cycle's anchor block. Then, when a node begins to process the sortitions in this reward cycle, it will see whether a majority or a minority of the winning sortitions are PoB sortitions with this "anchor-block-missing" bit set. If it's a majority, then case (3) applies to this reward cycle, and the node will never accept an anchor block for it. If it's a minority, then case (2) applies and the node will accept an anchor block for it even if it arrives late, but only as long as this reward cycle is on the canonical reward cycle fork.
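A sketch of that node-side decision (hypothetical types; "majority" here means a strict majority of the winning sortitions in the cycle):

```rust
struct WinningSortition {
    is_pob: bool,
    anchor_block_missing_bit: bool,
}

/// If a majority of winning sortitions in the reward cycle are PoB
/// sortitions with the "anchor-block-missing" bit set, the node permanently
/// rejects any anchor block for this cycle (case 3); otherwise it may still
/// accept a late-arriving anchor block (case 2).
fn reject_anchor_block_forever(winners: &[WinningSortition]) -> bool {
    let missing_votes = winners
        .iter()
        .filter(|w| w.is_pob && w.anchor_block_missing_bit)
        .count();
    missing_votes * 2 > winners.len()
}
```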

Recovering from a Long 51% Attack

Both of the above strategies are required to handle the case where a malicious miner is able to successively create a private anchor block for reward cycles N - k through N, and mine a majority of blocks in N - k through N through PoX sortitions. If such a miner exists, the miner was necessarily a 51% attacker for reward cycles N - k through N. The two strategies proposed here give the honest majority of the network a chance to recover and orphan the attacker's chain once the attack stops.

To do so, the honest majority of the network uses reward cycle N + 1 to begin building a nephew Stacks chain that will overtake the attacker's chain. The miners declare reward cycle N + 1 to be a child of reward cycle N - k - 1. Then, miners declare that reward cycle N + 2 is a child of reward cycle N + 1, that N + 3 is a child of N + 2, etc. until the fork of reward cycles 0, 1, ..., N - k - 2, N - k - 1, N + 1, N + 2, ..., N + k + 1 is one reward cycle longer than the attacker fork 0, 1, ..., N - k - 2, N - k - 1, N - k, ..., N - 1, N. At this point, the longest Stacks fork represented in reward cycle N + k + 1 becomes the canonical Stacks chain.

Once the reward cycle fork N + k + 1 overtakes the attacker fork at N, the attacker cannot force a deep reorg by releasing its private anchor blocks and descendants. This is because the attacker's fork now resides on a non-canonical reward cycle fork. However, the attacker is able to force deep reorgs back to the anchor block of N - k by releasing anchor blocks and their descendants while the network is recovering. But, the attacker can only reorg reward cycles in which it had a majority of the mining power, which means that instigating a deep reorg of Theta(n) blocks requires the attacker to do Theta(n) work.

kantai commented 3 years ago

I think this proposed plan sounds like a good one! With this in place, we don't need to ensure that MARF block commitments don't depend on PoX forks.

jcnelson commented 3 years ago

Had a call with @kantai about this. Upon further reflection, this scheme cannot work by itself -- there's no way for a bootstrapping node to know in advance whether or not a reward cycle is going to end up on the canonical reward-cycle fork. This is because in order to determine if a reward cycle is canonical, it needs to know the length of all descendant reward cycle forks. It cannot calculate this incrementally, because the correct interpretation of this reward cycle's sortitions that leads it to discover the canonical reward cycle fork may require interpreting the sortitions in the current reward cycle made by the network minority (but which will later be confirmed by the network majority). So in order to calculate this, the node needs to have some "future knowledge" -- namely, it needs to know exactly which future anchor blocks will be rejected or accepted as a consequence of accepting/rejecting this current anchor block. There's no reliable way to propagate this information with just the above proposal.

An alternative approach proposed by @kantai would be to require that the prepare phase of each reward cycle always be proof-of-burn, and require that the prepare phase and only the prepare phase sortitions vote on whether or not to accept or reject the current anchor block, and which reward cycle shall be the parent reward cycle of the current reward cycle. This would make it so that a bootstrapping node would always be able to extract the accept/reject history and fork history of all reward cycles regardless of which Stacks blocks it already has.

However, implementing and testing the above would likely take more time than we have (two or more weeks at least), so this issue might need to be addressed in Stacks 2.1. A simpler strategy for Stacks 2.0 could be to simply implement an on-chain permanent-kill switch for PoX, which liquid STX holders activate in the unlikely-but-catastrophic event that a PoX anchor block goes missing for too long. We could remove the kill switch in 2.1 when this fork-recover mechanism is ready.

jcnelson commented 3 years ago

I spent some time working through the solution space a bit more. I think the constraints we're working with are as follows:

Point (2) means that if there is some kind of on-Bitcoin voting procedure to reject the anchor block, then the voting must be done through PoB, not PoX. The vote must be conducted during some pre-determined, dedicated block-races that are PoB-only.

We aren't limited to using a miner-driven voting procedure to handle missing anchor blocks. Yet another alternative solution space is to implement a majority-vote among a set of trusted keys, for example, which are entrusted to a set of principals whose sole responsibility is to identify missing anchor blocks. These key-holders can overrule miners and declare an anchor block missing. I'm leaving out a lot of important details on how these keys get distributed and how to choose key-holders, but I bring this up as a possible solution that doesn't involve significantly altering how PoX works today.

jcnelson commented 3 years ago

Wrapping up notes from the architecture meeting, the consensus going forward is to attack this problem in two phases:

  1. Before code-freeze, make it so that the prepare phase is 100 blocks, and is always PoB. If we can do this, then we can implement the rest of this issue as a soft fork. Will open a separate issue for this.
  2. Once (1) is done, have the node's sortition DB keep track of a fork history of reward cycles, and use that to determine which anchor blocks to unconditionally ignore (i.e. non-canonical reward cycles are always treated as PoB). It would be wonderful, but not required, for this to be ready by January 14.

muneeb-ali commented 3 years ago

It was great to discuss this live in today's call. I'm supportive of #2157 with proof-of-burn only prepare phase and trying to make the prepare phase shorter.

jcnelson commented 3 years ago

Wrote up a possible solution to this problem here: https://gist.github.com/jcnelson/b1aa4bef8b9adb0856b28d3a933ef9a0

jcnelson commented 1 year ago

Merged!