stacks-network / stacks-core

The Stacks blockchain implementation
https://docs.stacks.co
GNU General Public License v3.0

[Network] block downloader gets stuck when the PoX anchor block is in the last unconfirmed tenure in the reward cycle #5174

jcnelson closed this issue 2 months ago

jcnelson commented 2 months ago

In the event that the last tenure in the reward cycle contains the PoX anchor block, the block downloader will get stuck because it never switches to Unconfirmed mode. As such:

jcnelson commented 2 months ago

So, even though we saw this lead to a freeze of one of the testnets, I'm not convinced that this is a problem that should be fixed. The situation arises if there is only a single tenure mined in the prepare phase [1]. But, if you're a bootstrapping node and only have this single commit to go off of to load the tenure which contains the PoX anchor block (which is the first block mined in the prepare phase), there's basically nothing you can do to authenticate this block via Bitcoin-hosted block-commits. This is because a block-commit tells you the tenure-start block hash of the prior tenure, not of the tenure it starts. Thus, if you want to fetch and authenticate the PoX anchor block in the prepare phase, then there must be a subsequent block-commit in the prepare phase. That subsequent block-commit will contain the hash of the PoX anchor block, which the downloader uses to discover, download, and authenticate it as a confirmed tenure.
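
To make the confirmation rule concrete, here is a minimal Python sketch under simplifying assumptions. `BlockCommit`, `confirmed_tenure_starts`, and `is_confirmed` are hypothetical names, not stacks-core APIs, and a block-commit is reduced to the single field discussed above: the tenure-start block hash of the prior tenure. A tenure counts as confirmed only if some later commit names its start block, so the tenure started by the last commit in the prepare phase never is.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class BlockCommit:
    """Hypothetical, simplified block-commit carrying only the field relevant here."""

    # Tenure-start block hash of the *prior* tenure (None if there is none).
    parent_tenure_start_hash: Optional[str]


def confirmed_tenure_starts(commits: Iterable[BlockCommit]) -> Set[str]:
    """Tenure-start hashes vouched for by some Bitcoin-hosted block-commit."""
    return {
        c.parent_tenure_start_hash
        for c in commits
        if c.parent_tenure_start_hash is not None
    }


def is_confirmed(tenure_start_hash: str, commits: Iterable[BlockCommit]) -> bool:
    """A tenure-start block is authenticated only if a later commit names it."""
    return tenure_start_hash in confirmed_tenure_starts(commits)


# A single winning commit in the prepare phase: it starts the tenure that holds
# the PoX anchor block, but it names the *previous* tenure's start block.
prepare_phase = [BlockCommit(parent_tenure_start_hash="prev-start")]
print(is_confirmed("anchor", prepare_phase))  # False

# A second prepare-phase commit names the anchor block, which confirms it.
prepare_phase.append(BlockCommit(parent_tenure_start_hash="anchor"))
print(is_confirmed("anchor", prepare_phase))  # True
```

The block hashes here are placeholder strings; the point is only that the anchor block becomes authenticatable once a second commit exists.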

Now, the downloader will also fetch unconfirmed tenures -- namely, the ongoing tenure, which does not yet have a block-commit to confirm its start or end blocks. Both the unconfirmed tenure downloader and the block-push relayer authenticate newly-minted blocks against the active signing set instead of block-commits, which is acceptable because we already assume that 70% of the signing set is honest during its tenure.

But what we saw in this particular testnet was a partial shutdown, not a full shutdown, because (1) miners produced a single block-commit in the prepare phase, and (2) some nodes received the PoX anchor block pushed by the stackers and the miner. The nodes that received it were able to process and store it because they did so while the current signer set was active. For reasons that are not clear to me, since the log data has been lost, the nodes which did not receive the pushed block fell behind the Bitcoin chain and did not resume processing the remaining prepare-phase sortitions until after the next reward cycle had begun (perhaps the nodes were rebooted?). This means that they never had a chance to fetch the PoX anchor block via the unconfirmed tenure downloader. Using the unconfirmed tenure downloader is not an option in this situation either, because it only runs when the node's highest sortition matches its highest burnchain block. This constraint is required to ensure that the signing sets which signed the first and last blocks of a tenure are (1) known to the node prior to block download, and (2) authenticated against Bitcoin state [2].
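
The gating rule and the signer-set check described above can be sketched in Python as follows. This is a simplified model, not stacks-core's downloader: `DownloaderMode`, `choose_mode`, and `signed_by_active_set` are hypothetical names, heights are plain integers, and the 70% figure is taken from the honesty assumption stated above rather than from the actual signer code.

```python
from enum import Enum


class DownloaderMode(Enum):
    CONFIRMED = "confirmed"      # tenures authenticated via block-commits
    UNCONFIRMED = "unconfirmed"  # ongoing tenure, authenticated via signers


def choose_mode(highest_sortition_height: int, burnchain_tip_height: int) -> DownloaderMode:
    """Hypothetical gate: only fetch the ongoing tenure once the node's highest
    processed sortition has caught up to its highest known burnchain block."""
    if highest_sortition_height == burnchain_tip_height:
        return DownloaderMode.UNCONFIRMED
    return DownloaderMode.CONFIRMED


def signed_by_active_set(signed_weight: int, total_weight: int, threshold_pct: int = 70) -> bool:
    """Assumed rule: accept an unconfirmed block if signers holding at least
    `threshold_pct` percent of the active set's weight signed it. Integer
    arithmetic avoids floating-point rounding exactly at the threshold."""
    if total_weight <= 0:
        return False
    return signed_weight * 100 >= total_weight * threshold_pct


# A lagging node stays in confirmed mode, where the anchor's lone commit
# cannot authenticate the anchor block.
print(choose_mode(highest_sortition_height=840, burnchain_tip_height=850))
print(signed_by_active_set(signed_weight=7, total_weight=10))
```

In this model a node that has fallen behind the Bitcoin chain can only use the confirmed path, which is why nodes that missed the pushed anchor block had no way to recover it.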

I'm inclined to just close this issue as EWONTFIX. Even if we had all the time in the world to make it so that nodes could somehow fetch the PoX anchor block in this situation, the fact that they won't be able to authenticate it against Bitcoin chain state is troubling. The best a bootstrapping node could do is hope that the PoX anchor block is valid, and then hope that the subsequent block-commits it processes are the real block-commits (which can now only be trusted to be authentic to the extent that we trust the non-Bitcoin-attested PoX anchor block to be authentic).

Thoughts @kantai @obycode?


[1] This is exceedingly unlikely to happen in mainnet, since it would mean that there were no miners for 100 Bitcoin blocks. We would have far bigger problems if this was true.

[2] The reason we require the node to authenticate the PoX anchor block against a Bitcoin transaction is that we can't guarantee that signers' private keys will remain secret in perpetuity. Suppose, for example, that your node is processing tenures from last year, and that since those tenures were mined, over 70% of signers' keys have been disclosed to an attacker. The attacker could then DoS your node with arbitrarily long and expensive-to-validate blocks as you're booting up, thus preventing you from validating your chainstate. By requiring that all downloaded tenures be authenticated against Bitcoin state, we guarantee that forging alternative tenure histories in the past is at least as hard as rewriting Bitcoin history.
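
A minimal Python sketch of the ordering this footnote argues for, assuming a simplified model: `accept_historical_tenure` and `expensive_validate` are hypothetical names, and the set of committed hashes stands in for Bitcoin-hosted block-commits. The cheap commit check runs before any validation, so blocks signed with leaked keys but absent from Bitcoin are rejected without costing the node validation work.

```python
from typing import Callable, List, Set


def accept_historical_tenure(
    tenure_start_hash: str,
    committed_hashes: Set[str],
    validate: Callable[[str], bool],
) -> bool:
    """Hypothetical bootstrapping rule: a historical tenure must first match a
    Bitcoin-hosted block-commit; only then is the (possibly expensive) block
    validation run."""
    if tenure_start_hash not in committed_hashes:
        return False  # cheap rejection; no validation work is done
    return validate(tenure_start_hash)


calls: List[str] = []


def expensive_validate(block_hash: str) -> bool:
    calls.append(block_hash)  # record how often validation actually runs
    return True


committed = {"real-start"}
print(accept_historical_tenure("forged-start", committed, expensive_validate))  # False
print(accept_historical_tenure("real-start", committed, expensive_validate))    # True
print(calls)  # ['real-start']
```

Valid signatures alone never reach `validate` here; only a hash that Bitcoin state vouches for does.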

obycode commented 2 months ago

This makes total sense to me. I agree.

jcnelson commented 2 months ago

Got verbal confirmation from @kantai that this isn't a bug

blockstack-devops commented 3 weeks ago

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.