Multi-miner, Signer block rejection

stacks-network / stacks-core

The Stacks blockchain implementation

https://docs.stacks.co

GNU General Public License v3.0

Multi-miner, Signer block rejection #5132

Closed saralab closed 2 months ago

saralab commented 2 months ago

Miner 2 proposes block #156

Signer 2 accepts: This wasn’t a confirmed block, needs to see lack of confirmation
Signer 1 accepts
Signer 0 rejects, because it saw burn block 134 before seeing the proposal
3/8 signing power rejects so the block is not accepted

All miners see burn block #134

Miner 2 ends its tenure
Miner 0 starts its tenure

Miner 0 (correctly) builds from block 155 and proposes block #156 to the signers

Signer 2 rejects the block (because it is reorging the previous block #156 that it accepted)
Signer 0 accepts the block
Signer 1 rejects the block (same reason)
5/8 reject the block so it is not accepted

This repeats forever
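
The livelock above follows from the threshold arithmetic. A minimal sketch, assuming 8 units of signing power, a 70% acceptance threshold (the figure used elsewhere in this thread), a rejection threshold of (set-size) - (threshold) + 1, and a hypothetical 3/3/2 weight split (the report only states that signer 0 holds 3/8):

```python
# Hypothetical weights: the report only states that signer 0 holds 3 of 8.
WEIGHTS = {"signer0": 3, "signer1": 3, "signer2": 2}
TOTAL = sum(WEIGHTS.values())
ACCEPT_THRESHOLD = (TOTAL * 7 + 9) // 10         # assumed: 70%, rounded up
REJECT_THRESHOLD = TOTAL - ACCEPT_THRESHOLD + 1  # enough weight to block


def outcome(votes):
    """Tally weighted votes; votes maps signer name -> True (accept) / False."""
    accept = sum(WEIGHTS[s] for s, v in votes.items() if v)
    reject = sum(WEIGHTS[s] for s, v in votes.items() if not v)
    if accept >= ACCEPT_THRESHOLD:
        return "accepted"
    if reject >= REJECT_THRESHOLD:
        return "rejected"
    return "undecided"


# Miner 2's block #156: signer 0 already saw burn block 134 and rejects.
round1 = outcome({"signer0": False, "signer1": True, "signer2": True})
# Miner 0's block #156: signers 1 and 2 refuse to reorg the block they signed.
round2 = outcome({"signer0": True, "signer1": False, "signer2": False})
print(round1, round2)
```

Under these assumptions neither proposal can reach the acceptance threshold, and nothing in the signers' local state changes between attempts, so every retry ends the same way.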

Discussion:

Timer needs reset, legit bug
Fix the Signer to see the lack of confirmation
In this case they shouldn’t accept the last block (#156) as canonical until signature threshold is reached
Subscribe to the mined blocks to update their internal state.
Signer could poll the node too
Will not necessarily add latency as a result; this is needed as a fallback for Signers
Integration test to split the Signer set to have different visibility of blocks
Safety check: If a block is seen as rejected, it should be always rejected

Miner doesn’t properly track whether a block rejection already came from a signer

In begin_sign_v0, gathered_signatures isn’t updated with rejections
Not the root cause, but needs improvement

saralab commented 2 months ago

@jferrant Will draft a proposal and review with the team.

jcnelson commented 2 months ago

I think we flubbed the logic where signers learn whether or not a block proposal was accepted or rejected by the network. The acceptance process needs to look like this, I think:

Signers process and submit accept/reject signatures on a block proposal, but make no decision then on whether or not they treat the block as accepted. A block is instead "tentatively" (or "locally") accepted or rejected, until it becomes "definitively" (or "globally") accepted or rejected. A signer may treat multiple blocks at the same height as tentatively accepted or rejected. I think this is the problem -- deciding whether or not a block is accepted/rejected requires the signer to first observe that a global decision has been made. Specifically:
- Signers will treat a block as definitively rejected if they see a threshold of rejection signatures (this already happens)
- Signers will treat a block as definitively accepted if one of the following criteria is met:
  - A threshold of "accept" signatures is seen, in which case, the signer will broadcast the block to its node (this already happens)
  - It sees that its node has received, processed and accepted the block (in which case, no broadcast needs to happen). This doesn't happen right now, and is necessary to ensure that a signer's tentative rejection of a block does not prohibit it from signing blocks that descend from it (i.e. a signer may reject a block, but if the network accepts it, then the signer must treat it as definitively accepted).
- Signers make no decision on a block's status otherwise -- the block remains in a tentative state until the start of the next tenure, in which case, each tentative block is then treated as definitively rejected. This also doesn't happen right now, because a signer's decisions on block acceptance and rejection are not currently preempted by the network's decision to accept or reject blocks (and they need to be).
A miner treats the block as definitively accepted/rejected only if one of the following global criteria is true (this already happens):
- It receives enough signatures from signers to accept (in which case, it broadcasts the block)
- It sees that the block has been processed and accepted in its chainstate, meaning that it received a signer-broadcasted copy of the block (in which case, it does not broadcast the block)
- The miner treats the block as definitively rejected only if it receives a threshold of "reject" votes.
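
The signer-side rules above can be sketched as a single function. This is an illustration only: the function name, parameter names, and the `Status` enum are hypothetical and not taken from stacks-core. Note that the signer's own vote is deliberately not an input.

```python
from enum import Enum


class Status(Enum):
    TENTATIVE = "tentative"
    ACCEPTED = "definitively accepted"
    REJECTED = "definitively rejected"


def resolve(accept_weight, reject_weight, accept_threshold, reject_threshold,
            node_processed_block, tenure_ended):
    """Return a signer's view of a block from globally observable facts only."""
    # Accepted: threshold of accept signatures, or the node already has the block.
    if accept_weight >= accept_threshold or node_processed_block:
        return Status.ACCEPTED
    # Rejected: threshold of reject signatures.
    if reject_weight >= reject_threshold:
        return Status.REJECTED
    # Anything still tentative when the next tenure starts is dropped.
    if tenure_ended:
        return Status.REJECTED
    return Status.TENTATIVE
```

For example, a signer that voted to reject still gets `Status.ACCEPTED` once `node_processed_block` is true, which is what lets it sign descendants of a block it disliked.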

As before, a miner will continuously try to build atop one of the highest definitively accepted blocks, and continue to do so in the face of timeouts and rejections. The miner's p2p and relayer threads work in the background to sync with the signers' nodes to ensure that the miner has the same blocks (and same highest definitively accepted block) as the signers.

jcnelson commented 2 months ago

A block is instead "tentatively" (or "locally") accepted or rejected, until it becomes "definitively" (or "globally") accepted or rejected. A signer may treat multiple blocks at the same height as tentatively accepted or rejected.

This is something that I think needs some more elaboration. In the example above, signer 0 tentatively rejects the miner's first block because it saw burn block 134 before seeing the proposal. Signers 1 and 2 tentatively accept, but because they don't receive a threshold of signatures (never mind signer 0's rejection), their decisions aren't definitive.

Once all miners see burn block 134, the miner retries building the block. Signers 1 and 2 would now also tentatively accept the block. Once signer 0's acceptance is received, all signers definitively accept.

Here's an interesting question -- can we ensure that at most one block will be definitively accepted at a given block height? Because, what I've written above is insufficient -- it's possible that multiple tentatively accepted blocks at the same height can become definitively accepted.

We can change this to an "at most one" criterion if we're willing to add an extra round of communication:

Round 1: what we do right now, but no definitive accept/reject decision gets made. In other words, signers treat every block as tentatively accepted or rejected even if they've met the required signature thresholds. Signers do not process or accept blocks into their chainstates.
Round 2 (new): The miner chooses which tentatively-accepted block at height N will become definitively accepted. Only once this additional miner signature gets generated will a block be treated as definitively accepted. On receipt of this message (either via stackerdb or through a signed p2p block), a signer will treat the block as definitively accepted and simply delete all state for all other tentative blocks at that height.

In essence, this is a flavor of two-phase commit. We'd need the miner to instruct signers to commit to their validated blocks.

How might we achieve this? I think the answer is straightforward -- we just put the miner's round-2 signature into the block header.
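
A minimal sketch of the two rounds from the signer's point of view. The `SignerView` class and its method names are hypothetical, and verification of the miner's round-2 signature is omitted for brevity.

```python
class SignerView:
    """Hypothetical per-signer bookkeeping for the two-round scheme."""

    def __init__(self):
        self.tentative = {}  # height -> set of block ids that passed round 1
        self.committed = {}  # height -> the single block id chosen in round 2

    def round1_threshold_reached(self, height, block_id):
        # Round 1 never commits, even when the signature threshold is met.
        if height not in self.committed:
            self.tentative.setdefault(height, set()).add(block_id)

    def round2_commit(self, height, block_id):
        """Apply the miner's commit message; return True if it is honoured."""
        if height in self.committed:
            # At most one block per height: a second choice is refused.
            return self.committed[height] == block_id
        if block_id not in self.tentative.get(height, set()):
            return False
        self.committed[height] = block_id
        self.tentative.pop(height, None)  # delete state for every sibling
        return True


view = SignerView()
view.round1_threshold_reached(156, "block-a")
view.round1_threshold_reached(156, "block-b")
first = view.round2_commit(156, "block-b")   # commits block-b
second = view.round2_commit(156, "block-a")  # conflicting commit is refused
```

A refused second commit at the same height is exactly the equivocation case discussed in the next comment.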

EDIT: This assumes that the miner itself won't equivocate. I'll address that below.

jcnelson commented 2 months ago

What happens if the miner equivocates and signs and broadcasts two or more blocks at the same height in round 2? I think what must happen here at a minimum is signers that witness the equivocation will cease signing blocks from that miner for the rest of its tenure. Then, the next miner picks one of them and builds atop it.

This is straightforward to implement -- because the node will gladly process two Nakamoto blocks at the same height, it's easy to check to see if they were signed by the same miner. The node itself would track whether or not a tenure has two or more blocks signed at the same height, and report it to the signer.

If we're feeling adventurous, we could also slash the offending miner's coinbase. But, the above can ship after Nakamoto, since it's ultimately a signer policy choice to refuse to sign.

EDIT: If we're feeling less adventurous, but still want to make the miner suffer for its equivocation after Nakamoto ships, we can have the signer refuse to sign blocks originating from tenure block-commits coming from that equivocating miner, which forces the miner to re-register a VRF key and re-submit multiple block-commits before they can mine again.

jferrant commented 2 months ago

What happens if the miner equivocates and signs and broadcasts two or more blocks at the same height in round 2? I think what must happen here at a minimum is signers that witness the equivocation will cease signing blocks from that miner for the rest of its tenure. Then, the next miner picks one of them and builds atop it.

This is straightforward to implement -- because the node will gladly process two Nakamoto blocks at the same height, it's easy to check to see if they were signed by the same miner. The node itself would track whether or not a tenure has two or more blocks signed at the same height, and report it to the signer.

One question I have is, should the node even process two Nakamoto blocks at the same height? Is it better for the chainstate to reject it outright and not even process it or is there a valid reason why we would ever want to process two blocks at the same height?

jcnelson commented 2 months ago

One question I have is, should the node even process two Nakamoto blocks at the same height? Is it better for the chainstate to reject it outright and not even process it or is there a valid reason why we would ever want to process two blocks at the same height?

The Nakamoto chainstate DB must tolerate Nakamoto blocks at the same height because they can arise from Bitcoin forks.

jferrant commented 2 months ago

One question I have is, should the node even process two Nakamoto blocks at the same height? Is it better for the chainstate to reject it outright and not even process it or is there a valid reason why we would ever want to process two blocks at the same height?

The Nakamoto chainstate DB must tolerate Nakamoto blocks at the same height because they can arise from Bitcoin forks.

Ah of course. So additional logic would be required on the node's side to ensure a miner was not punished for simply responding to a bitcoin fork. i.e. the burnchain consensus hash for the proposed block would have to be identical between the two proposed blocks signed by the same miner, yeah? EDIT: but this does make me wonder...in the case of a bitcoin fork...how would signers handle this? i.e. what if they signed a block built on a bitcoin fork. The subsequent stacks block...what should it look like and how should the signers handle it? I am sure we have some handling in place, but I don't think I have ever actually thought it through and am wondering how/if these changes would affect it.

obycode commented 2 months ago

Couldn't round 2 just be the miner mining the next block? It's already implicitly selecting the canonical block by mining the next block and proposing it, isn't it?

jcnelson commented 2 months ago

Yes, that's correct -- round 2 is a logically distinct round in the protocol, but in practice it can be (and is) piggybacked onto the next block's round 1 by way of building a block that acknowledges it as its parent.

Part of the point I'm trying to make is that we currently do not process (logical) round 2 correctly. Signers do not treat a miner's proposal for block N+1 as a "commit block N" message; instead, they eagerly and unconditionally commit blocks for which they observe a threshold of signatures and for which they have not yet witnessed a conflicting block N within a locally-determined timeout. Logically speaking, we need to make it so signers wait to accept block N until after they see a valid proposal for block N+1, since the proposal provides a signer-verifiable proof that the miner has acknowledged at least 70% of the signing power (i.e. a miner-signed header with a parent_block_id derived from a valid signature set). In our implementation, this would be achieved by a signer fork-choice rule -- even if signers' nodes eagerly and unconditionally process multiple blocks at height N regardless of whether or not they conflict, the signer would not treat a block at height N as part of any fork until it witnesses a valid proposal for block N+1.
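
The fork-choice rule described here can be sketched as follows. The `ForkChoice` class is hypothetical; it assumes the caller has already validated each proposal and only feeds in blocks that reached the signature threshold.

```python
class ForkChoice:
    """Hypothetical signer fork choice: block N counts only once N+1 is proposed."""

    def __init__(self):
        self.height_of = {}  # block id -> height, for blocks at signature threshold
        self.confirmed = {}  # height -> block id confirmed by a child proposal

    def observe_signed_block(self, block_id, height):
        # Reaching the signature threshold alone does not confirm a block.
        self.height_of[block_id] = height

    def observe_proposal(self, parent_id):
        """Treat a valid proposal for N+1 as the 'commit block N' message."""
        height = self.height_of.get(parent_id)
        if height is None:
            return False  # parent never reached the signature threshold
        current = self.confirmed.setdefault(height, parent_id)
        return current == parent_id  # False if a sibling was confirmed first

    def tip(self):
        return self.confirmed[max(self.confirmed)] if self.confirmed else None


fc = ForkChoice()
fc.observe_signed_block("N", 10)
fc.observe_signed_block("N-prime", 10)
tip_before = fc.tip()                   # nothing is confirmed yet
ok = fc.observe_proposal("N-prime")     # proposal for height 11 confirms N-prime
conflict = fc.observe_proposal("N")     # a later proposal atop N is refused
```

The node may still store and process both siblings; only the signer's notion of the chain tip waits for the child proposal.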

jcnelson commented 2 months ago

Also, per a separate conversation with @jferrant, it's worth mentioning that the "at most one block at height N" rule only applies within a single Bitcoin fork. This is what SIP-021 calls for.

While one day it could be possible to have at-most-one block semantics globally, that would require dealing with the case where a tenure-change happens to land in a Bitcoin block that gets orphaned (which SIP-021 does not require us to do).

obycode commented 2 months ago

There is a case that we recently fixed in which the behavior would need to be changed:

Miner proposes block N
Signers sign block N, reaching the acceptance threshold
A communication problem causes miner to time out waiting for signatures
Miner proposes block N'

The current solution to this problem was that the signers can broadcast block N, as soon as they see that it has reached the acceptance threshold. The signers then reject the proposed N'. The miner eventually receives the signed block N via the network and then proposes block N+1.

With this proposal to solve this new problem, the signers would no longer broadcast block N but would instead accept block N'.

If this situation happens at the tenure boundary, then the next miner would have the option to build from N or N'.

jcnelson commented 2 months ago

With this proposal to solve this new problem, the signers would no longer broadcast block N but would instead accept block N'.

Signers should continue to store and broadcast both N and N' if they have reached the signature threshold. However, signers do not believe that either N or N' is the chain tip until they see a valid proposal for N+1. That is, N and N' are "unconfirmed blocks." The proposal for N+1 confirms either N or N', and the other blocks that were not confirmed at height N will be treated as unconfirmed forever. If a miner submits two conflicting proposals for N+1 -- one that confirms N and one that confirms N', then signers that observe both proposals declare that the miner is malicious and refuse to sign any more blocks from it.

For example, here is a valid chain history under these rules. B[i] is a block, and i is the order in which it was produced.

N    B[0]
      |
      |-----.
      V     V
N+1   B[1]  B[2]
      |
      |
      V
N+2   B[3]
      |
      |-----.-----.-----.
      V     V     V     V
N+3   B[4]  B[5]  B[6]  B[7]
                  |
                  |
                  V
                 B[8]

The canonical chain is B[8] - B[6] - B[3] - B[1] - B[0]. The miner and signers are allowed to create sibling blocks, but once a sibling at height N is confirmed by a valid proposal, then no other blocks at that height can be built upon.

By contrast, here is an invalid history:

N     B[0]
       |
       |-----.
       V     V
N+1   B[1]  B[2]
       |     |
       |     X (invalid -- B[1] is confirmed)
       V     |
N+2   B[3]  B[5] (never processed)
       |
       |
       V
N+3   B[4]

Once the miner submits the proposal for B[5], the signers not only reject it, but also refuse to sign anything else the miner submits.

As before, the miner can produce as many blocks at height N as it needs to in order to build a block that has 70% signing power. But once the miner moves on, they cannot go back.
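
The two histories above can be checked mechanically. A minimal sketch, assuming a single Bitcoin fork and a hypothetical encoding of each history as (block, parent) pairs in production order:

```python
def first_violation(history):
    """
    history: list of (block, parent) pairs in production order; parent is None
    for the starting block. Returns the first block that builds on a sibling
    of an already-confirmed block, or None if the history is valid.
    """
    height = {}
    confirmed = {}  # height -> block confirmed by a child at height + 1
    for block, parent in history:
        if parent is None:
            height[block] = 0
            continue
        parent_height = height[parent]
        # The first child at a height confirms its parent; later children
        # must name the same parent.
        if confirmed.setdefault(parent_height, parent) != parent:
            return block
        height[block] = parent_height + 1
    return None


valid = [("B0", None), ("B1", "B0"), ("B2", "B0"), ("B3", "B1"),
         ("B4", "B3"), ("B5", "B3"), ("B6", "B3"), ("B7", "B3"), ("B8", "B6")]
invalid = [("B0", None), ("B1", "B0"), ("B2", "B0"), ("B3", "B1"),
           ("B4", "B3"), ("B5", "B2")]

print(first_violation(valid), first_violation(invalid))
```

Siblings such as B[4] through B[7] are allowed because they share the confirmed parent B[3]; B[5] in the second history is flagged because B[3] already confirmed B[1] at that height.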

We can get to a place where we get at most one block produced at height N, but for now, it would suffice that we have at most one block accepted at height N. Most of the time, there won't be siblings.

jcnelson commented 2 months ago

Ah of course. So additional logic would be required on the node's side to ensure a miner was not punished for simply responding to a bitcoin fork. i.e. the burnchain consensus hash for the proposed block would have to be identical between the two proposed blocks signed by the same miner, yeah? EDIT: but this does make me wonder...in the case of a bitcoin fork...how would signers handle this? i.e. what if they signed a block built on a bitcoin fork. The subsequent stacks block...what should it look like and how should the signers handle it? I am sure we have some handling in place, but I don't think I have ever actually thought it through and am wondering how/if these changes would affect it.

I don't think the node needs to be involved in punishment at all, unless we intend to slash their coinbase (I don't think this is necessary for Nakamoto; forcing the miner to rotate their Bitcoin keys is usually harsher). I think this is a decision that each signer makes locally based on whether or not they observed the miner equivocate. In my diagram above, the signers who see the proposal for B[5] after B[3] has been accepted would decide to punish the miner by refusing to sign any more blocks from it.

The node already tracks each Stacks fork atop each Bitcoin fork, so the signer can detect miner equivocation for block N+1 simply by asking the node for the list of processed block headers at height N+1. If they all have the same parent, then there's no equivocation. Otherwise, there is equivocation, and block N+1 should be rejected and the miner punished.
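
A sketch of that check, assuming the node could return the processed headers at a given height as a list of dictionaries. The endpoint does not exist yet, so the input shape is hypothetical; grouping by consensus hash is an added assumption so that blocks from different tenures or Bitcoin forks are never compared with each other.

```python
from collections import defaultdict


def equivocating_tenures(headers):
    """
    headers: processed block headers at one height, each a dict with
    'consensus_hash' and 'parent_block_id' keys (hypothetical response shape).
    Returns the consensus hashes of tenures whose blocks at this height
    do not all share the same parent.
    """
    parents_by_tenure = defaultdict(set)
    for header in headers:
        parents_by_tenure[header["consensus_hash"]].add(header["parent_block_id"])
    return {tenure for tenure, parents in parents_by_tenure.items()
            if len(parents) > 1}


headers = [
    {"consensus_hash": "tenure-1", "parent_block_id": "B1"},
    {"consensus_hash": "tenure-1", "parent_block_id": "B2"},  # conflicting parent
    {"consensus_hash": "tenure-2", "parent_block_id": "C1"},
]
flagged = equivocating_tenures(headers)
```

Siblings that share a parent are not flagged, matching the rule that a miner may retry a height as often as it needs to.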

I think there needs to be an API endpoint for the above in the Stacks node, but I think that most of the work to make this all happen is changing the signer behavior.

kantai commented 2 months ago

I think that there are two separate issues here:

(1) The tolerances and timings for rejecting submissions should be updated to minimize the likelihood of this occurring -- this is ultimately a scenario that the network should try to avoid.
(2) As Jude discusses here, the signer logic for how it treats locally signed blocks needs to be updated with a "tentatively accepted" state and a "rejected" state (possibly a "tentatively rejected" state as well, but I think that's probably unnecessary).

(1) is theoretically easier to solve, but solving it doesn't mean (2) doesn't need to be solved.

Anyways, I think the strategy for (2) could be somewhat straightforward.

Basically, the signer db tracks proposals in one of four states:

Proposed -- the proposal was received from the miner, has passed the initial set of checks and is waiting for a response from the stacks-node proposal evaluation endpoint. I think this is basically unchanged from the current implementation.

Rejected -- if the stacks-node has locally rejected the proposal, or (set-size) - (threshold) + 1 signers have rejected the proposal.

Tentatively accepted -- the stacks-node has locally accepted the proposal, and broadcasted a signature

Globally Accepted -- (threshold) signers have accepted the proposal.

Tentatively accepted transitions to Globally Accepted or Rejected if and only if the signer receives enough proposal responses from other signers to perform the transition (I think this is slightly different than Jude's proposal above, which transitions to rejected when the tenure changes: I'll discuss why in a moment).
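
A sketch of these four states and their transitions. The names are hypothetical, and `accepts` / `rejects` are assumed to be signing weight in the same units as `threshold`. The rejection bound is (set-size) - (threshold) + 1 because once that much weight has rejected, the remaining weight is at most threshold - 1 and can no longer reach acceptance.

```python
from enum import Enum, auto


class ProposalState(Enum):
    PROPOSED = auto()
    TENTATIVELY_ACCEPTED = auto()
    GLOBALLY_ACCEPTED = auto()
    REJECTED = auto()


def next_state(state, local_verdict, accepts, rejects, set_size, threshold):
    """
    local_verdict: None while the node is still validating, else True/False.
    accepts / rejects: signing weight observed so far from the signer set.
    """
    if accepts >= threshold:
        return ProposalState.GLOBALLY_ACCEPTED
    if rejects >= set_size - threshold + 1 or local_verdict is False:
        return ProposalState.REJECTED
    if local_verdict is True:
        return ProposalState.TENTATIVELY_ACCEPTED
    return state  # still waiting on the node and on other signers
```

With a set size of 8 and a threshold of 6, a locally accepted proposal with 5 accepts and 0 rejects stays tentatively accepted; a third unit of rejecting weight moves it to rejected.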

These states are really important to the signer when it is evaluating subsequent proposals. I think the rules should be something like:

If a block proposal is in the same tenure as a prior proposal, its height must be greater than the highest tentatively accepted block known to the signer. Until the tentatively accepted block is rejected by the signer set, the signer will not accept a sibling in the same tenure.
If a block proposal is in a new tenure, its height must be greater than the highest globally accepted block.

Otherwise, I don't think the signer needs more complex logic. This guarantees the signer set never approves a sibling in the same tenure: a sibling would only ever be approved once a prior proposal is actually rejected (and an honest signer only ever responds ACCEPT or REJECT once for a proposal). It does mean that a given tenure could "stall" if there's not agreement in the signer set, but I think this is what should happen anyways. Siblings could occur across tenures, but that was already the case.

jcnelson commented 2 months ago

On the miner / node side of things, the following would need to change:

The miner never times out its block proposal. It waits until one of the following happens:
- A tenure change happens, in which case the miner stops
- The miner sees enough signers reject the block, in which case the miner tries mining a new block at the same height
- The miner sees enough signers accept the block (either through gathering signatures or through the node processing the block), in which case the miner tries to mine a new block at the next height
The /v3/block_proposal endpoint will verify that the proposed block's tenure is canonical on the Bitcoin chain. If not, then the block is invalid.
The /v3/block_proposal endpoint will verify that a proposed block builds atop the highest block within its tenure. It will not verify that the tenure itself is canonical
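
The miner-side waiting rule can be sketched as an event loop with no timeout branch. The event names and return values are hypothetical; the real miner would receive these signals from StackerDB and its own node.

```python
def wait_for_decision(events, accept_threshold, reject_threshold):
    """
    events: iterable of (kind, weight) tuples, where kind is one of
    "accept", "reject", "block_processed", or "tenure_change".
    Returns the miner's next action; there is deliberately no timeout.
    """
    accept = reject = 0
    for kind, weight in events:
        if kind == "tenure_change":
            return "stop"
        if kind == "block_processed":
            # A signer already broadcast the block; no need to rebroadcast.
            return "mine_next_height"
        if kind == "accept":
            accept += weight
        elif kind == "reject":
            reject += weight
        if accept >= accept_threshold:
            return "broadcast_and_mine_next_height"
        if reject >= reject_threshold:
            return "mine_same_height"
    return "keep_waiting"
```

For the scenario at the top of this issue (thresholds of 6 and 3 out of 8), the events `("accept", 3), ("accept", 2), ("reject", 3)` yield `"mine_same_height"`.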

jferrant commented 2 months ago

Tentatively accepted transitions to Globally Accepted or Rejected if and only if the signer receives enough proposal responses from other signers to perform the transition (I think this is slightly different than Jude's proposal above, which transitions to rejected when the tenure changes: I'll discuss why in a moment).

Just to confirm, is it also possible for Rejected to transition to Globally Accepted? I assume a stacks node could have an outdated view and reject a block whereas all other signers approve it, yeah? If this is the case, I would introduce a TentativelyRejected state to distinguish between the threshold signature rejection and the node marking it invalid.

kantai commented 2 months ago

Tentatively accepted transitions to Globally Accepted or Rejected if and only if the signer receives enough proposal responses from other signers to perform the transition (I think this is slightly different than Jude's proposal above, which transitions to rejected when the tenure changes: I'll discuss why in a moment).

Just to confirm, is it also possible for Rejected to transition to Globally Accepted? I assume a stacks node could have an outdated view and reject a block whereas all other signers approve it, yeah? If this is the case, I would introduce a TentativelyRejected state to distinguish between the threshold signature rejection and the node marking it invalid.

This might help with debuggability and it's probably safer to do this to future-proof the signer's logic, but I don't think it's strictly necessary. Because the checks that the signer is performing are all based on block height (and it performs "greater than" checks), the signer will just move on if the rest of the signer set ends up accepting the proposal.

jferrant commented 2 months ago

This might help with debuggability and it's probably safer to do this to future-proof the signer's logic, but I don't think it's strictly necessary. Because the checks that the signer is performing are all based on block height (and it performs "greater than" checks), the signer will just move on if the rest of the signer set ends up accepting the proposal.

Ah this is true...see I was thinking that once a block proposal is marked as GloballyRejected it should NEVER transition to GloballyAccepted as this would indicate some sort of bug or some malicious behaviour as signers should never respond with different answers to a repeat block (however a LocallyRejected block could very much transition to a GloballyAccepted block). However, this could cause a stall so perhaps better to just allow this to potentially happen?

jferrant commented 2 months ago

TLDR for @saralab: Signers must continue to process block proposals and submit their acceptance and rejection signatures accordingly. However, the signer must be updated to recognize the difference between its local view and the global view of the network. It may only mark a block definitively accepted or rejected when it observes that a global decision has been made, specifically that the threshold number of rejections or signatures has been reached. To prevent forks within a tenure, the signer set will never approve a sibling block within the same tenure by ensuring the block proposal builds atop the highest accepted block: a sibling would only ever be approved once a prior proposal is actually rejected. It does mean that a given tenure could "stall" if there is no agreement in the signer set and a miner’s tenure effectively ends as it can never propose a valid block. However, at the tenure boundary, the signer can utilize the last globally accepted block of the parent tenure to determine whether the proposed block is valid, preventing the stall from propagating into the next tenure. Therefore, siblings could occur across tenures, but this is expected and acceptable behaviour.

jferrant commented 2 months ago

On the signer side of things (Stolen from @kantai Primarily :P )

The signer would add the following block states to SignerDB:

Proposed – the proposal was received from the miner, has passed the initial set of checks and is waiting for a response from the stacks-node proposal evaluation endpoint.
LocallyAccepted – the stacks-node has locally accepted the proposal, and broadcasted a signature, but does not yet have a (threshold) number of signatures confirming the block
GloballyAccepted – (threshold) signers have accepted the proposal.
LocallyRejected – if the stacks-node has locally rejected the proposal/signer has failed initial set of checks
GloballyRejected – (set-size) - (threshold) + 1 signers have rejected the proposal.

LocallyAccepted and LocallyRejected can both transition to GloballyAccepted or GloballyRejected if and only if the signer receives enough proposal responses from other signers to perform the transition. Once a block is marked as GloballyAccepted or GloballyRejected, no further transitions may occur.
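
The transition rule for the five states listed above can be written as a small table. This is a sketch, not the SignerDB schema; in particular, allowing Proposed to jump straight to a global state (when other signers reach a threshold before local validation finishes) is an assumption.

```python
# Allowed transitions between the five states; global states are terminal.
ALLOWED = {
    "Proposed": {"LocallyAccepted", "LocallyRejected",
                 "GloballyAccepted", "GloballyRejected"},  # last two: assumed
    "LocallyAccepted": {"GloballyAccepted", "GloballyRejected"},
    "LocallyRejected": {"GloballyAccepted", "GloballyRejected"},
    "GloballyAccepted": set(),
    "GloballyRejected": set(),
}


def transition(current, new):
    """Return the new state, or raise ValueError for a forbidden transition."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current} -> {new}")
    return new
```

Here `transition("LocallyRejected", "GloballyAccepted")` succeeds, which covers a signer whose node had an outdated view, while any transition out of `GloballyRejected` raises.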

Prior to querying the block validation endpoint a signer will evaluate a block with the following rules:

If a block proposal is in the same tenure as a prior proposal, its height must be greater than the highest *Accepted block known to the signer in that tenure.
If a block proposal is in a new tenure, its height must be greater than the highest GloballyAccepted block in its parent's tenure.
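
A sketch of this pre-validation check, assuming the signer's records are available as (tenure, height, state) rows. The row shape and function names are hypothetical. When no qualifying block exists, the check passes vacuously.

```python
ACCEPTED = {"LocallyAccepted", "GloballyAccepted"}


def highest(rows, tenure, states):
    """Highest recorded height in `tenure` among `states`, or None."""
    heights = [h for t, h, s in rows if t == tenure and s in states]
    return max(heights, default=None)


def passes_height_check(rows, tenure, parent_tenure, height):
    """rows: (tenure, height, state) tuples already known to the signer."""
    seen_tenure = any(t == tenure for t, _, _ in rows)
    if seen_tenure:
        # Same tenure as a prior proposal: compare against any *Accepted block.
        floor = highest(rows, tenure, ACCEPTED)
    else:
        # New tenure: only GloballyAccepted blocks of the parent tenure count.
        floor = highest(rows, parent_tenure, {"GloballyAccepted"})
    return floor is None or height > floor


# The scenario from the top of this issue, with hypothetical tenure labels.
rows = [("miner2-tenure", 155, "GloballyAccepted"),
        ("miner2-tenure", 156, "LocallyAccepted")]
sibling_ok = passes_height_check(rows, "miner2-tenure", "prior-tenure", 156)
new_tenure_ok = passes_height_check(rows, "miner0-tenure", "miner2-tenure", 156)
```

Under these rules signers 1 and 2 would no longer block Miner 0's block #156: their locally accepted #156 from the previous tenure is ignored, while a sibling at 156 inside Miner 2's own tenure is still refused.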

NOTE: This change relies on the block proposal endpoint changes @jcnelson is handling.

kantai commented 2 months ago

My only comment on the above is that this:

If a block proposal is in the same tenure as a prior proposal, its height must be greater than the highest *Accepted block known to the signer. If a block proposal is in a new tenure, its height must be greater than the highest GloballyAccepted block.

Should be:

If a block proposal is in the same tenure as a prior proposal, its height must be greater than the highest Accepted block known to the signer in that tenure.
If a block proposal is in a new tenure, its height must be greater than the highest GloballyAccepted block in its parent tenure.
