stacks-network / stacks-core

The Stacks blockchain implementation
https://docs.stacks.co
GNU General Public License v3.0
3k stars 659 forks source link

[TESTNET 2.1 BUG] burn_block_height stops progressing when syncing from genesis #3454

Closed deantchi closed 1 year ago

deantchi commented 1 year ago

the separate 2.1 testnet env we're running at Hiro was recently upgraded to: stacks-node: next:138f0c9 stacks-node-api: 7.0.0-stacks-2.1.1

this upgrade required a resync from genesis and on 2 separate attempts, both burn_block_height stopped at 2408100 all stateful data were purged both times.

this was found in the logs


{"msg":"Anchor block b64543fef649f76695538d5dd23c189572cc45bcd7dfe1d756c84db460b5ee69 for reward cycle np is affirmed by the network (1), but must be downloaded","level":"INFO","ts":"2022-12-22T17:05:01.662156253+00:00","thread":"chains-coordinator-0.0.0.0:20443","line":1158,"file":"src/chainstate/coordinator/mod.rs"}
{"msg":"Burnchain block processing stops due to missing affirmed anchor block b64543fef649f76695538d5dd23c189572cc45bcd7dfe1d756c84db460b5ee69","level":"INFO","ts":"2022-12-22T17:05:01.662199604+00:00","thread":"chains-coordinator-0.0.0.0:20443","line":1407,"file":"src/chainstate/coordinator/mod.rs"}
{"msg":"Anchor block selected for cycle 2: b64543fef649f76695538d5dd23c189572cc45bcd7dfe1d756c84db460b5ee69","level":"INFO","ts":"2022-12-22T17:05:01.665322663+00:00","thread":"chains-coordinator-0.0.0.0:20443","line":592,"file":"src/chainstate/coordinator/mod.rs"}
{"msg":"Anchor block b64543fef649f76695538d5dd23c189572cc45bcd7dfe1d756c84db460b5ee69 for reward cycle np is affirmed by the network (1), but must be downloaded","level":"INFO","ts":"2022-12-22T17:05:01.665435943+00:00","thread":"chains-coordinator-0.0.0.0:20443","line":1158,"file":"src/chainstate/coordinator/mod.rs"}```
jcnelson commented 1 year ago

Can you add require_affirmed_anchor_blocks = false to the [node] subsection of the config, and try again?

EDIT: This defaults to true for followers, and false for miners.

deantchi commented 1 year ago

thanks Jude. I will do that, but I want to make sure that flag isn't intended to just workaround an underlining problem.

jcnelson commented 1 year ago

In that case, can you try a node built from the latest in #3446 as well?

deantchi commented 1 year ago

no dice. burn_block_height stopped at the exact same spot. Will try the PR as suggested

jcnelson commented 1 year ago

Can you tell me more about this?

the separate 2.1 testnet env we're running at Hiro was recently upgraded to:

I'm going to try and reproduce it in the mean time. What's your testnet start block and 2.1 transition height?

deantchi commented 1 year ago

this is the separate testnet env myself and @wileyj spun up about 2 months ago for 2.1 testing, while it has been running with no noticeable problems since. It was requested to upgrade our side of the api and follower to the version listed at the top of this ticket.

the values in question are just what's in the next branch, https://github.com/stacks-network/stacks-blockchain/blob/next/src/core/mod.rs#L111-L118

deantchi commented 1 year ago

running the latest feat-check-pox-2-params stops the burn_block_height at 2408101. this behavior was repeatable.


{"msg":"Anchor block b64543fef649f76695538d5dd23c189572cc45bcd7dfe1d756c84db460b5ee69 (txid 75a50b6443199b7dcd10d955fae7648346cbd72d2a5445180e3cbdd693ab7825) for reward cycle 1 is affirmed by the network (np), but must be downloaded","level":"INFO","ts":"2022-12-22T19:09:58.605393945+00:00","thread":"chains-coordinator-0.0.0.0:20443","line":1667,"file":"src/chainstate/coordinator/mod.rs"}
{"msg":"Burnchain block processing will continue in spite of missing affirmed anchor block b64543fef649f76695538d5dd23c189572cc45bcd7dfe1d756c84db460b5ee69","level":"INFO","ts":"2022-12-22T19:09:58.605439325+00:00","thread":"chains-coordinator-0.0.0.0:20443","line":1815,"file":"src/chainstate/coordinator/mod.rs"}
{"msg":"Begin reward-cycle sortition with absent anchor block=Some((b64543fef649f76695538d5dd23c189572cc45bcd7dfe1d756c84db460b5ee69, 75a50b6443199b7dcd10d955fae7648346cbd72d2a5445180e3cbdd693ab7825))","level":"INFO","ts":"2022-12-22T19:09:58.606705145+00:00","thread":"chains-coordinator-0.0.0.0:20443","line":3447,"file":"src/chainstate/burn/db/sortdb.rs"}
{"msg":"Handle HTTPRequest","level":"INFO","ts":"2022-12-22T19:09:58.608070535+00:00","thread":"p2p-(0.0.0.0:20444,0.0.0.0:20443)","line":1637,"file":"src/net/http.rs","query":"","path":"/v2/info","peer_addr":"127.0.0.1:41224","verb":"GET"}
{"msg":"Begin reward-cycle sortition with absent anchor block=Some((b64543fef649f76695538d5dd23c189572cc45bcd7dfe1d756c84db460b5ee69, 75a50b6443199b7dcd10d955fae7648346cbd72d2a5445180e3cbdd693ab7825))","level":"INFO","ts":"2022-12-22T19:09:58.609311925+00:00","thread":"chains-coordinator-0.0.0.0:20443","line":3447,"file":"src/chainstate/burn/db/sortdb.rs"}
{"msg":"Begin reward-cycle sortition with absent anchor block=Some((b64543fef649f76695538d5dd23c189572cc45bcd7dfe1d756c84db460b5ee69, 75a50b6443199b7dcd10d955fae7648346cbd72d2a5445180e3cbdd693ab7825))","level":"INFO","ts":"2022-12-22T19:09:58.609379615+00:00","thread":"chains-coordinator-0.0.0.0:20443","line":3447,"file":"src/chainstate/burn/db/sortdb.rs"}
{"msg":"ACCEPTED(2408101) leader block commit 0a7659a84bf00f1a4360d7ff9e1a6aa17fb0d58b5197bf8798077cf9be872910 at 2408101,73","level":"INFO","ts":"2022-12-22T19:09:58.615477084+00:00","thread":"chains-coordinator-0.0.0.0:20443","line":4607,"file":"src/chainstate/burn/db/sortdb.rs","apparent_sender":"mtEoVpBV5H8bbmNDEPwaoJHXnF1MxbkkQf"}
{"msg":"Processing newly received Stacks blocks: 3, microblocks: 0, confirmed microblocks: 0","level":"INFO","ts":"2022-12-22T19:09:58.647222202+00:00","thread":"relayer-http://0.0.0.0:20443","line":1429,"file":"src/net/relay.rs"}```
jcnelson commented 1 year ago

@deantchi Can you post your config file? Still trying to reproduce it.

deantchi commented 1 year ago

[node]
    rpc_bind = "0.0.0.0:20443"
    p2p_bind = "0.0.0.0:20444"
    bootstrap_node = "027e929237cc4c12980b109bd3180ca591f3b72896b6e3716f5d1dce0a5bdd2dc4@next.stacks.org:20444,020ea9dc83d07924e3ffd8097f2c24196f0b08aca4daf371d760b686f6f2709903@2-1-seed.testnet.dev.hiro.so:20443,03dbbee812d0ef161d55acdec9f023751bb3224e66e548af0b7184b6ac0ce68b7e@2-1-seed.testnet.stg.hiro.so:20443,02ff5e37c488b517a61976b2392b6f5e93f303c4c5393a73e83eba03b3e4d12d00@2-1-seed.testnet.prd.hiro.so:20443"
    local_peer_seed = "43a0a2e4ab70b3178cb6b7a46cb243d5d2262f230dbfed21323dc2f7edf0feef"
    prometheus_bind = "0.0.0.0:9153"
    working_dir = "/hirosystems/data"
    require_affirmed_anchor_blocks = false

    [burnchain]
    chain = "bitcoin"
    mode = "xenon"
    peer_host = "testnet-bitcoin.testnet-bitcoin.svc.cluster.local"
    username = "hirosystems"
    password = "hirosystems"
    rpc_port = 18332
    peer_port = 18333
    pox_2_activation = 2406590
    magic_bytes = "21"

    [[burnchain.epochs]]
    epoch_name = "1.0"
    start_height = 0

    [[burnchain.epochs]]
    epoch_name = "2.0"
    start_height = 2000000

    [[burnchain.epochs]]
    epoch_name = "2.05"
    start_height = 2104380

    [[burnchain.epochs]]
    epoch_name = "2.1"
    start_height = 2406000

    [[events_observer]]
    endpoint = "testnet-2-1-api.testnet-api.svc.cluster.local:3700"
    retry_count = 255
    events_keys = ["*"]

    [[ustx_balance]]
    address = "ST2QKZ4FKHAH1NQKYKYAYZPY440FEPK7GZ1R5HBP2"
    amount = 10000000000000000

    [[ustx_balance]]
    address = "ST319CF5WV77KYR1H3GT0GZ7B8Q4AQPY42ETP1VPF"
    amount = 10000000000000000

    [[ustx_balance]]
    address = "ST221Z6TDTC5E0BYR2V624Q2ST6R0Q71T78WTAX6H"
    amount = 10000000000000000

    [[ustx_balance]]
    address = "ST2TFVBMRPS5SSNP98DQKQ5JNB2B6NZM91C4K3P7B"
    amount = 10000000000000000

    [fee_estimation]
    fee_estimator = "fuzzed_weighted_median_fee_rate"
wileyj commented 1 year ago

I'm running a test as well without require_affirmed_anchor_blocks = false to see if i can reproduce it in a similar (docker) environment

wileyj commented 1 year ago

first test reproduces what @deantchi noticed, and this is logged (no debug for this test, that's coming):

INFO [1671814492.009486] [src/chainstate/coordinator/mod.rs:988] [chains-coordinator-0.0.0.0:20443] Sortition anchor block affirmation map `na` and/or Stacks affirmation map `` is no longer compatible with heaviest affirmation map n in reward cycles 1-2

re-running with debug enabled, followed by require_affirmed_anchor_blocks = false with debug

jcnelson commented 1 year ago

I think there's a misconfiguration here. I can't seem to find the consensus hash of the first block that next.stacks.org knows about.

$ curl -s http://next.stacks.org:20443/v2/headers/2100 | jq -r  '.[].parent_block_id' | tail -n 1
42a31d7ae0b446460d8d2e4493a9fae226b758000ec0589bd3e8f04463d93ee1
$ curl http://next.stacks.org:20443/v2/headers/2100?tip=42a31d7ae0b446460d8d2e4493a9fae226b758000ec0589bd3e8f04463d93ee1 | jq -r  '.[].parent_block_id' | tail -n 1
42fb19d381c8ca04d3f3be20196b3dc2dc92385f01df719f2104f73260d15caf
$ curl -s http://next.stacks.org:20443/v2/headers/2100?tip=42fb19d381c8ca04d3f3be20196b3dc2dc92385f01df719f2104f73260d15caf | jq -r '.[-1]'
{
  "consensus_hash": "2f811896bb24e3c1ada96947a00806ba4e0b158f",
  "header": "02000000000000000000000000000000018fea918ceefc499b1b990e4f8df01238332abcaabf4a2af465a98479d65acf806c6c74d82f6e829ef32b8db662be8e922e97d860e4ab2f5c5c63fdfccc904936b10df77a0b5d04dd2bc909eb086c4f060000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000008a5671544a3082eac12c2e64cc54af5246c0f2bd0b656adda882d09b56ff1946641d9edf3d7a99de77da6dbd403125bfb9279463b0d8f88dbe4210143270ffcdb8795302ffddacbf5160f57cd1f6472ffd36cf9d",
  "parent_block_id": "55c9861be5cff984a20ce6d99d4aa65941412889bdc665094136429b84f8c2ee"
}
$ # this is the header of the first-mined Stacks block that next.stacks.org knows about. let's get its sortition
$ sqlite3 /tmp/stacks-testnet/xenon/burnchain/sortition/marf.sqlite 'select * from snapshots where consensus_hash = "2f811896bb24e3c1ada96947a00806ba4e0b158f"' | wc -l
0
$
jcnelson commented 1 year ago

Ah, I see -- my pox_2_activation is different. Will try again.

jcnelson commented 1 year ago

Just for my own edification, so I can know what to expect -- what's the output of this command?

$ sqlite3 /path/to/your/xenon/burnchain/sortition/marf.sqlite 'select block_height from snapshots where consensus_hash = "2f811896bb24e3c1ada96947a00806ba4e0b158f"'
jcnelson commented 1 year ago

Something is still not right. Your node -- next.stacks.org -- is reporting completely empty block inventories to my node.

jcnelson commented 1 year ago

Got the height -- it's 2407636.

jcnelson commented 1 year ago

Okay, this is bizarre. The node next.stacks.org seems to have gone for multiple reward cycles prior to mining its first block. Moreover, its consensus hash history diverges from my bootstrapping node at 2406035 (35 blocks after bootup!).

wileyj commented 1 year ago

sqlite3 /path/to/your/xenon/burnchain/sortition/marf.sqlite 'select block_height from snapshots where consensus_hash = "2f811896bb24e3c1ada96947a00806ba4e0b158f"'

confirmed:

$ sqlite3 /stacks-blockchain/data/xenon/burnchain/sortition/marf.sqlite 'select block_height from snapshots where consensus_hash = "2f811896bb24e3c1ada96947a00806ba4e0b158f"'
2407636
jcnelson commented 1 year ago

Found the root cause -- the node on next.stacks.org doesn't have the same consensus rules as next any longer. A node built off of feat/check-pox-2-params today is unable to process the first block it receives. I'll try and see if I can find the root cause, but because this network is just a pre-release 2.1 network, it might make more sense to just nuke it and start a new one. We really need to focus our efforts on making sure feat/check-pox-2-params can boot off of the mainline testnet, as well as mainnet.

jcnelson commented 1 year ago

Looking deeper at the delta between feat/check-pox-2-params and the code on next.stacks.org, it looks irreconcilable. feat/check-pox-2-params has a different pox-2 contract, so it won't process the epoch 2.1 transition the same way.

wileyj commented 1 year ago

testnet has been reset, we've mined a block

jcnelson commented 1 year ago

Closing due to the new testnet bootup being the fix for the sync problem.