pokt-network / pocket-core

Official implementation of the Pocket Network Protocol
http://www.pokt.network
MIT License

Can't Sync from 0 on latest release [MAINNET] #1564

Closed LordKurono closed 1 year ago

LordKurono commented 1 year ago

Describe the bug

Starting a new pocket node on version RC-0.10.0 using this command:

pocket start --seeds="7c0d7ec36db6594c1ffaa99724e1f8300bbd52d0@seed1.mainnet.pokt.network:26662,cdcf936d70726dd724e0e6a8353d8e5ba5abdd20@seed2.mainnet.pokt.network:26663,74b4322a91c4a7f3e774648d0730c1e610494691@seed3.mainnet.pokt.network:26662,b3235089ff302c9615ba661e13e601d9d6265b15@seed4.mainnet.pokt.network:26663" --mainnet

crashes after processing the first block:

I[2023-07-04|12:14:16.187] Received                                     module=blockchain src="Peer{MConn{50.171.22.146:26656} 5595f94dc0f9e205544abaf03b6488dfab0d8f79 out}" height=19
I[2023-07-04|12:14:16.187] Received                                     module=blockchain src="Peer{MConn{50.171.22.146:26656} 5595f94dc0f9e205544abaf03b6488dfab0d8f79 out}" height=38
I[2023-07-04|12:14:16.198] Committed state                              module=state height=1 txs=0 appHash=44AE48F34E908F8E387DA34DEFBD099FDC4B956BA8F0CDF28FF719E4DB35481B
I[2023-07-04|12:14:16.198] Indexed block                                module=state height=1
I[2023-07-04|12:14:16.203] Indexed block                                module=txindex height=1
I[2023-07-04|12:14:16.205] makeNextRequests will make following requests module=blockchain number=1 heights=[65]
I[2023-07-04|12:14:16.205] assigned request to peer                     module=blockchain peer=b82236fda3ceaffc52c82328ac80c6f13601f444 height=65
panic: failed to process committed block (2:0379FFD028E6A0D08B2C082B85B3E98C504B1EBDBC7DDAA431391486F3D77CB7): wrong Block.Header.AppHash.  Expected 44AE48F34E908F8E387DA34DEFBD099FDC4B956BA8F0CDF28FF719E4DB35481B, got 6C81DAD888A261A1CD41C1AA7A639F22D0A2A31668E002910403FB74CD573CBB

goroutine 69 [running]:
github.com/tendermint/tendermint/blockchain/v1.(*BlockchainReactor).processBlock(0x1400189a000)
        /Users/kurono/go/pkg/mod/github.com/pokt-network/tendermint@v0.32.11-0.20230405220629-96c095f0058d/blockchain/v1/reactor.go:476 +0x4ac
github.com/tendermint/tendermint/blockchain/v1.(*BlockchainReactor).processBlocksRoutine(0x1400189a000, 0x14001c82060)
        /Users/kurono/go/pkg/mod/github.com/pokt-network/tendermint@v0.32.11-0.20230405220629-96c095f0058d/blockchain/v1/reactor.go:335 +0x16c
created by github.com/tendermint/tendermint/blockchain/v1.(*BlockchainReactor).poolRoutine
        /Users/kurono/go/pkg/mod/github.com/pokt-network/tendermint@v0.32.11-0.20230405220629-96c095f0058d/blockchain/v1/reactor.go:374 +0xdc

Using version RC-0.9.2 yields the expected behavior: the node syncs.

Expected behavior

Node syncs from block 1 to the top of the chain.
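One detail in the log above is worth noting: the "Expected" hash in the panic is the same value the node logged as appHash when it committed height 1. The sketch below is not part of the original report. It parses the two relevant lines (copied verbatim from the log) and checks that observation. The regular expressions and function names are illustrative assumptions about the log layout, and reading the "(2:...)" prefix of the panic as "height:block hash" is also an assumption.

```python
import re

# Log lines copied verbatim from the report above.
COMMIT_LINE = (
    "I[2023-07-04|12:14:16.198] Committed state                              "
    "module=state height=1 txs=0 "
    "appHash=44AE48F34E908F8E387DA34DEFBD099FDC4B956BA8F0CDF28FF719E4DB35481B"
)
PANIC_LINE = (
    "panic: failed to process committed block "
    "(2:0379FFD028E6A0D08B2C082B85B3E98C504B1EBDBC7DDAA431391486F3D77CB7): "
    "wrong Block.Header.AppHash.  "
    "Expected 44AE48F34E908F8E387DA34DEFBD099FDC4B956BA8F0CDF28FF719E4DB35481B, "
    "got 6C81DAD888A261A1CD41C1AA7A639F22D0A2A31668E002910403FB74CD573CBB"
)

# Assumed layouts, inferred only from the two lines above.
COMMIT_RE = re.compile(
    r"Committed state\s+module=state height=(\d+) txs=\d+ appHash=([0-9A-F]+)"
)
PANIC_RE = re.compile(
    r"failed to process committed block \((\d+):([0-9A-F]+)\): "
    r"wrong Block\.Header\.AppHash\.\s+Expected ([0-9A-F]+), got ([0-9A-F]+)"
)


def parse_commit(line):
    """Return (height, app_hash) from a 'Committed state' line, or None."""
    m = COMMIT_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), m.group(2)


def parse_panic(line):
    """Return the fields of an AppHash-mismatch panic line, or None."""
    m = PANIC_RE.search(line)
    if m is None:
        return None
    return {
        "height": int(m.group(1)),   # assumed: height of the rejected block
        "block_hash": m.group(2),    # assumed: hash of the rejected block
        "expected": m.group(3),
        "got": m.group(4),
    }


if __name__ == "__main__":
    height, local_hash = parse_commit(COMMIT_LINE)
    panic = parse_panic(PANIC_LINE)
    print("locally committed height:", height)
    print("rejected block height:", panic["height"])
    print("expected == locally committed appHash:", panic["expected"] == local_hash)
    print("expected == got:", panic["expected"] == panic["got"])
```

If that reading is right, the mismatch is between the app hash this node computed itself after executing block 1 and the app hash recorded in the header of block 2 as served by peers.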

nodiesBlade commented 1 year ago

I noticed the same issue when syncing from scratch. I thought it was just a POKT issue the entire time, but now that you mention it, I was also using the latest staging branch. Interesting that RC v0.9.2 is working just fine for you; definitely worth sanity-checking before we upgrade to v0.10.0.

When syncing from scratch, I also received wrong Block.Header.AppHash. I also tried a snapshot around block ~60,000 and received the same wrong Block.Header.AppHash error, but that one might have been caused by some other unrelated testing.

FYI: We are not on RC 0.10.0 yet on mainnet, but I'm not sure why that would make the syncing fail. Best case, this is not an issue; worst case, it was flagged before the network was upgraded. @LordKurono thanks for the report.

Olshansk commented 1 year ago

@PoktBlade you likely have more experience syncing nodes from scratch, so I would appreciate your help here:

  1. Does this happen every time or intermittently?
  2. Have you hit it in the past?
  3. What do you recommend we sanity check?
  4. Which snapshot are you using?
  5. Would you be willing to take on this investigation?

nodiesBlade commented 1 year ago

I actually don't have much experience syncing from scratch; however, I do know it's pretty important that syncing from a given point in time still works.

  1. When syncing from scratch, pretty much every time.
  2. No, I haven't actually synced from scratch in a very long time. I did hear reports that this was an issue in the past, but the details are really fuzzy.
  3. I would start with a couple of things:
      1. Try to sync from a more recent snapshot (e.g. block 85xxx) and see if it crashes, or if this is specific to syncing from scratch or from really historical snapshots only.
      2. The only thing that explicitly changed in the state machine in v0.10.0 was this: https://github.com/pokt-network/pocket-core/pull/1534/files. So we can start there while debugging the app state of the validator as well. The other thing I think we should look into is whether the upgraded app version has any significance when syncing.
  4. https://link.us1.storjshare.io/raw/jwfbmq6ar3vsyzeqkconsiz24sja/pocket-public-blockchains/pocket-network-data-0026-rc-0.6.3.6.tar was the snapshot I was trying. I was trying an old snapshot for some other testing purposes unrelated to this issue.
  5. I can help triage the issue a bit more for you and report back here. For example, I can try to revert the above PR and see if the issue still persists. But preferably, if it is due to a recent change in v0.10.0, let's get the right code owners to apply a fix once triaged.

Olshansk commented 1 year ago

When syncing from scratch, pretty much every time.

🤔

No, I haven't actually synced from scratch in a very long time. I did hear reports that this was an issue in the past, but the details are really fuzzy.

Q1: @msmania @okdas Have either of you seen this?

Try to sync from a more recent snapshot (e.g. block 85xxx) and see if it crashes, or if this is specific to syncing from scratch or from really historical snapshots only.

R1: @PoktBlade Please give it a shot and report back! You can look at the snapshots @Andrew-Pohl's team has set up in #1565

The only thing that explicitly changed in the state machine in v0.10.0 was this: #1534 (files). So we can start there while debugging the app state of the validator as well. The other thing I think we should look into is whether the upgraded app version has any significance when syncing.

I looked at the code and don't fully understand why it would impact syncing, given that we haven't upgraded the protocol, but I trust your judgment. R2: Please report back once you've given it a shot!

link.us1.storjshare.io/raw/jwfbmq6ar3vsyzeqkconsiz24sja/pocket-public-blockchains/pocket-network-data-0026-rc-0.6.3.6.tar was the snapshot I was trying. I was trying an old snapshot for some other testing purposes unrelated to this issue.

Repeating comment above, take a look at #1565 to see if it helps.

I can help triage the issue a bit more for you and report back here. For example, I can try to revert the above PR and see if the issue still persists. But preferably, if it is due to a recent change in v0.10.0, let's get the right code owners to apply a fix once triaged.

R2: Can you revert (i.e. hard reset) to 410e12c3b40106e6e9c98edd4f17ed1234a7d435 and see if the same issue occurs?

R3: Can you rm 0aa12e71a33fca29c8e8ad5f60dcccae775a8045 and see if the same issue occurs?

Thanks in advance for your help! Cc'ing PNF (@jacklaing) for visibility, so they know that community members are helping debug this.
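The requests above (check out one commit, drop another, and retest) amount to narrowing down which change between a working and a failing build introduced the regression. A minimal sketch of that search as a bisection follows, under stated assumptions: the commit list is linear and ordered oldest to newest, the first entry is known good, the last is known bad, and every commit after the first bad one is also bad. The commit labels and the `is_bad` callback are hypothetical; in practice `is_bad` would stand for "build this commit, sync from block 0, and check whether the AppHash panic occurs".

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_bad(commit) is True.

    Assumes commits[0] is good, commits[-1] is bad, and that badness is
    monotonic (once a commit is bad, all later ones are bad too).
    """
    if len(commits) < 2:
        raise ValueError("need at least one good and one bad commit")
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid   # regression is at mid or earlier
        else:
            lo = mid   # regression is after mid
    return commits[hi]


if __name__ == "__main__":
    # Hypothetical history: placeholder labels, not real commits.
    history = ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"]
    tested = []

    def fails_to_sync(commit: str) -> bool:
        # Stand-in for a real build-and-sync check; here "c5" onward is bad.
        tested.append(commit)
        return history.index(commit) >= 5

    print("first bad commit:", first_bad(history, fails_to_sync))
    print("commits tested:", tested)
```

Each test halves the remaining range, which matters when a single test means a full sync attempt. `git bisect` automates the same procedure against a real repository.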


nodiesBlade commented 1 year ago

Triaged and fixed in https://github.com/pokt-network/pocket-core/pull/1566

Olshansk commented 1 year ago

Resolved in @PoktBlade's fix!