stacks-network / stacks-core

The Stacks blockchain implementation
https://docs.stacks.co
GNU General Public License v3.0

testnet node restart causes sync from scratch #4653

Open pseudozach opened 2 months ago

pseudozach commented 2 months ago

I have a testnet node that stalled at height 153329, so I restarted it. Now it fails with the error below and starts syncing from scratch.

Nothing changed on the file system, so it can't be a permissions or DB issue.

cc @wileyj

...
stacks-blockchain      | INFO [1712543044.720830] [testnet/stacks-node/src/run_loop/boot_nakamoto.rs:205] [epoch-2/3-boot] Failed to open Sortition DB while checking current burn height, assuming height = 0
stacks-blockchain      | INFO [1712543044.983827] [stackslib/src/burnchains/bitcoin/spv.rs:1286] [main] Syncing Bitcoin headers: 57.6% (1490000 out of 2585590)
...

I definitely have data, so I'm not sure why it's inaccessible. Here are the folder contents, as requested by jw:

zach@lnswap-1:~/stacks-blockchain-docker/persistent-data/testnet/stacks-blockchain/xenon$ ls -l
total 814032
-rw-r--r-- 1 root zach   2203648 Mar 30 04:55 atlas.sqlite
drwxr-sr-x 3 root zach      4096 Apr  8 01:48 burnchain
drwxr-sr-x 5 root zach      4096 Apr  8 01:48 chainstate
-rw-r--r-- 1 root zach 829743104 Apr  4 05:03 headers.sqlite
-rw-r--r-- 1 root zach   1523712 Apr  8 01:48 headers.sqlite.reorg
-rw-r--r-- 1 root zach     57344 Apr  8 01:48 peer.sqlite
-rw-r--r-- 1 root zach     28672 Mar 25 20:05 stacker_db.sqlite
zach@lnswap-1:~/stacks-blockchain-docker/persistent-data/testnet/stacks-blockchain/xenon$ du -sh chainstate/
21G     chainstate/

Update: the behavior is the same after restoring the chainstate from the Hiro archives, and the same with both the 2.5.0.0.0-rc1 and next images.

# STACKS_BLOCKCHAIN_VERSION=2.5.0.0.0-rc1
STACKS_BLOCKCHAIN_VERSION=next
STACKS_BLOCKCHAIN_API_VERSION=7.10.0-beta.1

wileyj commented 2 months ago

ping @CharlieC3 @obycode @kantai

interesting issue that i haven't seen before

wileyj commented 2 months ago

this was from a node that was at chain tip, and this happened on a restart. @pseudozach can you copy/paste your current chainstate dir? (ls -alh is probably fine)

wileyj commented 2 months ago

https://github.com/stacks-network/stacks-core/blob/next/testnet/stacks-node/src/run_loop/boot_nakamoto.rs#L205

https://github.com/stacks-network/stacks-core/blob/9b377f9c36357d6bdc7df4134c0bfd358c42c651/stackslib/src/net/chat.rs#L2751

pseudozach commented 2 months ago

this was from a node that was at chain tip, and this happened on a restart. @pseudozach can you copy/paste your current chainstate dir? (ls -alh is probably fine)

sure

zach@lnswap-1:~/stacks-blockchain-docker/persistent-data/testnet/stacks-blockchain/xenon/chainstate$ ls -alh
total 9.7M
drwxr-sr-x     5 root zach  4.0K Apr  8 01:48 .
drwxr-sr-x     4 root zach  4.0K Apr  8 01:48 ..
drwxr-sr-x 59329 root zach 1020K Apr  8 01:48 blocks
drwxr-sr-x     2 root zach  4.0K Apr  8 01:48 estimates
-rw-r--r--     1 root zach  8.6M Apr  8 01:48 mempool.sqlite
-rw-r--r--     1 root zach   12K Mar 25 17:01 tx_tracking.sqlite
drwxr-sr-x     3 root zach  4.0K Apr  8 01:48 vm

kantai commented 2 months ago

Can you run ls -lah on the xenon/burnchain directory as well?
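
Since the log reports "Failed to open Sortition DB", a directory listing can be complemented by actually trying to open the database files. The following is a minimal sketch, not part of stacks-core: it assumes the node's databases are SQLite files with a `.sqlite` extension somewhere under the data directory, and the function name `check_sqlite_files` is hypothetical. It opens each file read-only, so it should not modify node data.

```python
import sqlite3
import sys
from pathlib import Path


def check_sqlite_files(root):
    """Try to open every *.sqlite file under `root` in read-only mode.

    Returns a dict mapping each file path (as a string) to "ok" or to an
    "error: ..." message. The *.sqlite naming is an assumption.
    """
    results = {}
    for path in sorted(Path(root).rglob("*.sqlite")):
        # mode=ro asks SQLite for a read-only connection
        uri = path.resolve().as_uri() + "?mode=ro"
        try:
            conn = sqlite3.connect(uri, uri=True)
            try:
                # Reading the schema table forces SQLite to parse the file header
                conn.execute("SELECT count(*) FROM sqlite_master").fetchone()
            finally:
                conn.close()
            results[str(path)] = "ok"
        except sqlite3.Error as exc:
            results[str(path)] = f"error: {exc}"
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python check_db.py persistent-data/testnet/stacks-blockchain/xenon
    for name, status in check_sqlite_files(sys.argv[1]).items():
        print(f"{status:<8} {name}")
```

Running it as the same user the node runs as (inside the container, if Docker is used) is what makes the result meaningful, because a file that root can open may still be unreadable to another user.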

wileyj commented 2 months ago

this may have been a config issue with the working_dir key, but let's leave it open in case it comes up again. i've asked @pseudozach to collect logs in case it stalls again. currently the node is working as expected.