stacks-network / stacks-blockchain-docker

Stacks-blockchain with API using docker compose
GNU General Public License v3.0

No progress during sync #88

Closed friedger closed 1 year ago

friedger commented 1 year ago

I started syncing my node with a clean state. I stopped and started it twice.

Now I don't see any progress anymore. There are no errors in the logs, and no new log entries for hours. What is happening?

The web server is not running.

It looks like the node can talk to my local bitcoin node.

stacks-blockchain        | INFO [1661836575.781383] [src/burnchains/burnchain.rs:1314] [main] Syncing Bitcoin blocks: 99.2% (751083 to 751084 out of 751754)

Last two entries:

stacks-blockchain-api    | {"level":"http","message":"HTTP POST /new_block","req":{"headers":{"content-length":"170958","content-type":"application/json","host":"stacks-blockchain-api:3700"},"httpVersion":"1.1","method":"POST","originalUrl":"/new_block","query":{},"url":"/new_block"},"res":{"statusCode":200},"responseTime":4867,"timestamp":"2022-08-30T05:18:49.191Z"}
stacks-blockchain        | INFO [1661836729.256681] [src/chainstate/coordinator/mod.rs:774] [chains-coordinator] Atlas: 2 attachment instances emitted from events
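One way to put a number on "no new log entries for hours" is to read the epoch timestamp from the last stacks-blockchain line and compare it with the current time. The following is only a minimal sketch: it assumes the log layout shown in the lines above (`<container> | <LEVEL> [<epoch seconds>] ...`), and the function names are hypothetical helpers, not part of this repo.

```python
import re
import time

# Assumed layout, taken from the log lines quoted in this issue:
#   stacks-blockchain        | INFO [1661836729.256681] [src/...] [thread] message
NODE_TIMESTAMP = re.compile(r"\|\s*[A-Z]+\s+\[(\d+\.\d+)\]")


def last_node_timestamp(lines):
    """Return the epoch timestamp of the last matching node log line, or None."""
    last = None
    for line in lines:
        match = NODE_TIMESTAMP.search(line)
        if match:
            last = float(match.group(1))
    return last


def seconds_since_last_entry(lines, now=None):
    """Return how many seconds have passed since the last node log entry.

    `now` defaults to the current time; returns None if no line matched.
    """
    timestamp = last_node_timestamp(lines)
    if timestamp is None:
        return None
    current = time.time() if now is None else now
    return current - timestamp
```

Feeding it the output of `docker compose logs` (saved to a file and split into lines) gives the age of the newest node entry; API lines in JSON format are simply skipped by the pattern.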

wileyj commented 1 year ago

Hard to say based on these logs. I would first enable DEBUG logging in stacks-blockchain by changing this setting to 1 in your .env file: https://github.com/stacks-network/stacks-blockchain-docker/blob/master/sample.env#L34

The other possibilities I'm thinking of here:

  1. Did you force kill the stacks-blockchain container at any point during this restart, or did it time out trying to shut down gracefully? (The default timeout is set to 20 minutes; in testing, I've only seen it take 15 minutes or so one time.)
  2. The machine specs: I wouldn't expect it to pause operations for this long (looks like > 12 hours at this point), but an overwhelmed machine (high CPU load due to IO) could be something to look into.
  3. It may be an issue with the docker daemon itself, though this is unlikely. If it is docker, restarting the docker daemon or updating to the latest version should resolve it.

I would definitely try step 3 regardless, but also enable debug logging and restart, then let me know if anything else comes up in the logs.

wileyj commented 1 year ago

Forgot to add: one other thing along the lines of step 2 above is to check your disk usage. I've seen strange things when a host runs out of disk space.
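A quick way to run the disk check is Python's standard `shutil.disk_usage`. This is a rough sketch only: the path to check and the 10 GiB threshold are assumptions for illustration, and should be replaced with the volume that actually holds the chainstate and Postgres data on your host.

```python
import shutil

GIB = 2**30


def disk_report(path="/"):
    """Return total/free space in GiB and the percentage used for `path`."""
    usage = shutil.disk_usage(path)
    percent_used = 100 * usage.used / usage.total if usage.total else 0.0
    return {
        "total_gib": usage.total / GIB,
        "free_gib": usage.free / GIB,
        "percent_used": percent_used,
    }


def is_low_on_disk(path="/", min_free_gib=10.0):
    """Return True if free space on `path` is below `min_free_gib`.

    The default threshold is a hypothetical value, not a project recommendation.
    """
    return disk_report(path)["free_gib"] < min_free_gib


if __name__ == "__main__":
    report = disk_report("/")
    print(f"free: {report['free_gib']:.1f} GiB, used: {report['percent_used']:.1f}%")
```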

friedger commented 1 year ago

@wileyj Some comments:

  1. I can't say whether it timed out or shut down correctly. The computer might have gone into suspend mode.
  2. Good specs: 32 GB RAM, latest CPU, large SSD. The computer was newly set up.
  3. Docker version 20.10.17, build 100c701

Attached two log files: logs.txt, logs.txt

wileyj commented 1 year ago

~Hmm, nothing in the logs showing what the problem may be.~ ~This node has been running for a few hours now based on this log entry: stacks-blockchain | DEBG [1661884895.689334] [src/chainstate/coordinator/mod.rs:733] [chains-coordinator] Bump blocks processed~

~have the logs stalled again?~

Just saw your second logfile. This one looks much better than the previous one. Is the tip height progressing? You also mentioned something interesting that I think should be looked at a little more: if the laptop is going to sleep, I can see how the processes here would appear to stall. I'll have to try this myself, since I've never let a machine sleep while a blockchain was running, but it could be what caused the behaviour you saw.

friedger commented 1 year ago

@wileyj Yes, this is the last entry from 10 minutes ago.

stacks-blockchain-api    | {"in_microblock":true,"level":"info","message":"Transaction confirmed","stacks_height":48267,"timestamp":"2022-08-30T20:31:54.411Z","txid":"0xdcd666b979f792ccb56a07e6475951ccdf7ef36911f14c8e9e35574ff1a9b09a"}
stacks-blockchain-api    | {"level":"http","message":"HTTP POST /new_block","req":{"headers":{"content-length":"39927","content-type":"application/json","host":"stacks-blockchain-api:3700"},"httpVersion":"1.1","method":"POST","originalUrl":"/new_block","query":{},"url":"/new_block"},"res":{"statusCode":200},"responseTime":5026,"timestamp":"2022-08-30T20:31:59.433Z"}
postgres                 | 
postgres                 | PostgreSQL Database directory appears to contain a database; Skipping initialization
postgres                 | 
postgres                 | 2022-08-30 20:29:05.350 UTC [1] LOG:  starting PostgreSQL 14.5 on x86_64-pc-linux-musl, compiled by gcc (Alpine 11.2.1_git20220219) 11.2.1 20220219, 64-bit
postgres                 | 2022-08-30 20:29:05.350 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
postgres                 | 2022-08-30 20:29:05.350 UTC [1] LOG:  listening on IPv6 address "::", port 5432
postgres                 | 2022-08-30 20:29:05.352 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
postgres                 | 2022-08-30 20:29:05.354 UTC [21] LOG:  database system was shut down at 2022-08-30 20:28:57 UTC
postgres                 | 2022-08-30 20:29:05.356 UTC [1] LOG:  database system is ready to accept connections
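To answer the "is the tip height progressing?" question from the logs rather than by eye, one option is to compare the highest `stacks_height` seen in two log samples taken some time apart. This is a minimal sketch that assumes the API writes one JSON object per line after the `|` separator, as in the entries above; the function names are hypothetical.

```python
import json


def stacks_heights(lines):
    """Yield every stacks_height found in stacks-blockchain-api JSON log lines."""
    for line in lines:
        # Split off the docker compose prefix, e.g. "stacks-blockchain-api    | ".
        _, separator, payload = line.partition("|")
        if not separator:
            continue
        try:
            entry = json.loads(payload.strip())
        except json.JSONDecodeError:
            continue  # postgres and node lines are not JSON
        if isinstance(entry, dict) and "stacks_height" in entry:
            yield entry["stacks_height"]


def tip_is_progressing(earlier_lines, later_lines):
    """Return True if the later sample shows a higher height, None if unknown."""
    earlier = max(stacks_heights(earlier_lines), default=None)
    later = max(stacks_heights(later_lines), default=None)
    if earlier is None or later is None:
        return None
    return later > earlier
```

A result of `None` means at least one sample had no `stacks_height` entries, which by itself does not prove a stall.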

wileyj commented 1 year ago

My main concern is how docker would handle the computer suspending (mainly the disk) while there were in-progress DB writes in the blockchain, and also what would happen when the machine wakes up.

The only other times I've seen the blockchain stop progressing are when either:

  1. the API is having issues processing events (the blockchain would still attempt to send the events though, creating a log entry)
  2. the process was not shut down gracefully, so there were unwritten records resulting in a corrupt chainstate (this would also be logged though)
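For the first case, the API logs can be scanned for `/new_block` events that did not return 200. The sketch below assumes the JSON log shape seen earlier in this thread (`message`, `res.statusCode`, `responseTime`); the unit of `responseTime` is not stated in the logs, so it is passed through unchanged. The function names are hypothetical.

```python
import json


def new_block_posts(lines):
    """Yield (status_code, response_time) for each logged POST /new_block."""
    for line in lines:
        _, separator, payload = line.partition("|")
        if not separator:
            continue
        try:
            entry = json.loads(payload.strip())
        except json.JSONDecodeError:
            continue  # not an API JSON log line
        if not isinstance(entry, dict):
            continue
        if entry.get("message") != "HTTP POST /new_block":
            continue
        response = entry.get("res") or {}
        yield response.get("statusCode"), entry.get("responseTime")


def failed_posts(lines):
    """Return the /new_block events whose status code was not 200."""
    return [(status, rt) for status, rt in new_block_posts(lines) if status != 200]
```

An empty result from `failed_posts` only shows that the logged events were accepted; it does not rule out the second case.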

wileyj commented 1 year ago

This is a bug in stacks-blockchain, fixed in this PR: https://github.com/stacks-network/stacks-blockchain/pull/3236