paritytech / polkadot-sdk

The Parity Polkadot Blockchain SDK
https://polkadot.network/

Slow block import #13

Open purestakeoskar opened 1 year ago

purestakeoskar commented 1 year ago


Description of bug

When syncing a node, block import is very slow, to the point where block production is faster than block import. Instead of Syncing, the logs show Preparing.

2023-04-17 14:41:30.389  INFO tokio-runtime-worker substrate: [🌗] ⚙️  Preparing  0.0 bps, target=#3373001 (31 peers), best: #3372537 (0x41ba…81b4), finalized #3372219 (0x602b…446f), ⬇ 35.2kiB/s ⬆ 4.8kiB/s
2023-04-17 14:41:34.478 DEBUG tokio-runtime-worker runtime::system: [🌗] [3372538] 0 extrinsics, length: 10962 (normal 0%, op: 0%, mandatory 0%) / normal weight:Weight(ref_time: 265621300000, proof_size: 0) (70%) op weight Weight(ref_time: 0, proof_size: 0) (0%) / mandatory weight Weight(ref_time: 7235415758, proof_size: 0) (0%)
2023-04-17 14:41:34.500 TRACE tokio-runtime-worker sync::import-queue: [🌗] Block imported successfully Some(3372538) (0x10fe…ec7d)
2023-04-17 14:41:34.500 TRACE tokio-runtime-worker sync::import-queue: [🌗] Header 0xb8de…8fa7 has 4 logs
2023-04-17 14:41:35.389  INFO tokio-runtime-worker substrate: [🌗] ⚙️  Preparing  0.2 bps, target=#3373001 (31 peers), best: #3372538 (0x10fe…ec7d), finalized #3372219 (0x602b…446f), ⬇ 3.8kiB/s ⬆ 4.8kiB/s
2023-04-17 14:41:38.733 TRACE tokio-runtime-worker sync::import-queue: [🌗] Scheduling 1 blocks for import
2023-04-17 14:41:40.389  INFO tokio-runtime-worker substrate: [🌗] ⚙️  Preparing  0.0 bps, target=#3373002 (31 peers), best: #3372538 (0x10fe…ec7d), finalized #3372219 (0x602b…446f), ⬇ 22.6kiB/s ⬆ 3.9kiB/s
2023-04-17 14:41:45.389  INFO tokio-runtime-worker substrate: [🌗] ⚙️  Preparing  0.0 bps, target=#3373002 (31 peers), best: #3372538 (0x10fe…ec7d), finalized #3372219 (0x602b…446f), ⬇ 7.0kiB/s ⬆ 3.8kiB/s
2023-04-17 14:41:50.389  INFO tokio-runtime-worker substrate: [🌗] ⚙️  Preparing  0.0 bps, target=#3373002 (31 peers), best: #3372538 (0x10fe…ec7d), finalized #3372219 (0x602b…446f), ⬇ 4.2kiB/s ⬆ 4.5kiB/s
2023-04-17 14:41:51.372 TRACE tokio-runtime-worker sync::import-queue: [🌗] Scheduling 1 blocks for import
2023-04-17 14:41:53.983 DEBUG tokio-runtime-worker runtime::system: [🌗] [3372539] 0 extrinsics, length: 45566 (normal 1%, op: 0%, mandatory 0%) / normal weight:Weight(ref_time: 357569150000, proof_size: 0) (95%) op weight Weight(ref_time: 0, proof_size: 0) (0%) / mandatory weight Weight(ref_time: 7235415758, proof_size: 0) (0%)
2023-04-17 14:41:54.008 TRACE tokio-runtime-worker sync::import-queue: [🌗] Block imported successfully Some(3372539) (0xb8de…8fa7)
2023-04-17 14:41:54.008 TRACE tokio-runtime-worker sync::import-queue: [🌗] Header 0x1835…434d has 4 logs
2023-04-17 14:41:55.389  INFO tokio-runtime-worker substrate: [🌗] ⚙️  Preparing  0.2 bps, target=#3373003 (31 peers), best: #3372539 (0xb8de…8fa7), finalized #3372219 (0x602b…446f), ⬇ 11.9kiB/s ⬆ 3.2kiB/s
2023-04-17 14:42:00.390  INFO tokio-runtime-worker substrate: [🌗] ⚙️  Preparing  0.0 bps, target=#3373003 (31 peers), best: #3372539 (0xb8de…8fa7), finalized #3372219 (0x602b…446f), ⬇ 10.7kiB/s ⬆ 4.1kiB/s
2023-04-17 14:42:00.390  WARN tokio-runtime-worker telemetry: [🌗] ❌ Error while dialing /dns/telemetry.polkadot.io/tcp/443/x-parity-wss/%2Fsubmit%2F: Custom { kind: Other, error: Timeout }
2023-04-17 14:42:05.390  INFO tokio-runtime-worker substrate: [🌗] ⚙️  Preparing  0.0 bps, target=#3373003 (31 peers), best: #3372539 (0xb8de…8fa7), finalized #3372219 (0x602b…446f), ⬇ 11.4kiB/s ⬆ 3.0kiB/s
2023-04-17 14:42:06.384 TRACE tokio-runtime-worker sync::import-queue: [🌗] Scheduling 1 blocks for import
2023-04-17 14:42:09.607 DEBUG tokio-runtime-worker runtime::system: [🌗] [3372540] 0 extrinsics, length: 26287 (normal 0%, op: 0%, mandatory 0%) / normal weight:Weight(ref_time: 327287250000, proof_size: 0) (87%) op weight Weight(ref_time: 0, proof_size: 0) (0%) / mandatory weight Weight(ref_time: 7235415758, proof_size: 0) (0%)
2023-04-17 14:42:09.632 TRACE tokio-runtime-worker sync::import-queue: [🌗] Block imported successfully Some(3372540) (0x1835…434d)
2023-04-17 14:42:09.632 TRACE tokio-runtime-worker sync::import-queue: [🌗] Header 0x620c…caa3 has 4 logs
2023-04-17 14:42:10.390  INFO tokio-runtime-worker substrate: [🌗] ⚙️  Preparing  0.2 bps, target=#3373004 (31 peers), best: #3372540 (0x1835…434d), finalized #3372219 (0x602b…446f), ⬇ 29.9kiB/s ⬆ 4.6kiB/s
2023-04-17 14:42:15.390  INFO tokio-runtime-worker substrate: [🌗] ⚙️  Preparing  0.0 bps, target=#3373004 (31 peers), best: #3372540 (0x1835…434d), finalized #3372219 (0x602b…446f), ⬇ 20.8kiB/s ⬆ 4.2kiB/s

The node is connected to peers that have the blocks we need.

2023-04-17 14:45:38.109  INFO tokio-runtime-worker substrate: [🌗] ⚙️  Preparing  0.0 bps, target=#3373020 (18 peers), best: #3372550 (0x18a5…4ca5), finalized #3372219 (0x602b…446f), ⬇ 5.2kiB/s ⬆ 2.3kiB/s
2023-04-17 14:45:39.845 TRACE tokio-runtime-worker sync: [🌗] New peer 12D3KooWH3nhNXgsiPhVREEcrbbWwVyNfuLnwfrgwdkTPKjtHS2N BlockAnnouncesHandshake { roles: FULL, best_number: 3373019, best_hash: 0x2bb9696090704a6ca516b482a1661177894ba7a63f1dffa3d958edb2d19a980f, genesis_hash: 0xfe58ea77779b7abda7da4ec526d14db9b1e9cd40a217c34892af80a9b332b76d }
2023-04-17 14:45:39.845 DEBUG tokio-runtime-worker sync: [🌗] New peer with known best hash 0x2bb9…980f (3373019).
2023-04-17 14:45:39.845 DEBUG tokio-runtime-worker sync: [🌗] Connected 12D3KooWH3nhNXgsiPhVREEcrbbWwVyNfuLnwfrgwdkTPKjtHS2N
2023-04-17 14:45:41.464 TRACE tokio-runtime-worker sync: [🌗] New peer 12D3KooWGNoqQTFj92X8S2x42uZFexUQRQFeLDiWzarJmXJBV42F BlockAnnouncesHandshake { roles: FULL, best_number: 3373019, best_hash: 0x2bb9696090704a6ca516b482a1661177894ba7a63f1dffa3d958edb2d19a980f, genesis_hash: 0xfe58ea77779b7abda7da4ec526d14db9b1e9cd40a217c34892af80a9b332b76d }
2023-04-17 14:45:41.464 DEBUG tokio-runtime-worker sync: [🌗] New peer with known best hash 0x2bb9…980f (3373019).
2023-04-17 14:45:41.464 DEBUG tokio-runtime-worker sync: [🌗] Connected 12D3KooWGNoqQTFj92X8S2x42uZFexUQRQFeLDiWzarJmXJBV42F
2023-04-17 14:45:41.474 TRACE tokio-runtime-worker sync: [🌗] 12D3KooWGNoqQTFj92X8S2x42uZFexUQRQFeLDiWzarJmXJBV42F Ignoring transactions while major syncing

There are also queued synced blocks (sync_queued_blocks metric; see attached screenshot).

Another interesting note is that the node does not know how far it needs to sync: its sync_target is equal to best_block (see attached screenshot).
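
For reference, these values can also be read directly from the node's Prometheus endpoint (a sketch; port 9615 is the Substrate default, and the exact metric names are an assumption that may differ between client versions):

# snapshot the sync-related metrics exposed by the node
curl -s http://localhost:9615/metrics | grep -E 'sync_queued_blocks|block_height'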

Steps to reproduce

Start syncing a moonbeam or moonriver node with an archive parachain and a pruned relay chain. We are running moonbeam version 0.30.3 (using polkadot 0.9.37). Main flags used:

--execution=wasm
--pruning=archive
--rpc-cors=all
--unsafe-rpc-external
--unsafe-ws-external
--rpc-methods=safe
--db-cache=1024
--trie-cache-size=1073741824
--runtime-cache-size=32
--eth-log-block-cache=1000000000
--eth-statuses-cache=1000000000
--detailed-log-output
--no-hardware-benchmarks
--ws-max-connections=800
altonen commented 1 year ago

Can you run the node with -lsync=trace? Are you able to reproduce this issue reliably every time you run the node?
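
For example (a minimal sketch using the moonbeam binary and a couple of the flags from the report above; capture the output to a file for sharing):

# enable trace-level logging for the sync target on top of the existing flags
moonbeam --pruning=archive --execution=wasm -lsync=trace 2>&1 | tee moonbeam-sync-trace.log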

purestakeoskar commented 1 year ago

@altonen thanks for your response. Yes, this is reproducible: every time I restart, it goes back to the same state, and it happens on multiple nodes. This is 5 minutes of sync=trace logs: moonbeam-sync-trace.log

altonen commented 1 year ago

From the logs it looks like no block announcements are coming in for the relay chain, which could explain why the target block is not progressing. And since for most of the peers you're connected to the best block is not progressing either, because either a) they're major-syncing as well and don't announce blocks, or b) the inbound /block-announces/1 substream is closed and your node doesn't detect it, your node is making very little progress.

Why it takes a minute between importing blocks 15159517 and 15159518 is weird, and I don't know why that would be happening. Any idea @bkchr?

purestakeoskar commented 1 year ago

Just for clarification, the relay chain in question is Polkadot.

bkchr commented 1 year ago

Why it takes a minute between importing blocks 15159517 and 15159518 is weird, and I don't know why that would be happening. Any idea @bkchr?

No. It started checking the header of 15159518 right away, but as you said it took over a minute to import it. Not sure what happens here.

altonen commented 1 year ago

@purestakeoskar is this reproducible for you on v0.9.36?

bLd75 commented 1 year ago

Chiming in: we have the same problem on Astar with an HTTP endpoint which is heavily queried, tested on both ports 9933 and 9944. The abnormal behavior strangely feels like the issue from more than a year ago, when RPC nodes started to lose sync while heavily queried. CPU usage is > 60%. The client is on v0.9.39 and the runtime on v0.9.37.

purestakeoskar commented 1 year ago

@purestakeoskar is this reproducible for you on v0.9.36?

I do not have a client with v0.9.36 to test with unfortunately. Is there any change in particular in 0.9.37 that stands out?

skunert commented 1 year ago

@purestakeoskar Does this node use paritydb or rocksdb? And does this problem persist forever or are block imports normal again after some time? Sync mode normal? I want to reproduce this.

purestakeoskar commented 1 year ago

@skunert The node is using rocksdb. Blocks do import, slowly, some faster than others; syncing does eventually finish, and after that the node follows the chain.

Another interesting note (this might be a separate issue): when the node restarts, the grandpa sessions take a while to be established on the relay side, so we don't see blocks finalizing for sometimes up to 15 minutes. But once the grandpa notification streams are open, we see finalization moving again. During this time block import still happens.

Also of note, we see the same behaviour when we use an external RPC for the relay chain.

jasl commented 1 year ago

@purestakeoskar Does this node use paritydb or rocksdb? And does this problem persist forever or are block imports normal again after some time? Sync mode normal? I want to reproduce this.

Perhaps you can reproduce the issue using the Khala network; we're on the latest Polkadot v0.9.41 and can reproduce it with both ParityDB and RocksDB.

docker run -dti --name khala-node \
    -e NODE_NAME=khala-node \
    -e NODE_ROLE=FULL \
    -e PARACHAIN_DB="paritydb" \
    -e RELAYCHAIN_DB="paritydb" \
    -e PARACHAIN_EXTRA_ARGS='--ws-max-connections 512 --rpc-max-response-size 64 --runtime-cache-size 16 --no-telemetry' \
    -e RELAYCHAIN_EXTRA_ARGS='--ws-max-connections 512 --rpc-max-response-size 64 --runtime-cache-size 16 --no-telemetry' \
    --network host \
    --restart always \
    --log-driver local --log-opt max-size=100m --log-opt max-file=3 \
    -v /data/khala-node:/root/data \
    phalanetwork/khala-node-with-launcher:v0.1.24-1

You should see a significant sync speed regression when the progress reaches ~50%

Some users see the sync speed drop to 0 bps (both parachain and relay chain); some lucky users get ~50 bps (in my experience).

jasl commented 1 year ago

We left some comments on https://github.com/paritytech/substrate/issues/9360

arkpar commented 1 year ago

You can use https://github.com/koute/not-perf for profiling

# once the node is stuck and shows 0 bps with busy CPU, let nperf run for ~20 seconds and quit with Ctrl-C
nperf record -p $PID_OF_NODE_PROCESS -o datafile

nperf flamegraph datafile > flame.svg
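
If nperf is not yet installed, it can be built from that repository (a sketch assuming a standard Rust toolchain; the binary location follows cargo's defaults):

git clone https://github.com/koute/not-perf
cd not-perf
cargo build --release
# the nperf binary is then at target/release/nperf
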
jasl commented 1 year ago

@arkpar Here it is. The node uses the default arguments; the recording was taken while sync is slow.

I see [2023-05-02T23:42:11Z ERROR nperf_core::profiler] Failed to load '/memfd:wasm-memory-image (deleted)' from "/memfd:wasm-memory-image (deleted)": No such file or directory (os error 2); not sure whether it affects the result.

Attachments: sync2, sync, flaminggraph.zip

arkpar commented 1 year ago

There seems to be a lot of time wasted in cumulus_client_consensus_common::parachain_consensus::follow_finalized_head. In particular, there are a lot of header queries that don't hit the cache for some reason. @bkchr

bkchr commented 1 year ago

Yeah, @skunert is already working on this: https://github.com/paritytech/cumulus/issues/2495

skunert commented 1 year ago

Yes, the problem is finality notifications that reach the node while it is syncing and the finalized block is far away from the tip of the chain.

For each of these finalization attempts, we calculate multiple tree routes. One from last finalized -> new finalized and one from new_finalized -> best_block. This last route is very long in the collator case and takes up to 15 seconds on my machine. During this time, the import lock is taken, and no other import operations are possible. We do not hit the cache since the distance between finalized and best can be millions of blocks.

I see the following options:

  1. Cache the tree route from finalized -> best block and verify that the new finalized block is on the correct route (looking into this currently)
  2. Move tree route computation out of the import lock for finalization and verify only that this computed route is still valid
  3. Disable collator finalization during major sync. This is the easiest fix but will lead to a multi-minute finalization operation from genesis to tip once fully synced.

@arkpar Any opinions on this?

arkpar commented 1 year ago

So this looks like https://github.com/paritytech/substrate/issues/9360, p.2

For each of these finalization attempts, we calculate multiple tree routes. One from last finalized -> new finalized and one from new_finalized -> best_block.

What is that last route used for? Maybe we can make some assumptions if the difference in block numbers is over some threshold?

Disable collator finalization during major sync. This is the easiest fix but will lead to a multi-minute finalization operation from genesis to tip once fully synced.

In theory, we can finalize any chain in constant time, without having to check all the intermediate blocks. IIRC we only query tree routes to get a list of discarded blocks and actually discard them. This list should be empty after the initial sync anyway. Implementing this at the database level could be a bit involved. state-db in particular requires finalizing blocks one at a time, with the exception of warp sync.

bkchr commented 1 year ago

We could check the number of leaves. If the number of leaves is one, we don't need to calculate the tree route as there isn't any other chain. This should also hold for the initial sync and should give us the speed up we need.

skunert commented 1 year ago

We could check the number of leaves. If the number of leaves is one, we don't need to calculate the tree route as there isn't any other chain. This should also hold for the initial sync and should give us the speed up we need.

This sounds reasonable, will give it a try.

arkpar commented 1 year ago

We could check the number of leaves. If the number of leaves is one, we don't need to calculate the tree route as there isn't any other chain. This should also hold for the initial sync and should give us the speed up we need.

This sounds reasonable for this particular issue, but it won't fix the problem reported in https://github.com/paritytech/cumulus/issues/2495 (there will still be a million blocks finalized at once).

skunert commented 1 year ago

We could check the number of leaves. If the number of leaves is one, we don't need to calculate the tree route as there isn't any other chain. This should also hold for the initial sync and should give us the speed up we need.

This sounds reasonable for this particular issue, but it won't fix the problem reported in paritytech/cumulus#2495 (there will still be a million blocks finalized at once).

True, but this issue here is particularly annoying since it blocks the entire sync. https://github.com/paritytech/cumulus/issues/2495 is for sure still problematic, but it will only stall the node for a few minutes before it continues operating normally. We should look into further improvements.

skunert commented 1 year ago

This change is now in cumulus, closing here.

jasl commented 1 year ago

I just made a new node based on polkadot-v0.9.42 and did a resync on my Azure VM (which previously got stuck at ~50% sync progress). Unfortunately, I still see the node stuck:

2023-05-14 13:27:50 [Relaychain] ⚙️  Syncing  0.0 bps, target=#17909361 (40 peers), best: #2732144 (0x618b…c94a), finalized #2732032 (0x0071…6b63), ⬇ 367.0kiB/s ⬆ 260.9kiB/s
2023-05-14 13:27:53 [Parachain] ⚙️  Syncing  0.0 bps, target=#3885948 (40 peers), best: #689249 (0x1ad0…a551), finalized #0 (0xd435…be8d), ⬇ 5.5kiB/s ⬆ 6.0kiB/s
2023-05-14 13:27:55 [Relaychain] ⚙️  Syncing  0.0 bps, target=#17909361 (40 peers), best: #2732144 (0x618b…c94a), finalized #2732032 (0x0071…6b63), ⬇ 303.7kiB/s ⬆ 229.6kiB/s
2023-05-14 13:27:58 [Parachain] ⚙️  Syncing  0.0 bps, target=#3885949 (39 peers), best: #689249 (0x1ad0…a551), finalized #0 (0xd435…be8d), ⬇ 30.4kiB/s ⬆ 6.0kiB/s
2023-05-14 13:28:00 [Relaychain] ⚙️  Syncing  0.0 bps, target=#17909362 (40 peers), best: #2732144 (0x618b…c94a), finalized #2732032 (0x0071…6b63), ⬇ 598.7kiB/s ⬆ 222.9kiB/s
2023-05-14 13:28:03 [Parachain] ⚙️  Syncing  0.0 bps, target=#3885949 (40 peers), best: #689249 (0x1ad0…a551), finalized #0 (0xd435…be8d), ⬇ 11.1kiB/s ⬆ 7.3kiB/s
2023-05-14 13:28:05 [Relaychain] ⚙️  Syncing  0.0 bps, target=#17909363 (40 peers), best: #2732144 (0x618b…c94a), finalized #2732032 (0x0071…6b63), ⬇ 208.4kiB/s ⬆ 194.0kiB/s
2023-05-14 13:28:08 [Parachain] ⚙️  Syncing  0.0 bps, target=#3885949 (40 peers), best: #689249 (0x1ad0…a551), finalized #0 (0xd435…be8d), ⬇ 43.7kiB/s ⬆ 9.3kiB/s
2023-05-14 13:28:10 [Relaychain] ⚙️  Syncing  0.0 bps, target=#17909364 (40 peers), best: #2732144 (0x618b…c94a), finalized #2732032 (0x0071…6b63), ⬇ 284.6kiB/s ⬆ 183.3kiB/s
2023-05-14 13:28:13 [Parachain] ⚙️  Syncing  0.0 bps, target=#3885949 (39 peers), best: #689249 (0x1ad0…a551), finalized #0 (0xd435…be8d), ⬇ 27.1kiB/s ⬆ 5.6kiB/s
2023-05-14 13:28:15 [Relaychain] ⚙️  Syncing  0.0 bps, target=#17909365 (40 peers), best: #2732144 (0x618b…c94a), finalized #2732032 (0x0071…6b63), ⬇ 230.8kiB/s ⬆ 202.6kiB/s
2023-05-14 13:28:18 [Parachain] ⚙️  Syncing  0.0 bps, target=#3885949 (40 peers), best: #689249 (0x1ad0…a551), finalized #0 (0xd435…be8d), ⬇ 34.1kiB/s ⬆ 8.1kiB/s
2023-05-14 13:28:20 [Relaychain] ⚙️  Syncing  0.0 bps, target=#17909366 (40 peers), best: #2732144 (0x618b…c94a), finalized #2732032 (0x0071…6b63), ⬇ 285.0kiB/s ⬆ 232.5kiB/s
2023-05-14 13:28:23 [Parachain] ⚙️  Syncing  0.0 bps, target=#3885949 (40 peers), best: #689249 (0x1ad0…a551), finalized #0 (0xd435…be8d), ⬇ 8.2kiB/s ⬆ 6.7kiB/s
2023-05-14 13:28:25 [Relaychain] ⚙️  Syncing  0.0 bps, target=#17909367 (40 peers), best: #2732144 (0x618b…c94a), finalized #2732032 (0x0071…6b63), ⬇ 269.0kiB/s ⬆ 225.0kiB/s
2023-05-14 13:28:28 [Parachain] ⚙️  Syncing  0.0 bps, target=#3885949 (40 peers), best: #689249 (0x1ad0…a551), finalized #0 (0xd435…be8d), ⬇ 42.2kiB/s ⬆ 10.2kiB/s
2023-05-14 13:28:30 [Relaychain] ⚙️  Syncing  0.0 bps, target=#17909368 (40 peers), best: #2732144 (0x618b…c94a), finalized #2732032 (0x0071…6b63), ⬇ 252.5kiB/s ⬆ 211.8kiB/s

I'll try to record with nperf tomorrow

BTW, I see that the node can't stop gracefully (the previous version couldn't either):

docker stop khala-node
Error response from daemon: cannot stop container: khala-node: tried to kill container, but did not receive an exit event
jasl commented 1 year ago

Attachments: syncing, syncing2, syncing_flaming_graph.zip

Here are the flame graphs (recording 1 is about a minute, recording 2 is a short ~20 s capture), taken on my Azure VM with a premium SSD, where I hit the stuck-sync problem.

The node uses the latest parity-db and Polkadot v0.9.42

jasl commented 1 year ago

@skunert could you take a look? It seems that even after upgrading to polkadot v0.9.42, cumulus is still slow.

skunert commented 1 year ago

@jasl Will take a look! Can you share the logs before the slow syncing starts?

@purestakeoskar Does this node use paritydb or rocksdb? And does this problem persist forever or are block imports normal again after some time? Sync mode normal? I want to reproduce this.

Perhaps you can reproduce the issue using the Khala network; we're on the latest Polkadot v0.9.41 and can reproduce it with both ParityDB and RocksDB.

[… docker command quoted above …]

You should see a significant sync speed regression when the progress reaches ~50%

Some users see the sync speed drop to 0 bps (both parachain and relay chain); some lucky users get ~50 bps (in my experience).

Is this still the command you used?

jasl commented 1 year ago

@jasl Will take a look! Can you share the logs before the slow syncing starts?

@purestakeoskar Does this node use paritydb or rocksdb? And does this problem persist forever or are block imports normal again after some time? Sync mode normal? I want to reproduce this.

Perhaps you can reproduce the issue using the Khala network; we're on the latest Polkadot v0.9.41 and can reproduce it with both ParityDB and RocksDB.

[… docker command quoted above …]

You should see a significant sync speed regression when the progress reaches ~50%. Some users see the sync speed drop to 0 bps (both parachain and relay chain); some lucky users get ~50 bps (in my experience).

Is this still the command you used?

docker run -dti --name khala-node \
    -e NODE_NAME=khala-node \
    -e NODE_ROLE=MINER \
    -e PARACHAIN_DB="paritydb" \
    -e RELAYCHAIN_DB="paritydb" \
    -e PARACHAIN_EXTRA_ARGS='--ws-max-connections 512 --rpc-max-response-size 64 --runtime-cache-size 16 --no-telemetry' \
    -e RELAYCHAIN_EXTRA_ARGS='--ws-max-connections 512 --rpc-max-response-size 64 --runtime-cache-size 16 --no-telemetry' \
    --network host \
    --restart always \
    --log-driver local --log-opt max-size=100m --log-opt max-file=3 \
    -v /data/khala-node:/root/data \
    jasl123/khala-node-with-launcher:v0.1.25-dev.1

Yeah, but the docker image switched to jasl123/khala-node-with-launcher:v0.1.25-dev.1, which is the latest dev build.

skunert commented 1 year ago

Attachments: syncing, syncing2, syncing_flaming_graph.zip

Here are the flame graphs (recording 1 is about a minute, recording 2 is a short ~20 s capture), taken on my Azure VM with a premium SSD, where I hit the stuck-sync problem.

The node uses the latest parity-db and Polkadot v0.9.42

This looks different from last time: we are not spending time on finalization; it looks more like a networking issue.

jasl commented 1 year ago

Attachments: syncing, syncing2, syncing_flaming_graph.zip. Here are the flame graphs (recording 1 is about a minute, recording 2 is a short ~20 s capture), taken on my Azure VM with a premium SSD, where I hit the stuck-sync problem. The node uses the latest parity-db and Polkadot v0.9.42

This looks different from last time: we are not spending time on finalization; it looks more like a networking issue.

Yeah, I see the pattern change, but I don't have knowledge of Substrate's low-level internals...

jasl commented 1 year ago

Could you help identify whether the issue is my VM's network or something in the Substrate code? My colleague also sees 0 bps on his node (running at his home).

skunert commented 1 year ago

Yes, sure, this was just an observation; we are working on it. I was able to reproduce the slow syncing and asked colleagues who are more familiar with our networking code for help :+1:

jasl commented 1 year ago

Yes, sure, this was just an observation; we are working on it. I was able to reproduce the slow syncing and asked colleagues who are more familiar with our networking code for help 👍

Thank you for your help!

altonen commented 1 year ago

I just made a new node based on polkadot-v0.9.42 and did a resync on my Azure VM (which previously got stuck at ~50% sync progress). Unfortunately, I still see the node stuck:

[… sync log identical to the one in the comment quoted above …]

I'll try to record with nperf tomorrow

BTW, I see that the node can't stop gracefully (the previous version couldn't either):

docker stop khala-node
Error response from daemon: cannot stop container: khala-node: tried to kill container, but did not receive an exit event

@jasl can you run the node with -lsync=trace when both the relay and parachain are getting stuck at 0.0 bps?

edit: if you decrease the number of blocks per request to 8 or 16 with --max-blocks-per-request 16, is it able to make progress then?
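
For example (a sketch based on the docker launcher command quoted earlier in this thread; only the two env var lines change, the rest of the docker run stays the same):

-e PARACHAIN_EXTRA_ARGS='--ws-max-connections 512 --rpc-max-response-size 64 --runtime-cache-size 16 --no-telemetry --max-blocks-per-request 16 -lsync=trace' \
-e RELAYCHAIN_EXTRA_ARGS='--ws-max-connections 512 --rpc-max-response-size 64 --runtime-cache-size 16 --no-telemetry --max-blocks-per-request 16 -lsync=trace' \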

jasl commented 1 year ago

I just made a new node based on polkadot-v0.9.42 and did a resync on my Azure VM (which previously got stuck at ~50% sync progress). Unfortunately, I still see the node stuck:

[… sync log identical to the one in the comment quoted above …]

I'll try to record with nperf tomorrow. BTW, I see that the node can't stop gracefully (the previous version couldn't either):

docker stop khala-node
Error response from daemon: cannot stop container: khala-node: tried to kill container, but did not receive an exit event

@jasl can you run the node with -lsync=trace when both the relay and parachain are getting stuck at 0.0 bps?

edit: if you decrease the number of blocks per request to 8 or 16 with --max-blocks-per-request 16, is it able to make progress then?

Sorry, I'll reply to you tomorrow; I don't know why the node is stuck on boot for hours, so I need to re-sync.

jasl commented 1 year ago

@altonen Here is the log: khala-node-1.log.zip. I see a lot of [Relaychain] Too many blocks in the queue messages.

--max-blocks-per-request 8 does not help the situation; log: khala-node-2.log.zip

I can confirm the node binary is based on polkadot v0.9.42, here is the code https://github.com/Phala-Network/khala-parachain/tree/polkadot-v0.9.42

altonen commented 1 year ago

I don't think the issue is in networking. It looks like syncing is downloading blocks just fine and stops downloading them only after it has reached the import queue limit, so I think the issue is somewhere else. From the second log it looks like at 17:24:03.534 it starts importing the relay chain block 0x0fbd…5a47, and for the rest of the logs (~3 minutes) there is no activity reported from the import queue, so it looks like it got stuck somewhere.

Could someone more knowledgeable about the import process take a look at this cc @skunert @bkchr @koute?

skunert commented 1 year ago

It is indeed strange: in the first log we see that blocks are scheduled for import but then not actually imported. I will continue to debug this. However, I will not be available for a few days, so it will take some time.

jasl commented 1 year ago

I will continue to debug this. However, I will not be available for a few days, so it will take some time.

No problem, thank you for your help. Just let me know if you need me to provide more information.

jasl commented 1 year ago

@skunert if you can't reproduce the issue, I can give you access to my Azure VM

Slow sync is a common issue for us, but not everyone is completely unable to sync; for most people the nodes are just slow or very slow.

skunert commented 1 year ago

I was looking into this more today; sadly it's not reproducing on a cloud machine I have, and also not on my local machine. From the last logs you posted I have an idea.

@jasl Can you maybe try to reproduce the issue using this branch (and -lsync=trace) and post the logs? https://github.com/skunert/khala-parachain/tree/skunert/more-logging

I have added some more aggressive logging which could help verify my suspicion that this might be an import-lock issue again.

And the machine specs would be interesting to know.

jasl commented 1 year ago

I was looking into this more today; sadly it's not reproducing on a cloud machine I have, and also not on my local machine. From the last logs you posted I have an idea.

@jasl Can you maybe try to reproduce the issue using this branch (and -lsync=trace) and post the logs? https://github.com/skunert/khala-parachain/tree/skunert/more-logging

I have added some more aggressive logging which could help verify my suspicion that this might be an import-lock issue again.

And the machine specs would be interesting to know.

Sorry, I missed the message; I'll try this later.

jasl commented 1 year ago

@skunert My Azure VM is a Standard DC4s v2 (Xeon E 2286, 4-core CPU, 16 GB memory), node data is stored on a 3 TB Standard SSD LRS (500 IOPS, max IO throughput 60 MBps), and the node runs in a Docker container with host networking.

jasl commented 1 year ago

@skunert Here is a report from one of my colleagues; his node runs in a KVM virtual machine (the host is a 12900K PC) with a WD SN640 SSD.

node.log.gz

jasl commented 1 year ago

I have a guess, just a guess: in the reports I have collected, the slow sync problem (and the slow boot problem) all occurred on virtualized machines (Hyper-V, TrueNAS, and KVM). Is it possible that Substrate's workloads (syncing and DB IO) do not run well in a VM environment?

When you try to reproduce the issue, do you run the node on bare metal or in a VM?

jasl commented 1 year ago

khala-node.log.zip

Here is my Azure VM's log

the docker command is

docker run -dti --name khala-node \
    -e NODE_NAME=khala-node \
    -e NODE_ROLE=MINER \
    -e PARACHAIN_DB="paritydb" \
    -e RELAYCHAIN_DB="paritydb" \
    -e PARACHAIN_EXTRA_ARGS='--ws-max-connections 512 --rpc-max-response-size 64 --runtime-cache-size 16 --no-telemetry -lsync=trace' \
    -e RELAYCHAIN_EXTRA_ARGS='--ws-max-connections 512 --rpc-max-response-size 64 --runtime-cache-size 16 --no-telemetry -lsync=trace' \
    --network host \
    --restart always \
    --log-driver local --log-opt max-size=300m --log-opt max-file=3 \
    -v /data/khala-node:/root/data \
    jasl123/khala-node-with-launcher:v0.1.25-more-logging

jasl123/khala-node-with-launcher:v0.1.25-more-logging is my build of your branch.

jasl commented 1 year ago

sadly it's not reproducing on a cloud machine I have, and also not on my local machine.

Sorry to disturb you again @skunert, but did you fully sync the node? Even if your node is not fully stuck, when the progress reaches ~50% (in one of my cases the relay chain reached #7931361), you should see the speed decrease significantly, from ~1000 bps at the start down to 50-100 bps.

skunert commented 1 year ago

I have made multiple attempts to reproduce this over the last week. I was able to sync multiple nodes to the tip of the chain. I was using a cloud instance from google which is also virtualized afaik. I will try again with a cloud instance that is closer to the specs you have posted.

I will also look through the logs you have provided in your last comments.

sadly it's not reproducing on a cloud machine I have, and also not on my local machine.

Sorry to disturb you again @skunert, but did you fully sync the node? Even if your node is not fully stuck, when the progress reaches ~50% (in one of my cases the relay chain reached #7931361), you should see the speed decrease significantly, from ~1000 bps at the start down to 50-100 bps.

This does not sound suspicious to me; the speed of import depends on how much was going on in those specific blocks. Empty block import is much faster, for example. So the real question is whether this is slower than some old version.

The only thing I have seen both in your logs and on my own node is a log like this:

2023-05-25 12:27:12.932 TRACE tokio-runtime-worker sync::import-queue: [Parachain] Header 0x99ab…73fe has 2 logs
...
2023-05-25 12:28:19.304 TRACE tokio-runtime-worker sync::import-queue: [Parachain] Block imported successfully Some(771254) (0x99ab…73fe)

This looks strange to me because this should take maybe 200 ms but takes about a minute here. However, this only happened once in all my reproduction attempts, and nothing caught the eye in that specific code path.
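
To spot such gaps in a longer trace log, the relevant import-queue lines can be filtered out and their timestamps compared (a sketch; node.log is a placeholder filename):

grep 'sync::import-queue' node.log | grep -E 'Block imported successfully|has [0-9]+ logs'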

jasl commented 1 year ago

However, this only happened once in all my reproduction attempts

When you see that situation, do subsequent blocks keep importing this slowly? In the reports I have collected, once a node gets into this state the logs (nearly) always show 0.0 bps.

My ops colleague gave me some input; he has always suspected that the IO may have issues:

[May 15, 2023 at 04:37:33]:
with a whopping 5000 read OPS

and 30-1000 (peak) write ops ... but writes stick to 6-33 usually

i do not understand the heavy reading

in KiB's it's not much, only 50 KB/s, but the random I/O is terrifying

which explains why SSD's also struggle
[May 25, 2023 at 03:50:58]:
on a node with khala chain and parityDB

run lsof -n | grep ksmcc3 | wc -l

it shows the open files

in my case : 43155, but there are 87107 total in the dir

i wonder if some max_open_files limits opening all the DB files or something
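
One way to check whether the open-file limit is actually being hit (a sketch; the khala-node process name is an assumption, and the procfs paths are standard Linux):

# effective limits of the running node process
PID=$(pidof khala-node)
grep 'open files' /proc/$PID/limits
# number of file descriptors the process currently has open
ls /proc/$PID/fd | wc -l
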
jasl commented 1 year ago

I may have found something...

Based on my colleague's input, I checked ulimit -a in the Docker container.

real-time non-blocking time  (microseconds, -R) unlimited
core file size              (blocks, -c) unlimited
data seg size               (kbytes, -d) unlimited
scheduling priority                 (-e) 0
file size                   (blocks, -f) unlimited
pending signals                     (-i) 63889
max locked memory           (kbytes, -l) 65536
max memory size             (kbytes, -m) unlimited
open files                          (-n) 1048576
pipe size                (512 bytes, -p) 8
POSIX message queues         (bytes, -q) 819200
real-time priority                  (-r) 0
stack size                  (kbytes, -s) 8192
cpu time                   (seconds, -t) unlimited
max user processes                  (-u) unlimited
virtual memory              (kbytes, -v) unlimited
file locks                          (-x) unlimited

I tried increasing max locked memory, and I can see the stuck node (on my Azure VM) moving again.

My full command is:

docker run -dti --name khala-node \
    -e NODE_NAME=khala-node \
    -e NODE_ROLE=MINER \
    -e PARACHAIN_DB="paritydb" \
    -e RELAYCHAIN_DB="paritydb" \
    -e PARACHAIN_EXTRA_ARGS='--ws-max-connections 512 --rpc-max-response-size 64 --runtime-cache-size 16 --no-telemetry' \
    -e RELAYCHAIN_EXTRA_ARGS='--ws-max-connections 512 --rpc-max-response-size 64 --runtime-cache-size 16 --no-telemetry' \
    --network host \
    --restart always \
    --log-driver local --log-opt max-size=300m --log-opt max-file=3 \
    --ulimit memlock=8192000000:8192000000 \
    -v /data/khala-node:/root/data \
    jasl123/khala-node-with-launcher

UPDATE: I checked a few of my managed VMs, and I can see that the default ulimits are totally different between those machines. I guess this is why we can't reliably reproduce the situation, even on the same Ubuntu 22.04.
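
To compare environments, the effective limit can be checked inside the running container and against a host's defaults (a sketch using standard docker commands; the container name comes from the command above):

# locked-memory limit (in KiB) seen inside the container
docker exec khala-node sh -c 'ulimit -l'
# defaults a fresh container would get on this host, for comparison
docker run --rm ubuntu:22.04 sh -c 'ulimit -a'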