paritytech / substrate


Cumulus sync performance #9360

Open arkpar opened 3 years ago

arkpar commented 3 years ago

When the cumulus node starts syncing from genesis, or from significantly behind the tip of the network, it syncs two chains in parallel. The best block and finality of the parachain depend on the state of the relay chain, so behaviour during the sync depends on which of the two chains syncs faster.

  1. When the parachain syncs and blocks are imported with the `NetworkInitialSync` origin, the head is moved to the latest imported block regardless of the relay chain state. However, there is also logic that sets the head based on the relay chain block once it is imported. So if the relay chain import is running behind the parachain, the head is moved backwards and forwards by thousands of blocks, leading to huge DB updates.

  2. When the relay chain falls behind the parachain, finality of the parachain also falls behind. Long unfinalized chains lead to further performance issues in the database: on each import, the tree route to the last finalized block is queried from the DB, and with 100000 such blocks this slows down import dramatically (see the sketch below).
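To illustrate the cost in point 2: the tree route is found by walking header metadata parent-by-parent, so its cost grows linearly with the distance to the last finalized block, and every import pays that cost again. A minimal sketch of the effect, with hypothetical types rather than the actual client DB code:

```rust
use std::collections::HashMap;

// Illustrative sketch only; hypothetical types, not the actual sc-client-db code.
struct HeaderMeta {
    number: u64,
    parent: [u8; 32],
}

/// Walk parent links from `block` down to the finalized height and count the steps.
/// With ~100000 unfinalized blocks, every import repeats a walk of this length,
/// and each step is a header-metadata read from the database.
fn tree_route_len(
    db: &HashMap<[u8; 32], HeaderMeta>,
    mut block: [u8; 32],
    finalized_number: u64,
) -> usize {
    let mut steps = 0;
    while let Some(meta) = db.get(&block) {
        if meta.number <= finalized_number {
            break;
        }
        block = meta.parent;
        steps += 1;
    }
    steps
}
```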

jasl commented 1 year ago

I think we might be hitting this bug. TL;DR: when we try to sync a fully new node and syncing progress reaches about half, the speed drops to 0.x bps. We never see this when we restore the relay chain from a snapshot.

altonen commented 1 year ago

Are you experiencing this only recently? We did some refactoring of the syncing implementation, and those changes don't play too well with other open issues in sc-network, which could possibly contribute to lower performance. I will be looking at this issue soon.

jasl commented 1 year ago

> Are you experiencing this only recently? We did some refactoring of the syncing implementation, and those changes don't play too well with other open issues in sc-network, which could possibly contribute to lower performance. I will be looking at this issue soon.

Actually, we have often received such reports, but we couldn't reproduce them... so we just assumed it was a hardware or network issue.

Normally we restore the relay chain DB from a snapshot.

Just recently, my colleague and I both synced new nodes, all from block 0, and we both hit the issue.

My colleague's rough debugging notes:

When syncing reaches nearly 50% (I think around 45%), it just grinds to a halt.

15442 read ops, 58 write ops

So it's writing ~30 MB/s, but that kills the I/O buffer.

If I write randomly from the system, I can easily get 18 GB/s.

So it's writing 30 MB/s, that's about 105 GB/hour,

but after 1 hour only 41 GB was written,

so roughly 60% of the writes were wasted.

Also, using ANY kind of RAID amplifies the issue,

e.g. RAID 5/6,

but RAID 0, 1, 10 is fine... with RAID 0 (striping) the issues are less worrisome.

As for me, I don't see a visible issue on my local server, but my Azure VM has the same problem. I guess my local server's high-end Dell RAID card helps mitigate it; unluckily for the Azure VM, the sync speed drops from ~500 bps to ~0 bps.

jasl commented 1 year ago

I tried syncing Polkadot first, then Phala, on that Azure VM (4-core Xeon 2288G, 16 GB RAM, and a 3 TB premium SSD).

The sync was fast at first, I could see Phala syncing at > 800 bps, but a few hours later it slowed down:

2023-04-18 08:49:22 [Parachain] ⚙️  Syncing  0.4 bps, target=#3720177 (40 peers), best: #886790 (0x727f…2c41), finalized #0 (0xd435…be8d), ⬇ 33.4kiB/s ⬆ 6.0kiB/s
2023-04-18 08:49:22 [Relaychain] ⚙️  Syncing  0.0 bps, target=#17527415 (40 peers), best: #17296990 (0x85f1…9b9e), finalized #17296896 (0xf29b…4d53), ⬇ 362.6kiB/s ⬆ 219.1kiB/s
altonen commented 1 year ago

We have gotten a lot of reports like this and have likely identified the issue. It should be fixed in the next release.

jasl commented 1 year ago

> We have gotten a lot of reports like this and have likely identified the issue. It should be fixed in the next release.

Does the next release mean 0.9.42? Could you share the related PR?

altonen commented 1 year ago

Yes, https://github.com/paritytech/substrate/pull/13829

jasl commented 1 year ago

> Yes, paritytech/substrate#13829

I tried applying that PR to our node, but it doesn't seem to help the syncing speed. Maybe there are other issues?

2023-04-25 21:08:43 [Parachain] ⚙️  Syncing  0.4 bps, target=#3770937 (40 peers), best: #819461 (0x3a7f…97a2), finalized #0 (0xd435…be8d), ⬇ 11.8kiB/s ⬆ 5.7kiB/s
2023-04-25 21:08:47 [Relaychain] ⚙️  Syncing  0.0 bps, target=#17641352 (40 peers), best: #3339776 (0x0994…abc9), finalized #3339776 (0x0994…abc9), ⬇ 387.3kiB/s ⬆ 361.9kiB/s
2023-04-25 21:08:48 [Parachain] ⚙️  Syncing  0.0 bps, target=#3770937 (40 peers), best: #819461 (0x3a7f…97a2), finalized #0 (0xd435…be8d), ⬇ 36.4kiB/s ⬆ 5.1kiB/s
2023-04-25 21:08:52 [Relaychain] ⚙️  Syncing  0.0 bps, target=#17641353 (40 peers), best: #3339776 (0x0994…abc9), finalized #3339776 (0x0994…abc9), ⬇ 394.3kiB/s ⬆ 355.7kiB/s
2023-04-25 21:08:53 [Parachain] ⚙️  Syncing  0.8 bps, target=#3770937 (40 peers), best: #819465 (0xd333…9efb), finalized #0 (0xd435…be8d), ⬇ 32.2kiB/s ⬆ 8.5kiB/s
altonen commented 1 year ago

Can you run the node with `-lsync=trace`?

jasl commented 1 year ago

> Can you run the node with `-lsync=trace`?

khala-node.log.zip

Uploaded

There seem to be a lot of messages like these:

2023-04-26 00:29:53.162 DEBUG tokio-runtime-worker sync: [Relaychain] Too many full nodes, rejecting 12D3KooWLqUHk2Xj1rW6A4CGDnoP3mFBKnDrpzh7SjJBZ68UsS8D
2023-04-26 00:29:53.162 DEBUG tokio-runtime-worker sync: [Relaychain] Failed to register peer PeerId("12D3KooWLqUHk2Xj1rW6A4CGDnoP3mFBKnDrpzh7SjJBZ68UsS8D"): BlockAnnouncesHandshake { roles: FULL, best_number: 17643353, best_hash: 0xa8fcec5c8ef0679cb52e3b9ac22ee49533023b8524a7a237869c5e517cf91535, genesis_hash: 0xb0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe }
2023-04-26 00:29:53.162 TRACE tokio-runtime-worker sync: [Relaychain] Disconnected peer which had earlier been refused by on_sync_peer_connected 12D3KooWGFneh4LCQi4gVUSPcwzWLcdz3a9gBFcMh2xeicB3PAt9
2023-04-26 00:29:53.168 TRACE tokio-runtime-worker sync: [Parachain] 12D3KooWDwrsuC2ZCbJt5BPaFtLpCcNiWpa8y1aGNxJgfAcLboR9 Ignoring transactions while major syncing
2023-04-26 00:29:53.168 TRACE tokio-runtime-worker sync: [Parachain] 12D3KooWRRMxmEoUJmqqPJr7PbDy1XyTzSQDm8JK3SVhUwVqBQkh Ignoring transactions while major syncing
2023-04-26 00:29:53.170 TRACE tokio-runtime-worker sync: [Relaychain] New peer 12D3KooWABrnESXvjBok2hfPv7SoPrRFpfbfzWftePT76QMV22Sh BlockAnnouncesHandshake { roles: AUTHORITY, best_number: 17643353, best_hash: 0xa8fcec5c8ef0679cb52e3b9ac22ee49533023b8524a7a237869c5e517cf91535, genesis_hash: 0xb0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe }
2023-04-26 00:29:53.170 DEBUG tokio-runtime-worker sync: [Relaychain] Too many full nodes, rejecting 12D3KooWABrnESXvjBok2hfPv7SoPrRFpfbfzWftePT76QMV22Sh
2023-04-26 00:29:53.170 DEBUG tokio-runtime-worker sync: [Relaychain] Failed to register peer PeerId("12D3KooWABrnESXvjBok2hfPv7SoPrRFpfbfzWftePT76QMV22Sh"): BlockAnnouncesHandshake { roles: AUTHORITY, best_number: 17643353, best_hash: 0xa8fcec5c8ef0679cb52e3b9ac22ee49533023b8524a7a237869c5e517cf91535, genesis_hash: 0xb0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe }
altonen commented 1 year ago

It looks like there is something wrong with block import. It says "too many blocks in the queue", which prevents ChainSync from downloading new blocks. Interestingly, both the relay chain and the parachain do import blocks, so it looks like ChainSync::on_blocks_processed() is not getting called, which prevents syncing from making progress.

If I give you a patch with some more logging, are you able to apply that to your node and run it again with trace logging?
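For reference, the feedback loop I mean looks roughly like the sketch below; `on_blocks_processed` is the real entry point, the rest is a simplified illustration with made-up names, not the actual sc-network-sync code:

```rust
// Simplified sketch of the syncing <-> import-queue backpressure described above.
// Only `on_blocks_processed` is a real name; the rest is illustrative.
struct ChainSync {
    queued_blocks: usize,
    max_queued_blocks: usize, // the "too many blocks in the queue" limit
}

impl ChainSync {
    /// New block requests are only sent while the import queue has room.
    fn can_request_more(&self) -> bool {
        self.queued_blocks < self.max_queued_blocks
    }

    /// Called when the import queue reports a processed batch. If this is never
    /// invoked, `queued_blocks` never shrinks, `can_request_more()` stays false,
    /// and syncing stalls even though blocks keep being imported.
    fn on_blocks_processed(&mut self, imported: usize) {
        self.queued_blocks = self.queued_blocks.saturating_sub(imported);
    }
}
```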

jasl commented 1 year ago

> It looks like there is something wrong with block import. It says "too many blocks in the queue", which prevents ChainSync from downloading new blocks. Interestingly, both the relay chain and the parachain do import blocks, so it looks like ChainSync::on_blocks_processed() is not getting called, which prevents syncing from making progress.
>
> If I give you a patch with some more logging, are you able to apply that to your node and run it again with trace logging?

No problem!

BTW, our code is based on Polkadot v0.9.41, and I cherry-picked https://github.com/paritytech/substrate/pull/13829 and https://github.com/paritytech/substrate/pull/13941.

altonen commented 1 year ago

Here's the patch: logging.txt, cheers! It should reveal if the communication between the import queue and syncing is lagging. I'll see if I can reproduce this locally as well.

But there seems to be something wrong with the code that decides if a block request is sent to a peer:

2023-04-26 00:25:49.112 sync: [Relaychain] Finished block announce validation: from PeerId("12D3KooWJBv3FnJyGbqdkzRD3eAkckNkhE2wo4Up3D7tQj6dwdkF"): AnnouncementSummary { block_hash: 0xfe252d7f72be5f9c86bb4b18ac64edd12d486dd10e97260fe2477ea58ece8dd3, number: 17643313, parent_hash: 0x91afc0c125fc26ac549081a204db733bceab525698268ce74612095b318cbe56, state: Some(Normal) }. local_best=false    
...
2023-04-26 00:25:49.283 sync: [Relaychain] Too far ahead for peer 12D3KooWJBv3FnJyGbqdkzRD3eAkckNkhE2wo4Up3D7tQj6dwdkF (3491983)    

If I locally believe the peer's best block is 17643313, then I should be able to download block 3491983 from them.

@arkpar Does this look wrong to you or am I misunderstanding something about the block request scheduler?
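To make the question concrete, one way such a refusal could arise is sketched below. This is purely illustrative: the names and the constant are placeholders, not the actual ChainSync code. The point is that if the guard compares against a tracked common-ancestor number rather than the peer's advertised best number, a peer reporting best 17643313 could still be refused a request for 3491983:

```rust
// Purely illustrative guard; names and the constant are placeholders.
const MAX_BLOCKS_AHEAD: u64 = 2048;

struct PeerSyncState {
    best_number: u64,   // what the peer advertised in its handshake/announces
    common_number: u64, // highest block we believe we have in common with the peer
}

/// A request for block `wanted` is refused ("too far ahead") if it is more than
/// MAX_BLOCKS_AHEAD beyond the peer's common number, even when the peer's
/// advertised best number is far beyond `wanted`.
fn too_far_ahead(peer: &PeerSyncState, wanted: u64) -> bool {
    wanted > peer.common_number.saturating_add(MAX_BLOCKS_AHEAD)
}
```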

arkpar commented 1 year ago

> It looks like there is something wrong with block import. It says "too many blocks in the queue", which prevents ChainSync from downloading new blocks. Interestingly, both the relay chain and the parachain do import blocks, so it looks like ChainSync::on_blocks_processed() is not getting called, which prevents syncing from making progress.

This probably means that block import is slow for whatever reason. I'm not familiar with Phala and how computationally heavy their blocks are supposed to be. Profiling would help, or at least a stack dump taken with eu-stack or a similar tool.

Btw, this has nothing to do with the original issue description.

altonen commented 1 year ago

I was looking at the logs again and it looks like the relay chain block import is getting stuck (or at least taking a long time).

It schedules 64 blocks for import and then imports a few of them

2023-04-26 00:34:42.191 sync::import-queue: [Relaychain] Scheduling 64 blocks for import    

but once it has imported block 3499005 and starts importing the block with header 0x1eda…, it stops; for the rest of the logs (~20 seconds) the import queue doesn't log anything for the relay chain, and it looks like it's stuck:

2023-04-26 00:34:42.562 sync::import-queue: [Relaychain] Block imported successfully Some(3499005) (0xc1b2…ede4)    
2023-04-26 00:34:42.562 sync::import-queue: [Relaychain] Header 0x1eda…31cb has 2 logs    
jasl commented 1 year ago

> Here's the patch: logging.txt, cheers! It should reveal if the communication between the import queue and syncing is lagging. I'll see if I can reproduce this locally as well.
>
> But there seems to be something wrong with the code that decides if a block request is sent to a peer:
>
> 2023-04-26 00:25:49.112 sync: [Relaychain] Finished block announce validation: from PeerId("12D3KooWJBv3FnJyGbqdkzRD3eAkckNkhE2wo4Up3D7tQj6dwdkF"): AnnouncementSummary { block_hash: 0xfe252d7f72be5f9c86bb4b18ac64edd12d486dd10e97260fe2477ea58ece8dd3, number: 17643313, parent_hash: 0x91afc0c125fc26ac549081a204db733bceab525698268ce74612095b318cbe56, state: Some(Normal) }. local_best=false
> ...
> 2023-04-26 00:25:49.283 sync: [Relaychain] Too far ahead for peer 12D3KooWJBv3FnJyGbqdkzRD3eAkckNkhE2wo4Up3D7tQj6dwdkF (3491983)
>
> If I locally believe the peer's best block is 17643313, then I should be able to download block 3491983 from them.
>
> @arkpar Does this look wrong to you or am I misunderstanding something about the block request scheduler?

khala-node.log.zip

Here's the new log, please take a look

jasl commented 1 year ago

I can try another sync. The performance regresses once progress reaches ~50%; even when the node doesn't trigger the critical problem, the speed still drops below 50 bps.

I suspect --state-pruning archive-canonical causes the issue; our use case requires that the node never prunes states.

altonen commented 1 year ago

Could you try syncing again without the flag?

jasl commented 1 year ago

Sure, I'll upload a new log from the node start, then remove --state-pruning archive-canonical and retry.

Should I dump the log after the sync speed starts slowing?

jasl commented 1 year ago

khala-node.log

This is the log from the start.

altonen commented 1 year ago

> Sure, I'll upload a new log from the node start, then remove --state-pruning archive-canonical and retry.
>
> Should I dump the log after the sync speed starts slowing?

Looking at the logs you provided, the issue doesn't seem to be the communication between syncing and the import queue but something else, so I'm not sure it makes a difference whether you capture the logs from the start or only after the sync has started slowing down.

jasl commented 1 year ago

Another new sync on the Azure VM, using ParityDB with the default state-pruning and block-pruning; after nearly a day of syncing, the speed has dropped to:

2023-04-27 11:51:49 [Parachain] ⚙️  Syncing 17.2 bps, target=#3780877 (40 peers), best: #2481802 (0xb67c…a1fb), finalized #0 (0xd435…be8d), ⬇ 36.3kiB/s ⬆ 11.2kiB/s
2023-04-27 11:51:50 [Relaychain] ⚙️  Syncing  0.0 bps, target=#17664467 (40 peers), best: #7540943 (0x8cf2…4e73), finalized #7540736 (0x2100…f186), ⬇ 280.0kiB/s ⬆ 277.3kiB/s

Slightly better than archive-canonical, but definitely not right.

Any idea how to profile the node?

jasl commented 1 year ago

After 2 days, the node's syncing speed dropped to ~0 bps again, so the pruning mode is not related.

Any idea how to debug / profile this case?

altonen commented 1 year ago

Can you update this issue instead: https://github.com/paritytech/polkadot-sdk/issues/13

Somebody there is probably able to help guide you with profiling.