paritytech / polkadot-sdk

The Parity Polkadot Blockchain SDK
https://polkadot.network/
1.78k stars 642 forks source link

Block announcements are dropped because all per-peer validation slots are occupied #1929

Open dmitry-markin opened 11 months ago

dmitry-markin commented 11 months ago

As reported by @altonen while running a burn-in for https://github.com/paritytech/polkadot-sdk/pull/1631, some block announcements are dropped with a warning: https://github.com/paritytech/polkadot-sdk/blob/1cf7d3aafa0e7edaa20d34581e0f1656895cb9c2/substrate/client/network/sync/src/block_announce_validator.rs#L163

This can be happening for several reasons:

  1. Validations are not correctly removed from the queue after they complete.
  2. Validations are not polled enough.
  3. Peers are announcing too many validations for them to be processed concurrently (MAX_CONCURRENT_BLOCK_ANNOUNCE_VALIDATIONS_PER_PEER are currently set to 4).
dmitry-markin commented 11 months ago

It doesn't seem that this is happening due to 1. or 2. Validation slots are always deallocated once validations complete here: https://github.com/paritytech/polkadot-sdk/blob/21b32849db4ac5f284d7ca2f93927b0d35bb783a/substrate/client/network/sync/src/block_announce_validator.rs#L297 and per-peer slot deallocation is tested here: https://github.com/paritytech/polkadot-sdk/blob/21b32849db4ac5f284d7ca2f93927b0d35bb783a/substrate/client/network/sync/src/block_announce_validator.rs#L359-L383

Also, assumption 2. is unlikely, because validations are always polled until the stream is pending (the stream never terminates): https://github.com/paritytech/polkadot-sdk/blob/21b32849db4ac5f284d7ca2f93927b0d35bb783a/substrate/client/network/sync/src/engine.rs#L910-L912.

dmitry-markin commented 10 months ago

It looks like the issue might be caused by major syncing peers announcing imported blocks. By default, we always announce imported blocks: https://github.com/paritytech/polkadot-sdk/blob/8ce16ee6ff190912995dc6c8da85b741eaada395/substrate/client/service/src/lib.rs#L184-L186 announce_imported_blocks: https://github.com/paritytech/polkadot-sdk/blob/8ce16ee6ff190912995dc6c8da85b741eaada395/substrate/client/cli/src/config.rs#L431-L436

There is no "fuse" in SyncingEngine making block announcement a no-op during major sync either: https://github.com/paritytech/polkadot-sdk/blob/8ce16ee6ff190912995dc6c8da85b741eaada395/substrate/client/network/sync/src/engine.rs#L634-L673

When a peer is major syncing, the announcements might be broadcast faster then they are processed, what triggers the warning.

bkchr commented 10 months ago

When we are major syncing the node internally isn't sending block import notification and thus, we also don't announce anything.

dmitry-markin commented 10 months ago

https://github.com/paritytech/polkadot-sdk/pull/2132 helped with polling block announce validations during versi scaling, but if the issue described in https://github.com/paritytech/polkadot-sdk/issues/2152 continues, we'll need to look into it again.

senseless commented 4 months ago

Adding logs for an event that might be related. I brought up a RPC node for Frequency and received a large number of block imports. Nothing special was happening on the nodes at the time. No other details / errors / warnings were indicated. @wilwade believed this issue may be related.

frequency-logs.txt

wilwade commented 4 months ago

Thanks @senseless Here are the logs for one of the archive nodes we run that picked up on the error Ignored-Block-Logs.txt

Frequency is currently using Polkadot SDK v1.1.0 So possible that this has been fixed and we just need to update, but wanted to make sure the logs were here in case.

Also getting 1000 of them in < 1 second feels odd.

bkchr commented 4 months ago

We should probably stop logging this, as this could be literally someone spamming the node.

dmitry-markin commented 4 months ago

We should probably stop logging this, as this could be literally someone spamming the node.

Addressed in https://github.com/paritytech/polkadot-sdk/pull/4480.