paritytech / polkadot-sdk

The Parity Polkadot Blockchain SDK
https://polkadot.network/
1.81k stars 660 forks source link

Banned node for "Same state requested multiple times" #4924

Open girazoki opened 3 months ago

girazoki commented 3 months ago

We have seen the "Banned: same state requested multiple times" using warp sync without apparent signs of the node doing anything wrong. The architecture is at follows:

Full node logs:

INFO tokio-runtime-worker substrate: [Parachain] 💤 Idle (3 peers), best: #822149 (0x8fa1…d27f), finalized #822147 (0x6e40…af94), ⬇ 2.7kiB/s ⬆ 4.4MiB/s
INFO tokio-runtime-worker substrate: [Parachain] 💤 Idle (3 peers), best: #822149 (0x8fa1…d27f), finalized #822147 (0x6e40…af94), ⬇ 0.5kiB/s ⬆ 2.7MiB/s
INFO tokio-runtime-worker substrate: [Parachain] 💤 Idle (3 peers), best: #822149 (0x8fa1…d27f), finalized #822147 (0x6e40…af94), ⬇ 0.9kiB/s ⬆ 4.8MiB/s
 Idle (3 peers), best: #822149 (0x8fa1…d27f), finalized #822147 (0x6e40…af94), ⬇ 2.0kiB/s ⬆ 10.1MiB/s
 INFO tokio-runtime-worker substrate: [Parachain] 💤 Idle (0 peers), best: #822149 (0x8fa1…d27f), finalized #822147 (0x6e40…af94), ⬇ 1.1kiB/s ⬆ 6.3MiB/s
 WARN tokio-runtime-worker peerset: [Parachain] Report 12D3KooWA4gQgEZgPka1UW5KWZYcjYMJUrpHgDWaJFJUaUUFknkx: -2147483648 to -2147483648. Reason: Same state request multiple times. Banned, disconnecting.
 WARN tokio-runtime-worker peerset: [Parachain] Report 12D3KooWKc1dwqMibdf5DYXM3B3vDxwpFXr6SKcvtJ8BSCZoehiZ: -2147483648 to -2147483648. Reason: Same state request multiple times. Banned, disconnecting.

We are wondering if the issue could be caused because the full-node cannot serve 3 nodes simultaneously asking for the latest state, which causes these 3 nodes to not receive the indicated bytes and therefore re-ask again for the same byttes, which causes the banning. Could this be any possible explanation?

From what we have seen from the substrate code, the banning happens after receiving the same request 3 times

bkchr commented 3 months ago

CC @paritytech/networking

girazoki commented 3 months ago

Possibly related: https://github.com/paritytech/polkadot-sdk/issues/531#issuecomment-1691856724

lexnv commented 3 months ago

Possible root cause for multiple block requests, needs debug patch to test with a local net:

https://github.com/paritytech/polkadot-sdk/blob/d250dcb7634579f5d755b655bc185845ee46f9ac/substrate/client/network/sync/src/strategy/chain_sync.rs#L475

https://github.com/paritytech/polkadot-sdk/blob/d250dcb7634579f5d755b655bc185845ee46f9ac/substrate/client/network/sync/src/strategy/chain_sync.rs#L1138-L1144

On one side, we store the peers in a HashMap (strategy/chain_sync logic):

https://github.com/paritytech/polkadot-sdk/blob/d250dcb7634579f5d755b655bc185845ee46f9ac/substrate/client/network/sync/src/strategy/chain_sync.rs#L252

On the other side, we store in a LRU map the previous requests:

https://github.com/paritytech/polkadot-sdk/blob/d250dcb7634579f5d755b655bc185845ee46f9ac/substrate/client/network/sync/src/block_request_handler.rs#L155

lexnv commented 3 months ago

Managed to reproduce this behavior with the following branch:

2024-07-05 19:15:40 Finished sleeping for block request from 12D3KooWPfpK9xaCkg4eukAT9nJunVjc1drNisHFcE6qpdQWTHiu!
2024-07-05 19:15:40 Received block request from 12D3KooWPfpK9xaCkg4eukAT9nJunVjc1drNisHFcE6qpdQWTHiu simulating sleep!
2024-07-05 19:15:40 Peer 12D3KooWPfpK9xaCkg4eukAT9nJunVjc1drNisHFcE6qpdQWTHiu disconnected. Removed from internal mappings
2024-07-05 19:15:40 New peer 12D3KooWPfpK9xaCkg4eukAT9nJunVjc1drNisHFcE6qpdQWTHiu with best hash 0x1c69…fbf9 (0)
2024-07-05 19:15:40 Inserting peer 12D3KooWPfpK9xaCkg4eukAT9nJunVjc1drNisHFcE6qpdQWTHiu with state PeerSync { peer_id: PeerId("12D3KooWPfpK9xaCkg4eukAT9nJunVjc1drNisHFcE6qpdQWTHiu"), common_number: 0, best_hash: 0x1c69d379ff8920602d3ccb392e6f8fdd319b47971775f7696b17a69ca06cfbf9, best_number: 0, state: Available } (InChainPruned)

2024-07-05 19:16:00 Finished sleeping for block request from 12D3KooWPfpK9xaCkg4eukAT9nJunVjc1drNisHFcE6qpdQWTHiu!

2024-07-05 19:16:00 Received block request from 12D3KooWPfpK9xaCkg4eukAT9nJunVjc1drNisHFcE6qpdQWTHiu simulating sleep!
2024-07-05 19:16:00 Report 12D3KooWPfpK9xaCkg4eukAT9nJunVjc1drNisHFcE6qpdQWTHiu: -2147483648 to -2147483648. Reason: Same block request multiple times. Banned, disconnecting.
2024-07-05 19:16:00 Peer 12D3KooWPfpK9xaCkg4eukAT9nJunVjc1drNisHFcE6qpdQWTHiu disconnected. Removed from internal mappings

Will come up with a PR to fix this next week

girazoki commented 2 months ago

Thank you @lexnv ! appreciated!