paritytech / polkadot-sdk

The Parity Polkadot Blockchain SDK
https://polkadot.network/

Upgrading bootnode client code from polkadot-v0.9.38 to polkadot-v1.1.0 causes bans #5265

Open maltekliemann opened 1 month ago

maltekliemann commented 1 month ago

We have two bootnodes running polkadot-v0.9.38. Upgrading one of them to our new client, which is at polkadot-v1.1.0, produces the following two error messages:

[🔮 Zeitgeist Parachain] Report 12D3KooWAXGvE8rMyNqpeUsmf54sQ6auje7VvRGfYxgMQJc6bZaA: -2147483648 to -2147483648. Reason: Same block request multiple times. Banned, disconnecting.

[Relaychain] Report 12D3KooWGd8FPMDvLgE4CrEZg31FPJLmVwihC2PVZ9gfTuY9tosX: +100 to -2147483548. Reason: Grandpa: Neighbor message. Banned, disconnecting.

(The runtime is still at the old version at this point.)

We have not been able to reproduce these error messages with a non-bootnode client on our local machines, in integration tests, or on a local parachain network. None of those setups show any errors.

What causes these errors? Are they bootnode related? Can we expect them to vanish once a majority of the nodes are updated to the new client? Should we be concerned about going through with this update?

skunert commented 1 month ago

The first message is being tracked here: https://github.com/paritytech/polkadot-sdk/issues/1915 and here: https://github.com/paritytech/polkadot-sdk/issues/531

There has been recent work to avoid these duplicate block requests: https://github.com/paritytech/polkadot-sdk/pull/5029 However, that fix is fairly recent and will not be included in 1.1.0. The issue should not appear too often, though.

Regarding the GRANDPA neighbor message, I am not sure. cc @lexnv

maltekliemann commented 1 month ago

Thanks for the reply. Am I interpreting this correctly: We don't have to worry about halting our network due to at least the first message?

lexnv commented 1 month ago

> We don't have to worry about halting our network due to at least the first message?

I would say everything is ok; there is nothing to worry about from those 2 log messages. Generally, we care more about block production and connectivity to multiple peers. It is ok for some peers to get disconnected, especially if they "misbehave".

We currently ban peers that have made the same request to us multiple times (3 times). This is what causes the first error message, "Same block request multiple times".
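To illustrate the idea, here is a minimal sketch of such a duplicate-request check. It is not the actual polkadot-sdk code: the class name, the request key, and the choice to trigger on the third identical request are assumptions made for illustration. The only values taken from this thread are the limit of 3 and the reputation change of -2147483648 shown in the first log line.

```python
from collections import defaultdict

# Assumption: the ban triggers when the third identical request arrives.
MAX_SAME_REQUESTS = 3
# Reputation change seen in the first log line (the minimum 32-bit signed value).
I32_MIN = -2**31


class DuplicateRequestTracker:
    """Hypothetical helper that counts identical requests per peer."""

    def __init__(self, limit=MAX_SAME_REQUESTS):
        self.limit = limit
        self.seen = defaultdict(int)  # (peer_id, request_key) -> count

    def on_request(self, peer_id, request_key):
        """Return the reputation change to report (0 means no penalty)."""
        self.seen[(peer_id, request_key)] += 1
        if self.seen[(peer_id, request_key)] >= self.limit:
            return I32_MIN  # large enough to ban the peer immediately
        return 0


tracker = DuplicateRequestTracker()
# The same peer asks for the same (hypothetical) block three times in a row.
changes = [tracker.on_request("peer-a", ("block", 42)) for _ in range(3)]
print(changes)
```

The counter is keyed per peer, so a different peer asking for the same block is not penalized.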

This PR https://github.com/paritytech/polkadot-sdk/pull/5029 aims to offload some pressure from peers who are slow to respond. However, we'd have to wait a bit for it to get deployed to the majority of the network before we see some significant effects.

The second message is not concerning. We are adding +100 to the reputation of a peer that is already banned. This is expected behavior: we are "rewarding" the peer by increasing its reputation.

We emit a warning whenever the reputation of a peer is below a threshold. In this case, the +100 did not increase the peer's reputation enough to get above the ban threshold (it only moved from -2147483648 to -2147483548).
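The arithmetic behind that second log line can be sketched as follows. This is only an illustration: the threshold value and the function names are hypothetical, and the real threshold in the client may differ. What it shows is that a peer sitting at the minimum 32-bit reputation stays far below any plausible ban threshold after a +100 reward.

```python
I32_MIN = -2**31
I32_MAX = 2**31 - 1

# Hypothetical threshold, chosen only for illustration; not taken from the client.
ASSUMED_BANNED_THRESHOLD = -1_500_000_000


def saturating_add_i32(a, b):
    """Add two values and clamp the result to the 32-bit signed range."""
    return max(I32_MIN, min(I32_MAX, a + b))


def apply_report(reputation, change, threshold=ASSUMED_BANNED_THRESHOLD):
    """Return (new_reputation, still_banned) after applying a reputation change."""
    new_reputation = saturating_add_i32(reputation, change)
    # Assumption: a peer counts as banned while its reputation is below the threshold.
    return new_reputation, new_reputation < threshold


# A peer that was banned with the minimum reputation receives a +100 reward.
new_reputation, still_banned = apply_report(I32_MIN, 100)
print(new_reputation, still_banned)
```

The resulting value matches the "+100 to -2147483548" in the log, and the peer remains banned because a single +100 is tiny compared with the distance to the threshold.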
