Closed by bkchr 4 months ago
I am guessing this is a non-issue: with 1000 validators gossiping, multiple messages get enqueued without being processed during node restart. Maybe we should increase the limit for this warning on Kusama to 100k instead of 10k.
Besides seeing this shortly after restart, are you experiencing any other issues?
Also, is your node finalizing BEEFY blocks? Do you see logs like `INFO tokio-runtime-worker beefy: 🥩 Round #21946827 concluded`?
Or you can also check RPC against your node: `RPC::beefy::getFinalizedHead()`.
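For instance, a minimal JSON-RPC check against a local node might look like the following (a sketch; it assumes the node exposes its RPC endpoint on the default `localhost:9944`):

```shell
# Query the latest BEEFY-finalized block hash over JSON-RPC.
curl -sS -H "Content-Type: application/json" \
  -d '{"id":1,"jsonrpc":"2.0","method":"beefy_getFinalizedHead","params":[]}' \
  http://localhost:9944
```

A healthy node should return a block hash in the `result` field, and that hash should advance over time as rounds conclude.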
From my side, besides seeing a similar message after a restart, all seems to be good; it's finalizing BEEFY blocks:
polkadot[208721]: 2024-02-19 14:42:47 🥩 Round #21946707 concluded, finality_proof: V1(SignedCommitment { ...
Why? If it's CPU usage, we do not check the BLS signature when gossiping, right?

> Why? If it's CPU usage, we do not check the BLS signature when gossiping, right?
This happens only during node restart, while gossiped messages pile up (the BEEFY voter task starts late in the startup process while the network gossip subsystem starts early).
Once the node restart is complete and the BEEFY voter/worker task gets crunching, it consumes the pending gossip messages and carries on nicely.
> From my side, besides seeing a similar message after a restart, all seems to be good; it's finalizing BEEFY blocks
For completeness of sanity checks, please also check node RAM usage over time. The bad scenario to invalidate here is that gossip messages keep piling up faster than they are consumed; in that case, RAM usage would reflect this very visibly over, say, a 24h window.
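One way to capture that window is a simple sampling loop (a sketch; the `polkadot` process name, the 10-minute interval, and the `beefy-ram.log` path are assumptions):

```shell
# Log the polkadot process's resident set size (RSS, in KiB) every 10 minutes
# for 24 hours (144 samples). Steady growth that never plateaus would point
# at gossip messages piling up faster than they are consumed.
for _ in $(seq 1 144); do
  printf '%s %s\n' "$(date +%FT%T)" "$(ps -o rss= -C polkadot | head -n1)"
  sleep 600
done >> beefy-ram.log
```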
Like Paulo, I'm seeing finalised BEEFY blocks:
Feb 19 16:45:55 ovh-adv2-fireproof polkadot[437166]: 2024-02-19 16:45:55 🥩 Round #21947934 concluded, finality_proof: V1(SignedCommitment { commitment: Commitment { payload: ...
I had the same issue with v1.7.1, and not only at node restart.
On both KSM nodes I updated, I received the error "The number of unprocessed messages in channel `mpsc_beefy_gossip_validator` exceeded 10000." upon startup after installing v1.7.1.
> For completeness of sanity checks, please also check node RAM usage over time. The bad scenario to invalidate here is that gossip messages keep piling up faster than they are consumed; in that case, RAM usage would reflect this very visibly over, say, a 24h window.
RAM & CPU seem OK, but network usage has doubled since the activation of BEEFY.
> This happens only during node restart, while gossiped messages pile up (the BEEFY voter task starts late in the startup process while the network gossip subsystem starts early).
GRANDPA was designed so you only need to pay attention to the current round. In principle BEEFY should continue this, so maybe this requires some look-ahead that realizes older messages can now be discarded. Alternatively, BEEFY rounds could simply be run less frequently; then even when this happens, it winds up irrelevant.
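As a sketch of that look-ahead (hypothetical types and names; the real worker keeps far more state), queued votes could be filtered against the last concluded round before any further processing:

```rust
// Hypothetical vote type carrying only the round number we filter on.
#[derive(Debug)]
struct GossipVote {
    round: u64,
}

// Drop queued votes for rounds at or below the last concluded round,
// so a backlog built up during restart is cheap to drain.
fn discard_stale(queued: Vec<GossipVote>, last_concluded: u64) -> Vec<GossipVote> {
    queued
        .into_iter()
        .filter(|v| v.round > last_concluded)
        .collect()
}

fn main() {
    let queued = vec![
        GossipVote { round: 21946705 },
        GossipVote { round: 21946706 },
        GossipVote { round: 21946708 },
    ];
    // With round 21946707 already concluded, only the newer vote survives.
    let live = discard_stale(queued, 21946707);
    println!("{}", live.len()); // prints "1"
}
```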
> network usage has doubled since the activation of BEEFY
Do we know if BEEFY caused this? I'd think network usage doubling sounds more like async backing, but not really sure.
> Do we know if BEEFY caused this? I'd think network usage doubling sounds more like async backing, but not really sure.
You are right, a lot of things happening recently (runtime upgrade, new version, beefy), so it could also be the async backing kicking in.
Fixed by #3435 and will be released with node version 1.8.0.
Validators on Kusama report the following warning: "The number of unprocessed messages in channel `mpsc_beefy_gossip_validator` exceeded 10000."