Open adamdossa opened 2 years ago
We have seen that when the bad operator produces a block and broadcasts it to its peers, the peers begin validating the block and send a request to the operator for the block details, but the operator rejects the request with Reason: Request refused.
Here are some log messages from another node (with the -lsync flag). This is for a different block than the one from Adam's logs.
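For context, the extra verbosity in these logs comes from raising the sync log target via the node's -l/--log flag. A rough sketch of the invocation (the binary name and the trace level are assumptions, not taken from this report; the target=level syntax matches the -l peerset=trace flag mentioned further down):

# Sketch only: run the validator with verbose sync logging; "polymesh" is a placeholder binary name.
./polymesh --validator -lsync=trace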
The node receives the block from the operator (19:14:30 is the correct time for this block):
2022-04-28 19:14:30.138 DEBUG tokio-runtime-worker sync: Pre-validating received block announcement 0xa1ddd181d516444f20dd86d5e45a2eb749d65fa7da57fb0f5de64e2657039073 with number 2593343 from 12D3KooWBTD1MjUQjb5wbuXR5zacGaWksrf4K4FK8SwbqT9Edmdx
2022-04-28 19:14:30.139 TRACE tokio-runtime-worker sync: Finished block announce validation: Process { is_new_best: true, who: PeerId("12D3KooWBTD1MjUQjb5wbuXR5zacGaWksrf4K4FK8SwbqT9Edmdx"), announce: BlockAnnounce { header: Header { parent_hash: 0x77db0f424571ec5fdf5cc8ae554b3a8d9b15bbf89e0021bb735d2f86e369e9e7, number: 2593343, state_root: 0x14647ad2105a6303ef37e56a9bbee788c362da5d4865633072f58a8187564ec7, extrinsics_root: 0xee7cbe3434e596bd578ed0db1d4d33e69d43dc15901f70562b045fc00fbb0006, digest: Digest { logs: [DigestItem::PreRuntime([66, 65, 66, 69], [2, 5, 0, 0, 0, 217, 19, 103, 16, 0, 0, 0, 0]), DigestItem::Seal([66, 65, 66, 69], [210, 36, 72, 88, 37, 180, 67, 217, 25, 64, 24, 148, 142, 103, 60, 9, 133, 134, 91, 247, 164, 36, 140, 150, 235, 4, 147, 130, 2, 85, 136, 46, 27, 96, 207, 84, 244, 38, 144, 167, 180, 87, 174, 162, 202, 66, 175, 206, 75, 106, 66, 247, 180, 161, 157, 102, 219, 27, 242, 129, 27, 149, 181, 137])] } }, state: Some(Best), data: Some([]) } }
2022-04-28 19:14:30.144 TRACE tokio-runtime-worker sync: Announce validation result is nothing
But when it sends a New block request to the operator, the request is refused and the connection is dropped:
2022-04-28 19:14:30.144 TRACE tokio-runtime-worker sync: New block request for 12D3KooWBTD1MjUQjb5wbuXR5zacGaWksrf4K4FK8SwbqT9Edmdx, (best:2593343, common:2593342) BlockRequest { id: 0, fields: HEADER | BODY | JUSTIFICATION, from: Hash(0xa1ddd181d516444f20dd86d5e45a2eb749d65fa7da57fb0f5de64e2657039073), to: None, direction: Descending, max: Some(1) }
2022-04-28 19:14:30.529 TRACE tokio-runtime-worker sync: Block request to peer PeerId("12D3KooWBTD1MjUQjb5wbuXR5zacGaWksrf4K4FK8SwbqT9Edmdx") failed: Refused.
2022-04-28 19:14:30.529 TRACE tokio-runtime-worker sync: 12D3KooWBTD1MjUQjb5wbuXR5zacGaWksrf4K4FK8SwbqT9Edmdx disconnected
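For anyone repeating this analysis on their own node, the sequence above (announcement received, block request sent, request refused, peer disconnected) can be pulled out of a log file with something along these lines (node.log is a placeholder for wherever the node's output is written):

# Rough filter for the announce -> block request -> refusal -> disconnect sequence.
grep -E "block announcement|New block request|failed: Refused|disconnected" node.log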
Is there an existing issue?
Experiencing problems? Have you tried our Stack Exchange first?
Description of bug
Hi Substrate Team,
We've been trying to debug the issue below for the last few days, but are hitting a wall. Any advice or tips you can provide would be very much appreciated.
There is a Stack Exchange question which offers a high-level description, but I'm adding more detail here: https://substrate.stackexchange.com/questions/2166/validator-node-dropping-peers-after-producing-a-block-followed-by-a-reorg
For the last few days, around 1/3 of our validator nodes have been scoring quite poorly in terms of reward points, missing many, but not all, of their block production slots (we use BABE + GRANDPA): https://mainnet-app.polymesh.network/#/staking
Looking at an individual poorly performing validator, we observe that it seems to produce a block, but the block fails to propagate to the rest of the network, despite a reasonable initial peer count. Alongside this failure, peer counts tend to drop materially just after the block is produced and gossiped.
We turned on -l peerset=trace on one of the impacted nodes, and I am attaching a log file from that node. As an example, if you search for Pre-sealed block for proposal at 2603940 you can see a block being produced, failing to propagate, and the peer count dropping (not sure whether this is a cause or an effect). Log file: https://drive.google.com/file/d/1YBOQccA7qmJWDMdi5oUu3lKh3W01KzDY/view?usp=sharing
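For anyone digging through the attached file, a quick way to jump to that spot and see the surrounding peer activity is something like the following (the file name is a placeholder and the amount of context is arbitrary):

# Show the pre-sealed block line for block 2603940 plus ~20 lines of context on each side,
# which is where the peer disconnects start to appear.
grep -n -C 20 "Pre-sealed block for proposal at 2603940" peerset-trace.log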
Some puzzles:
Thanks, Adam
Steps to reproduce
This doesn't seem to be reproducible in isolation unfortunately.