Open ordian opened 3 years ago
OurViewChange
relates to the network and is meant to tell networking subsystems what our peers know about us. It's not meant to inform the work that the subsystems themselves do, that's what ActiveLeavesUpdate
is for.
We can improve our handling of the discrepancies between these notifications but they are actually designed to have discrepancies.
We can improve our handling of the discrepancies between these notifications but they are actually designed to have discrepancies.
In that case I'd suggest all subsystems to either use ActiveLeaves
(preferred) or OurViewUpdate
, but never both (just like tabs and spaces). Currently e.g. bitfield-distribution
activates leafs based on ActiveLeaves
, but cleans up the state on OurViewUpdate
, which seems inconsistent and not sure why we're trying to achieve with that.
In that case I'd suggest all subsystems to either use ActiveLeaves (preferred)
I agree with that assessment. I think OurViewUpdate
should only ever be used to determine peer reputation changes, when we're trying to answer the question 'why did they send us this?'
I suggest a utility for keeping track of any OurViewUpdate
messages sent within the last N seconds (grace period) which we can use across networking subsystems.
To avoid issues like https://github.com/paritytech/polkadot/pull/3519, we could make sure to deactivate leaves first before handling activation (which can potentially fail).
As suggested in https://github.com/paritytech/polkadot/pull/3519#discussion_r676491475, we could change the signature of activated
to be a single entry (Option
instead of SmallVec
) and and on startup https://github.com/paritytech/polkadot/blob/c0387bab91c324bb342e2c5de3910ad6d2d1b91b/node/service/src/lib.rs#L678-L680
this would result in multiple active leaves signals here https://github.com/paritytech/polkadot/blob/c0387bab91c324bb342e2c5de3910ad6d2d1b91b/node/overseer/src/lib.rs#L661-L675
As in #2790, we've observed a discrepancy between
ActiveLeavesUpdate
signal andOurViewChange
message.It can happen if we import two blocks in succession, one of which deactivates the other (parent and its child). Since we handle signal before messages, if we're using both the signal and the message to update the state, this might cause an inconsistency. An example of that is https://github.com/paritytech/polkadot/blob/8c7a6d9f0d8fa7cf129e72dd55750cb0e016aec4/node/network/bitfield-distribution/src/lib.rs#L571
Also note that with theMode::Sync
state we will pause sendingOurViewChange
but continue handlingActiveLeavesUpdate
.Another source of inconsistency comes from the fact that we limit our view to 5 heads only.