paritytech / polkadot

Polkadot Node Implementation
GNU General Public License v3.0
7.14k stars 1.58k forks source link

subsystems: use `ActiveLeavesUpdate` instead of `OurViewChange` #2812

Open ordian opened 3 years ago

ordian commented 3 years ago

As in #2790, we've observed a discrepancy between ActiveLeavesUpdate signal and OurViewChange message.

It can happen if we import two blocks in succession, one of which deactivates the other (parent and its child). Since we handle signal before messages, if we're using both the signal and the message to update the state, this might cause an inconsistency. An example of that is https://github.com/paritytech/polkadot/blob/8c7a6d9f0d8fa7cf129e72dd55750cb0e016aec4/node/network/bitfield-distribution/src/lib.rs#L571

Also note that with the Mode::Sync state we will pause sending OurViewChange but continue handling ActiveLeavesUpdate.

Another source of inconsistency comes from the fact that we limit our view to 5 heads only.

rphmeier commented 3 years ago

OurViewChange relates to the network and is meant to tell networking subsystems what our peers know about us. It's not meant to inform the work that the subsystems themselves do, that's what ActiveLeavesUpdate is for.

We can improve our handling of the discrepancies between these notifications but they are actually designed to have discrepancies.

ordian commented 3 years ago

We can improve our handling of the discrepancies between these notifications but they are actually designed to have discrepancies.

In that case I'd suggest all subsystems to either use ActiveLeaves (preferred) or OurViewUpdate, but never both (just like tabs and spaces). Currently e.g. bitfield-distribution activates leafs based on ActiveLeaves, but cleans up the state on OurViewUpdate, which seems inconsistent and not sure why we're trying to achieve with that.

rphmeier commented 3 years ago

In that case I'd suggest all subsystems to either use ActiveLeaves (preferred)

I agree with that assessment. I think OurViewUpdate should only ever be used to determine peer reputation changes, when we're trying to answer the question 'why did they send us this?'

I suggest a utility for keeping track of any OurViewUpdate messages sent within the last N seconds (grace period) which we can use across networking subsystems.

ordian commented 3 years ago

To avoid issues like https://github.com/paritytech/polkadot/pull/3519, we could make sure to deactivate leaves first before handling activation (which can potentially fail).

ordian commented 3 years ago

As suggested in https://github.com/paritytech/polkadot/pull/3519#discussion_r676491475, we could change the signature of activated to be a single entry (Option instead of SmallVec) and and on startup https://github.com/paritytech/polkadot/blob/c0387bab91c324bb342e2c5de3910ad6d2d1b91b/node/service/src/lib.rs#L678-L680

this would result in multiple active leaves signals here https://github.com/paritytech/polkadot/blob/c0387bab91c324bb342e2c5de3910ad6d2d1b91b/node/overseer/src/lib.rs#L661-L675