Update statement-distribution for asynchronous backing

If the runtime API version indicates that asynchronous backing is enabled, we'll need new logic in statement-distribution.

Goals

The statement-distribution subsystem has these high-level goals:

Distribute locally-generated statements about candidates to the network
Participate in the distribution of statements by other validators
Receive relevant statements about candidates from peers
Filter out spam: nonsensical statements or statements by spamming validators

Recap

By statement, we mean something roughly like

enum Statement {
    Seconded(RelayParentHash, CandidateHash),
    Valid(RelayParentHash, CandidateHash),
}

Validators sign these statements about candidates, which in turn are gossiped over the network and can be included alongside the candidate on-chain.

Validators should only issue Valid statements for candidates that they've seen Seconded, and there are restrictions on how many Seconded statements can be produced by a validator, making Seconded statements the core means of preventing spam. In fact, the changes to semantics here as a result of #4963 are the main reason the logic of this subsystem needs to be changed.

Spam Protection

There are two kinds of spam that we are worried about.

Nonsensical or unanchored messages, which can be produced by any rogue validator
Infinite numbers of valid, backed candidates, which any rogue validator group can produce.

Status Quo

Without asynchronous backing, parachain candidates were required to have the most recent relay-chain block as their relay-parent, which in turn implied that no two sequential parachain blocks could have the same relay-parent. Spam prevention was relatively simple: we required that each validator could second no more than 1 candidate per relay-chain block, and that Valid statements could only be sent to peers we were sure had the corresponding Seconded statement - either because they sent it to us, or we sent it to them. Valid statements not corresponding to a known Seconded block could be ignored, and the amount of Seconded statements to consider was bounded by the number of validators.

We exchange Views with our peers, which indicate our current active leaves. Our peers send us only candidates that have anything to do with these active leaves. Each active leaf is a blank slate.

Changes

With asynchronous backing, there's no longer any requirement for sequential parachain blocks to have different relay-parents. However, we have to rein in the number of possible candidates somehow, otherwise malicious validators or validator groups could produce an infinite number of valid parachain blocks and furthermore, an infinite number of valid prospective parachains of infinite length. These are the problems we need to avoid.

The model we introduce is based off of "prospective parachains", where backing subsystems buffer a few parachain blocks off-chain. This is coordinated by the Prospective Parachains subsystem #4963 . The buffers are actually trees with an enforced max depth. The prospective parachains subsystem is designed to work hand-in-hand with higher level spam prevention to avoid the trees growing too large - and statement-distribution fulfills that role.

Each new active leaf is no longer guaranteed to be a blank slate, but instead will initially be populated by a FragmentTree from the set of already known candidates. Peers keep track of which candidates they have sent and received messages for.

Candidates don't have a globally unique depth, and they don't even have a unique depth per active-leaf. It is legal, although unexpected, for head-data to cycle and relay-parents can now stay the same. That is, each fragment tree gives a set of depths for any candidate, which is usually either empty or has length 1, except for pathological parachains. This set of valid depths is defined as the depths in the tree at which the candidate appears.

Outdated first attempt (see https://github.com/paritytech/polkadot/issues/5055#issuecomment-1180914398)

This is probably the trickiest conceptual bit, but here goes: we keep candidates and messages referencing them around until the union of the valid depths is empty at all active leaves. The valid depths for a candidate trend towards the empty set because the candidate is eventually either included or orphaned.

So here's how we use depths to prevent spam:

We keep the rule that we only accept Valid messages after becoming aware of Seconded messages for a candidate
We accept only one Seconded message per validator per depth per active-leaf. Seconded messages which refer to candidates that would have an empty set of valid depths at all of our active leaves are considered spam.

We already know which active-leaves our peers are aware of through their views, and we're aware of which candidates we've received from/sent to our peers, which means that it's not too involved for us to build up a view of their own FragmentTrees which we can use to send them things.

So, in practice, this 'depth' approach also means that validators have to choose a fork of the parachain to stick to: it sets an upper bound of n_validators*max_depth nodes in the tree, not n_validators^max_depth.

Practicalities

Most of the issues in implementation will revolve around race conditions:

Prospective fragment trees are actually handled within another subsystem. But we need to maintain some kind of mutable fragment trees within the statement-distribution subsystem itself in order to handle incoming streams of messages
So far, we've been imagining that the Prospective Parachains subsystem will inform backing subsystems about changes to fragment trees via signals from the overseer (#4961), but such signals will race against active leaf updates and view updates to our peers. Instead, we might have backing subsystems pull the data from the Prospective Parachains subsystem.

I've been reflecting on how this would be implemented in practice with a few questions:

Which subsystem should be responsible for informing the prospective parachains subsystem of new things being seconded/backed?
How should statement-distribution tell whether an incoming statement would be present in the fragment tree, without adding it? (spam protection)

The architecture that I'm coming to as a consequence is that (1) should be candidate-backing, because prospective-parachains has to be informed about when candidates are first seconded and when they're backed, and candidate-backing is the only place where backing votes are counted.

broadly, statement distribution needs to be able to ask 2 kinds of questions. The first is asked when we receive a new Seconded message and takes the form "what would be the valid depths of this candidate in the fragment trees of our active leaves?". The second is "which of these candidates still have non-empty valid depths under any of the current set of active leaves?" and this is called on ActiveLeavesUpdates when leaves get removed and new ones get added.

The distinction between the former and the latter is that the former deals with hypothetical statements (potential spam), whereas the latter is garbage-collection (potentially outdated) and all statements should have already been imported into the prospective parachains subsystem. However, without a change in how we pass statements to candidate-backing, there's the distinct possibility that they have not been imported. That is, if we just hand statements over to candidate-backing, it's possible for us to receive a new ActiveLeavesUpdate before candidate-backing has informed the prospective-parachains subsystem of the candidate, and then mistakenly clean up the candidate.

The workaround is to attach a response sender to CandidateBackingMessage::Statement. This response sender is responsible for informing us when and whether the statement has been accepted and the prospective-parachains subsystem has had the candidate imported and at what depths under what relay-parents. Since the spam-filtering question (1) above doesn't check all the constraints but only the possible positions of a candidate with a given hash and parent, it's possible that the CommittedCandidateReceipt can't actually be placed in those positions and would fail. Statement distribution can block on the response sender (for seconded messages relating to unknown candidates) before fully accepting the message. (1) is still useful because it's a question we can ask before downloading large candidate receipts, assuming that parent head hash is part of the notification of a large statement. If we removed large statements entirely (i.e. by moving code upgrades and XCMP messages out of the candidate receipt) then we might not need to ask (1) at all.

One last point is that while it's technically possible for importing a candidate A into a fragment-tree could enable a previously-known candidate B to be imported as its child, this is a real edge case and we can ignore it in statement-distribution without major consequences. This is a byproduct of the fact that candidates express their parent as head-data, and not as a candidate hash, which means that we can encounter multiple inheritance problems. A -> B -> C, but then receiving a B' with the same head-data as B we can get A -> B' -> C as well in the same tree. This is possible, for instance, if B and B' had different relay-parents (and the PVF didn't reflect that change in initial conditions in the output head-data for the state transition. unreasonable, but maybe possible...). We'll be slightly less strict on spam prevention by ignoring this but our memory usage is still bounded and the end result is only that crazy parachains may be less reliable.

Here's an alternative version which doesn't care about equivocations (which are damn complicated to report to the chain anyway until we have sequence numbers in candidate receipts) but still handles spam and keeps bandwidth relatively low. It does this by having validators operate directly within their groups and only circulate backed candidates to the rest of the network.

Requirements:
- Ability to handle ~1,000 real unbacked candidates
- Ability to handle large amounts of spam candidates
- Very fast (200ms) delivery of candidates and statements within group/next group
- Eventual and relatively fast (2-3s) delivery of candidates and statements to all validators
Observations:
- In the presence of spamming validators the amount of candidates we may have to handle grows significantly
- Candidates that build upon prior unbacked candidates are very likely to have been seen by members of the group almost by definition, especially because execution time and block building should dominate distribution time.
- Set reconciliation is difficult because the size of the possible set of allowed candidates is quite large (therefore differences are large and sketches are large)
- We expect that due to UMP and HRMP candidate sizes will be relatively large.
Design: distribution within groups for unbacked candidates, grid distribution outside of groups for backed candidates and late statements.
Direct distribution within groups and on-deck
- When a node seconds or validates a candidate it sends a Statement(RelayParent, CompactStatement) to all nodes in the group. These are only sent to other nodes in the assigned group and on-deck group, and only if/when the peer has the relay parent in their implicit view.
- note on Seconded equivocations - until we turn candidates into chains or have them commit to sequence numbers to avoid cycles, an apparent violation of the "one per depth per active-leaf" rule could be a second-order violation caused by another cycle. we don't really need to worry about these with this design as we can already effectively avoid spam.
- If the candidate in the statement is unknown the receiver requests the candidate and all Seconded statements. If there is no response or an invalid response, the statement is discarded and the peer punished. If there is a valid response, then the candidate is placed into storage
- There's a small number of validators in each backing group, which limits the amount of candidates that we'd ever have to contend with. We accept up to max-depth + 1 Seconded statements per validator in the group per relay-parent. If we get too many, we simply drop them. With 5 validators and max-depth 4 we'd accept 25 per relay-parent. Implicit views are small.
- We accept candidates which don't have an obvious place in the fragment trees of any relay-parents, which means we can only import them later. We listen to the prospective parachains subsystem to learn of new additions to the fragment trees and use this to attempt to import later.
Distributing fully backed candidates to the rest of the network
- Once the candidate is backed, wait for some time T to pass to collect additional statements. Then produce a 'backed candidate packet' (CommittedCandidateReceipt, Statements)
- Members of the backing group produce announcements of fully-backed candidates whenever those are finished. These have the form BackedCandidateInv(RelayParentHash, CandidateHash, ParaId, ParentHeadDataHash, SecondedInGroupBitfield)
- These are sent along the grid topology to peers who have the relay-parent in their implicit view
- The SecondedInGroupBitfield is a bitfield with n_group_validators bits, which are set to '1' if the corresponding validator has seconded the candidate. i.e. with 8 or fewer group validators this would be a byte.
- Only sent by 1st-hop nodes after downloading the backed candidate packet
- ignored when received out-of-topology.
- On every local view change, members of the backing group rebroadcast BackedCandidateInv for all candidates under every new relay-parent across the grid topology, ignoring peers who know the candidate.
- Nodes should send a BackedCandidateKnown(CandidateHash) notification to any peer which has sent a BackedCandidateInv and the candidate has been acquired by other means. This keeps alternative paths through the topology open for the purpose of getting additional statement later.
- req/res for the candidate + votes. ignore if they are inconsistent with the inventory message.
- A malicious backing group is capable of producing an unbounded number of backed candidates. we request the candidate only if the candidate has a hypothetical depth in any of our fragment trees and the seconding validators have not seconded any other candidates at that depth in any of those fragment trees.
- All members of the group attempt to circulate all statements (in compact form) from the rest of the group on candidates that have already been backed
- They do this via the grid topology - they add the statements to their backed candidate packet for future requestors and also send the statement to any peer which we advertised the backed candidate to and has previously & successfully requested the backed candidate packet or which has sent a BackedCandidateKnown.
- 1st-hop nodes do the same thing
- This wouldn't be necessary if we had a runtime mechanism for validators to place their statements on-chain after the fact and still get rewarded. But we don't, so we need this to be fair.

I believe this captures what we discussed, with way more details of course.

We should discuss when/if these possibly spammed backed candidates get propagated. Imagine, every validator only forwards at most one BackedCandidateInv, or maybe just (CommittedCandidateReceipt, Statements), for each (RelayParentHash, ParaId). We still have (multiple) backed candidates arriving at each validator. Are we breaking anything?

It maybe worsens censorship problems for parachains?

Imagine, every validator only forwards at most one BackedCandidateInv, or maybe just (CommittedCandidateReceipt, Statements), for each (RelayParentHash, ParaId)

I'd answer a slightly different question first: what happens if the grid relayers relay nothing at all to try to censor the parachain?

We have two levels of redundancy that seem to alleviate this:

2 paths through the grid for all messages A->B, so both C and C' would have to refuse to propagate.
H honest validators in the group originating the message.

re: censorship, the way I look at it is that the adversary is trying to get a proportion p = n / N of validators aware of the backed candidate as low as possible so as to minimize the likelihood that the next block author can post the backed candidate on-chain.

For this analysis I'm assuming that the group is majority-honest because otherwise censorship of a parachain is easy: just don't vote for anything. And assuming that the group is seconding blocks at the expected rate.

In the base case where nobody except the backing group learns of the candidate, we'd have something like n = H, where H >= backing_threshold(group_size).

Validators originating a BackedCandidateInv should be directly connected to about 2*sqrt(N) others in the grid topology via their row and column, so assuming all grid forwarders don't forward messages, then we'd get n = Ksqrt(N) * H where 1 < K <= 2 due to shared rows/columns.

so I'd estimate that the worst-case of censorship if attacking nodes just don't participate in the grid is to reduce the likelihood of a block being authored containing a candidate from the parachain to ~ sqrt(N) / N but that it gets there fairly slowly and asymptotically - each honest relayer in those sqrt(N) nodes adds another sqrt(N) validators who can include the parachain candidate if they're selected as a relay-chain block author.

To answer the question you actually asked, I suspect that as long as grid relayers relay exactly 1 candidate per parachain at each position (which they might do as a strategy to reduce bandwidth), then the entire network will see at least 1 thing that can be bundled in the next relay-chain block, even though they might not all see the same thing. That seems fine to me and quite possibly a valid argument for implementing the code this way. If the parachain collator-selection is working correctly, there should only be 1 valid thing at any point.

If 1/3 of validators wanted to censor a parachain then they might behave worse by stopping approval checks for the parachain, which gives a separate soundness adversary much better odds of breaking soundness, like printing the parachain's governance token for themselves. In this, we've broken the 2/3rd honest assumption though because I made the soundness adversary separate from the "fuck them" adversary, so yes we're still going to assume 2/3rd honest in censorship worries..

As availability needs connections between all validators anyways, we're always free to ephemerally vary the grid frequently, or even per message, like the relay parent hash, or assignment VRF in assignments, determines our permutation of the validator set into the grid positions. We then assume the censoring adversary is small so we can ignore the risks of a few unlucky candidates needing to pass through censoring nodes, or perhaps vary the grid with each backing validator group rotation, maybe worth analyzing this claim first of course. It's also possible validator group rotation already handles this just fine.

We should not imho vary the grid by candidate receipt because the grid provides a nice spam stopping function if all backing messages from a validator for a specific relay parent hash take the same grid route.

We've discussed set reconciliation before, but we could do set reconciliation at most one fully backed candidates per backing validator, which could then run outside the grid or on yet another grid, maybe or maybe not ephemeral. Did you mean the set reconciliation protocols you know do not handle large enough sets?

If 1/3 of validators wanted to censor a parachain then they might behave worse by stopping approval checks for the parachain

Yeah, there are maybe worse things that attacking validators could do to try to censor a parachain. I was limiting the scope of my comment above to only this protocol.

As availability needs connections between all validators anyways, we're always free to ephemerally vary the grid frequently

This probably helps somewhat in practice, although I think the effect would be a 'smoothing' of the censorship curve. Also agreed we shouldn't vary per candidate-hash, as that can be chosen by validators.

Did you mean the set reconciliation protocols you know do not handle large enough sets?

They can handle large sets, but not in a way that'd be bandwidth-efficient for statement distribution, as far as I could tell. We have to handle very large sets at very high speeds, and the approaches of most set reconciliation protocols for transactions to first do a round of flooding and then a couple rounds of set reconciliation weren't suitable here.

Alright, our toolbox should remain grid-like for now then, so varying grids, analyzing grid dimension, analyzing non-grids with grid-like properties ala slimfly, etc, thanks.

re: https://github.com/paritytech/polkadot-introspector/issues/91

One of the main challenges with these types of closed-topology gossip protocols is observability of network state by external nodes. We should build in some kind of pub/sub thing on top of this where peers who we wouldn't normally send BackedCandidateInv statements to (and expect them to request) can request to subscribe to us and we can accept or reject based on capacity. Those peers in turn can have their own subscribers, and so on. This type of pub/sub construction would work well in approval distribution as well, so external nodes could listen on p2p in a friendly way.

@rphmeier I assume this is just similar to what I was describing there, but from an implementation perspective it will be done at the network protocol level, right ?

I like the observation that with async backing we no longer need to propagate everything to everybody immediately and instead have a two phase process. The "tell what we have" in a compact form and then fetch if needed also sounds like the way to go.

One thing I am not sure I understand what it is about is the following:

Nodes should send a BackedCandidateKnown(CandidateHash) notification to any peer which has sent a BackedCandidateInv and the candidate has been acquired by other means. This keeps alternative paths through the topology open.

Keep alternative paths through the topology open? I assume this is, because the sender would assume we are dead/malicious if we don't fetch otherwise? But still how would this close a path through the topology? It would certainly prevent pointless re-sends on view change though.

@sandreim

from an implementation perspective it will be done at the network protocol level, right ?

Yeah. Doing this at the network-protocol level opens up more doors for alternative implementations or for polkadot-introspector to not require knowing any node. I'd consider this an issue that's not urgent by any means but also quite interesting & useful work.

@eskimor

Keep alternative paths through the topology open? I assume this is, because the sender would assume we are dead/malicious if we don't fetch otherwise? But still how would this close a path through the topology?

I wasn't very clear in my wording there, but what it specifically refers to is the part of the protocol used to distribute additional votes to everyone. i.e. a candidate can be backed with 3 votes but might get 5 eventually.

Some node might get two grid peers advertising BackedCandidateInv and only request from one of them. The issue if we don't send CandidateKnown to the other is that only one grid peer knows that the node has the candidate, and therefore only that node may forward along further statements. By sending CandidateKnown to the other grid peer, we keep a path open in the topology to get the remaining statements about the candidate as they're produced.

I amended the language in the post to make things more clear.

Uh got it. Didn't realize that we keep sending statements after we answered with a packet.

So in summary, protocol in phase 2 is:

We wait for a candidate to be backed + a little longer
We announce the fact, together with a bitfield telling who seconded that candidate. (Purpose limit the amount of stuff that can be seconded for spam prevention): notification protocol
Receiving nodes request the full packet (all statements + receipt) and tell other senders that we know the candidate. -> req/response
On new statements we send those statements to nodes who already successfully fetched -> notification protocol

On 4 ... only new statements? We actually don't know which ones the requester has, if it fetched them from another peer. To avoid censorship we kind of have to send all of them. Which should be fine as the heavy load was handled via req/res - we only send compact statements here. What is not quite clear to me: When do we send those statements? What are the triggers? Also on view change like the Inventory messages?

This was implemented in #5999

paritytech / polkadot