eskimor commented 1 year ago

In order for on-demand (and other, more exotic forms of scheduling) to take advantage of asynchronous backing, we need to expose the claim queue to the node side and have validators accept collations for any ParaId that is in the claim queue, not just the next scheduled entry.

[x] Expose the claim queue to the node side: https://github.com/paritytech/polkadot-sdk/pull/3580
[x] Make use of it in collator protocol so collations will be accepted in advance.
[ ] Make use of it in collators: Implement strategies for the right lookahead in collators (see comment below)
[x] Fix validator usage as well. E.g. in backing we will need to be more lenient and accept statements for any para in the current claim queue.
[x] Orthogonal: Is the runtime (inclusion) already lenient enough? This is not directly related to exposing the claim queue, but we need to double-check that the runtime itself takes it into account everywhere.
[ ] Ensure fairness between paras in the collator protocol.
[x] Statement distribution and other subsystems need to take full claim queue into account, not only the next scheduled.
[x] Verify that asynchronous backing works as expected with a core shared by at least three paras, in round robin.
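
The change in validator behaviour described above can be illustrated with a minimal sketch. This is not the actual collator protocol code: `ParaId` is a hypothetical stand-in type, the claim queue is modelled as a plain slice, and both function names are made up for illustration.

```rust
/// Hypothetical stand-in for the real para id type.
type ParaId = char;

/// Old behaviour (assumed for illustration): only the next scheduled
/// entry of the claim queue is accepted.
fn accepts_next_only(claim_queue: &[ParaId], para: ParaId) -> bool {
    claim_queue.first() == Some(&para)
}

/// Desired behaviour: a collation is accepted for any para that has
/// a claim anywhere in the claim queue.
fn accepts_any_claim(claim_queue: &[ParaId], para: ParaId) -> bool {
    claim_queue.contains(&para)
}
```

With a claim queue of `['a', 'b', 'c']`, the first check rejects a collation for `b`, while the second accepts it in advance. Both reject a para that has no claim at all.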

Fairness in collator protocol

To ensure one para cannot starve another in the collator protocol, we should become smarter about which collations to fetch. E.g. if we have the following claim queue:

[a, b, c, d]

and we have the following advertisements:

a,a,a,a,a,a,b,c,d

Our fetches should look like round robin:

a,b,c,d,a,a,a,a,a
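
One way to get this ordering is sketched below. It is only an illustration of the idea, not the collator protocol implementation: `ParaId` and `fetch_order` are hypothetical names, advertisements are reduced to the para they are for, and it is assumed that advertisements for paras without a claim are simply ignored.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the real para id type.
type ParaId = char;

/// Order fetches by cycling through the claim queue, taking at most one
/// pending advertisement per claim in each pass.
fn fetch_order(claim_queue: &[ParaId], advertisements: &[ParaId]) -> Vec<ParaId> {
    // Count pending advertisements per para. Advertisements for paras that
    // are not in the claim queue are dropped (assumption).
    let mut pending: HashMap<ParaId, usize> = HashMap::new();
    for para in advertisements {
        if claim_queue.contains(para) {
            *pending.entry(*para).or_insert(0) += 1;
        }
    }

    let mut order = Vec::with_capacity(advertisements.len());
    loop {
        let mut progressed = false;
        // One pass over the claim queue: each claim may consume one advertisement.
        for para in claim_queue {
            if let Some(count) = pending.get_mut(para) {
                if *count > 0 {
                    *count -= 1;
                    order.push(*para);
                    progressed = true;
                }
            }
        }
        // Stop once a full pass found nothing left to fetch.
        if !progressed {
            break;
        }
    }
    order
}
```

Calling `fetch_order(&['a', 'b', 'c', 'd'], &['a', 'a', 'a', 'a', 'a', 'a', 'b', 'c', 'd'])` yields `a, b, c, d` followed by the five remaining `a` entries, matching the example above. A para that appears more than once in the claim queue gets one fetch per claim in each pass.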

eskimor commented 8 months ago

A little illustration of how collation providing will work with the claim queue. Most of the time (in steady state) the second item in the claim queue will be the most relevant one, as this is the one that can be provided "on time". The first item in the claim queue is meant to be backed in the very next block (synchronous backing). So when starting up, we will provide the first item, but will be too slow, so it will go into the block after the next one. In the meantime we will already provide the second item, which will then be ready for the block after that. Now we are "in sync" and will always prepare the second item in the claim queue, while the first one is already waiting to be picked up by the block producer.

If para b sees the following claim queue for the current relay parent, it should start preparing its collation now to get it in on time:

[a, b, c]

Looking even further ahead (if the claim queue is long enough) is not required for operation, but is a legitimate block production strategy. E.g. to avoid more relay chain forks (and thus unnecessary work), one could deliberately pick a relay parent that already has a child block and then pick the third item in its claim queue (which corresponds to the second item in the claim queue of the most recent relay parent).
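
The lookahead rule can be sketched as follows. This is a simplified model, not the collator code: `ParaId`, `target_index` and `should_build` are hypothetical names, and the formula "second item plus one per relay chain block built on top of the chosen relay parent" is an assumed generalization of the two cases described above.

```rust
/// Hypothetical stand-in for the real para id type.
type ParaId = char;

/// In steady state a collator targets the second item (index 1) of the
/// claim queue of the most recent relay parent.
const IN_SYNC_INDEX: usize = 1;

/// Claim queue index to target when building on a relay parent that already
/// has `relay_parent_depth` descendants (0 = most recent relay parent).
fn target_index(relay_parent_depth: usize) -> usize {
    IN_SYNC_INDEX + relay_parent_depth
}

/// Whether `para` should start preparing a collation, given the claim queue
/// as seen at the chosen relay parent.
fn should_build(claim_queue: &[ParaId], para: ParaId, relay_parent_depth: usize) -> bool {
    claim_queue.get(target_index(relay_parent_depth)) == Some(&para)
}
```

For the claim queue `['a', 'b', 'c']` at the most recent relay parent, `should_build` returns true for `b` at depth 0. When building on a relay parent that already has one child block (depth 1), the third item of that relay parent's claim queue is targeted instead. The startup case, where the first item is provided once, is deliberately left out of this sketch.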

alindima commented 8 months ago

This code in prospective-parachains will also need to be changed to use the claim queue API: https://github.com/paritytech/polkadot-sdk/blob/2e4e65711233c6f3a1adc9ce49af8f4537de5439/polkadot/node/core/prospective-parachains/src/lib.rs#L843

I think it should look ahead according to allowed_ancestry_len or scheduling_lookahead, in order to allow collations coming from upcoming paras.

bkchr commented 7 months ago

Make use of it in collators: Implement strategies for the right lookahead in collators (see comment below)

https://github.com/paritytech/polkadot-sdk/issues/3168 is included there. The collation generation subsystem will not be used for this anymore. The lookahead collator already does not use the collation generation subsystem to trigger block production.

Expose and use claim queue for asynchronous on-demand operability #1797