paritytech / polkadot-sdk

The Parity Polkadot Blockchain SDK
https://polkadot.network/

runtime refactoring: remove the claim queue from the scheduler #5529

Open alindima opened 2 months ago

alindima commented 2 months ago

If we remove the TTL, the claim queue in the scheduler pallet does not make much sense any more.

Without the TTL, the claim queue is merely an intermediate buffer on top of the coretime assigner pallet, populated with up to scheduling_lookahead claims. It also complicates things, as assignments need to be pushed back to the assigner on session changes (if the core count decreases).

Backwards compatibility with the claim queue runtime API will be maintained. The logic will be modified to peek up to scheduling_lookahead positions into the coretime assigner.

This greatly simplifies the scheduler pallet and helps with the separation of concerns.
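As a rough illustration of the runtime API change, here is a minimal sketch with simplified stand-in types (ParaId, CoreIndex and the peek_assignments helper are hypothetical, not the actual polkadot-sdk definitions): the claim_queue API is answered by peeking up to scheduling_lookahead entries per core from the assigner instead of reading a separate claim queue storage item.

```rust
// Hypothetical, simplified sketch (not the actual polkadot-sdk types or pallet
// storage): answering the claim-queue runtime API by peeking the coretime
// assigner instead of reading a claim-queue storage item.

use std::collections::{BTreeMap, VecDeque};

type ParaId = u32;
type CoreIndex = u32;

/// Assumed interface: the assigner can non-destructively report the next
/// `depth` assignments it would hand out for a core.
trait AssignmentPeeker {
    fn peek_assignments(&self, core: CoreIndex, depth: u32) -> Vec<ParaId>;
}

/// Backwards-compatible `claim_queue` API: for every core, peek up to
/// `scheduling_lookahead` assignments from the assigner.
fn claim_queue<A: AssignmentPeeker>(
    assigner: &A,
    num_cores: u32,
    scheduling_lookahead: u32,
) -> BTreeMap<CoreIndex, VecDeque<ParaId>> {
    (0..num_cores)
        .map(|core| {
            let claims: VecDeque<ParaId> =
                assigner.peek_assignments(core, scheduling_lookahead).into();
            (core, claims)
        })
        .collect()
}
```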

As @eskimor pointed out, it also has a nice side effect: the latency between ordering an on-demand core and being able to use it will decrease. This is because the on-demand order extrinsic reaches the on-demand assigner but not the claim queue (the claim queue is filled during inherent data processing, which happens before extrinsics). There is therefore a latency of at least one block between when the order is fulfilled and when the assignment can be used. If we modify the runtime API to look directly into the assigner, the parachain could use this claim right at the next block (provided that the core is available).
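A toy illustration of that ordering, with types and a process_block function made up purely for the example (they are not the real pallet APIs): because inherents run before signed extrinsics, a claim queue filled during inherent processing cannot see an order placed later in the same block, while the assigner itself can.

```rust
// Toy illustration (not polkadot-sdk code) of the latency argument above.

#[derive(Default)]
struct OnDemandAssigner {
    queued_orders: Vec<u32>, // para ids that bought on-demand coretime
}

#[derive(Default)]
struct ClaimQueue {
    claims: Vec<u32>,
}

fn process_block(assigner: &mut OnDemandAssigner, claim_queue: &mut ClaimQueue, new_order: u32) {
    // 1. Inherent processing: the claim queue is refilled from the assigner's
    //    state as of the previous block.
    claim_queue.claims = assigner.queued_orders.clone();

    // 2. Extrinsics: the on-demand order reaches the assigner only now,
    //    too late to appear in this block's claim queue.
    assigner.queued_orders.push(new_order);
}

fn main() {
    let (mut assigner, mut claim_queue) = (OnDemandAssigner::default(), ClaimQueue::default());

    process_block(&mut assigner, &mut claim_queue, 2000);

    // The claim-queue view lags one block behind; the assigner already knows.
    assert!(claim_queue.claims.is_empty());
    assert_eq!(assigner.queued_orders, vec![2000]);
}
```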

alindima commented 2 weeks ago

Another idea for refactoring the scheduler: https://github.com/paritytech/polkadot-sdk/pull/5461#discussion_r1796434872

eskimor commented 2 weeks ago

I just remembered another reason we had the claim queue storage: we need the predictions in the claim queue to be stable; in particular, entries should not just disappear. If we removed the claim queue, this could happen with on-demand, e.g. when the number of on-demand cores changes, we could end up with assignments showing up on different cores depending on the relay parent: the assignments would move around between cores.
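To make the concern concrete, here is a toy model (this is not the actual assigner logic, just an illustration): if on-demand claims are spread over whatever pool cores exist at the queried state, the para-to-core mapping shifts as soon as the pool core count changes, so two relay parents can report the same claim on different cores.

```rust
// Toy model of the stability problem: on-demand claims distributed
// round-robin over the currently available pool cores.

fn spread_over_cores(orders: &[u32], pool_cores: &[u32]) -> Vec<(u32, u32)> {
    // (core, para) pairs, assigning orders round-robin over pool cores.
    orders
        .iter()
        .enumerate()
        .map(|(i, para)| (pool_cores[i % pool_cores.len()], *para))
        .collect()
}

fn main() {
    let orders = [2000, 2001, 2002];

    // Relay parent A: three pool cores available.
    let view_a = spread_over_cores(&orders, &[4, 5, 6]);
    // Relay parent B: one pool core was removed in between.
    let view_b = spread_over_cores(&orders, &[4, 5]);

    // Para 2002 shows up on core 6 in one view and on core 4 in the other.
    assert_eq!(view_a[2], (6, 2002));
    assert_eq!(view_b[2], (4, 2002));
}
```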

alindima commented 1 week ago

> I just remembered another reason we had the claim queue storage: we need the predictions in the claim queue to be stable; in particular, entries should not just disappear. If we removed the claim queue, this could happen with on-demand, e.g. when the number of on-demand cores changes, we could end up with assignments showing up on different cores depending on the relay parent: the assignments would move around between cores.

We could have the coretime assigner take care of this, right? I don't think it would require that many changes. The WorkState for a core could do this buffering and store not just the CoreAssignment::Pool assignment type but also the paraid for each slot.

We'd still need the equivalent of advance_claim_queue, but on the coretime assigner. The coretime assigner would just store this info, so the runtime API would not end up calling into the on_demand pallet; the assigner would have all the info buffered already.
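A minimal sketch of that idea, with simplified stand-in types (Assignment, CoreWorkState and its methods are invented for the example; the real assigner_coretime types differ): the assigner keeps a per-core buffer of already-resolved claims, including the paraid for pool assignments, and advances it once per block just like the old claim queue did.

```rust
// Rough sketch only: buffer resolved claims per core inside the coretime
// assigner so the runtime API never has to call into the on-demand pallet,
// and advance the buffer once per block like the old claim queue.

use std::collections::VecDeque;

type ParaId = u32;

#[derive(Clone)]
enum Assignment {
    Task(ParaId), // bulk: paraid known from the schedule
    Pool(ParaId), // on-demand: paraid resolved and buffered at fill time
}

/// Per-core buffer of up to `lookahead` already-resolved assignments.
struct CoreWorkState {
    buffered: VecDeque<Assignment>,
}

impl CoreWorkState {
    /// Equivalent of `advance_claim_queue`, but living in the assigner:
    /// drop the claim for the current block and top the buffer back up.
    fn advance(
        &mut self,
        lookahead: usize,
        mut next_assignment: impl FnMut() -> Option<Assignment>,
    ) {
        self.buffered.pop_front();
        while self.buffered.len() < lookahead {
            match next_assignment() {
                Some(a) => self.buffered.push_back(a),
                None => break,
            }
        }
    }

    /// What the claim-queue runtime API would report for this core.
    fn claims(&self) -> Vec<ParaId> {
        self.buffered
            .iter()
            .map(|a| match a {
                Assignment::Task(p) | Assignment::Pool(p) => *p,
            })
            .collect()
    }
}
```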

So we could merge the claim queue concept into the coretime assigner.

I wonder, though, if the practicalities of this would complicate things instead.

eskimor commented 1 week ago

This stability issue can actually also happen with bulk. Assignments can change at any time, hence you could arrive at different results. Arguably, getting the fresh information is more correct, but it also means that if you prepared a collation based on the claim queue, that collation will end up being dropped because the claim queue changed.

For bulk this actually seems acceptable, as those collations would have been produced for coretime you did not actually purchase; it can also be fixed by having a proper end_hint.

TL;DR: We might have an issue with bulk too, but for bulk it is definitely easily fixable.

For on-demand it is more tricky, but likely also solvable:

  1. Core removal: with an end_hint we can know about it ahead of time.
  2. A new core getting added: we can actually also know about that ahead of time.

Hence, this is fixable for both bulk and on-demand by simulating the state of the block the claim queue entry is relevant for. (We can look into the future.) This should actually not be too hard and will also lead to a more correct behavior than what we have with the claim queue as of now.

Keeping the end_hint at None should actually be fine, as long as we check for incoming assignments that would void the current work state.
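A hedged sketch of what "looking into the future" could mean (the CoreSchedule type and cores_active_at helper are hypothetical, not an existing polkadot-sdk API): when predicting claim queue entries for a future block, only produce them for cores that the currently known end hints and already-scheduled additions say will exist at that block. Here end_hint = None is simply treated as "continues"; a fuller version would also need the check for incoming assignments that void the current work state, as noted above.

```rust
// Hypothetical helper: simulate which cores are active at a future block
// using only information available now (end hints and scheduled additions),
// so predicted claims don't silently vanish or jump cores in the meantime.

type BlockNumber = u32;
type CoreIndex = u32;

struct CoreSchedule {
    core: CoreIndex,
    /// Block at which this core's current assignment ends, if known.
    end_hint: Option<BlockNumber>,
    /// Block at which this core becomes active (already-known addition).
    start: BlockNumber,
}

/// Cores that will actually be usable at `at`; claims are only predicted for these.
fn cores_active_at(schedules: &[CoreSchedule], at: BlockNumber) -> Vec<CoreIndex> {
    schedules
        .iter()
        .filter(|s| s.start <= at && s.end_hint.map_or(true, |end| at < end))
        .map(|s| s.core)
        .collect()
}

fn main() {
    let schedules = [
        CoreSchedule { core: 0, end_hint: None, start: 0 },
        CoreSchedule { core: 1, end_hint: Some(105), start: 0 }, // going away soon
        CoreSchedule { core: 2, end_hint: None, start: 103 },    // known future addition
    ];

    // Predicting for block 104: core 1 is still there, core 2 has appeared.
    assert_eq!(cores_active_at(&schedules, 104), vec![0, 1, 2]);
    // Predicting for block 106: core 1's assignment has ended.
    assert_eq!(cores_active_at(&schedules, 106), vec![0, 2]);
}
```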