alindima opened 2 months ago
Another idea for refactoring the scheduler: https://github.com/paritytech/polkadot-sdk/pull/5461#discussion_r1796434872
I just remembered another reason we had the claim queue storage: we need the predictions in the claim queue to be stable; in particular, entries should not just disappear. If we removed the claim queue, this could happen with on-demand: e.g. when the number of on-demand cores changes, assignments could show up on different cores depending on the relay parent. The assignments would move around between cores.
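A toy illustration of that instability, under the assumption that on-demand orders are mapped to pool cores by a simple index-modulo-core-count rule (the rule and `core_for_order` are made up for the example, not the actual scheduler logic): the same order lands on a different core as soon as the pool size changes.

```rust
type CoreIndex = u32;

/// Hypothetical mapping from the position of an order in the on-demand queue
/// to a pool core, given the current number of pool cores.
fn core_for_order(order_index: u32, pool_core_count: u32) -> CoreIndex {
    order_index % pool_core_count
}

fn main() {
    let order_index = 3; // some on-demand order, e.g. placed by para 2000

    // With 4 pool cores the order is predicted on core 3...
    assert_eq!(core_for_order(order_index, 4), 3);

    // ...but if the pool shrinks to 3 cores, the same order moves to core 0,
    // invalidating any collation built against the earlier prediction.
    assert_eq!(core_for_order(order_index, 3), 0);
}
```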
We could have the coretime assigner take care of this, right? Without that many changes I think.
The `WorkState` for a core could do this buffering and store not just the `CoreAssignment::Pool` assignment type but also the para id for each slot. We would still need the equivalent of `advance_claim_queue`, but on the coretime assigner. The coretime assigner would just store this info, so the runtime API would not end up calling into the `on_demand` pallet; the coretime assigner would have all the info buffered. So we could merge the claim queue concept into the coretime assigner. I wonder if the practicalities of this would complicate things instead.
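A minimal, self-contained sketch of that buffering idea, using made-up names (`QueuedClaim`, `CoreWorkState`, `pop_front_claim`) rather than the actual pallet types: the per-core state pins the resolved para id for each slot, and the `advance_claim_queue` equivalent becomes a simple pop on the assigner side.

```rust
use std::collections::VecDeque;

type ParaId = u32;

/// The kind of assignment backing a buffered claim.
#[derive(Clone, Debug)]
enum AssignmentKind {
    /// A bulk (coretime) assignment for a fixed para.
    Bulk,
    /// An on-demand (pool) assignment; the para id was resolved when the
    /// order was taken from the on-demand pallet and is now pinned here.
    Pool,
}

/// One buffered claim: not just "Pool", but also *which* para it resolved to.
#[derive(Clone, Debug)]
struct QueuedClaim {
    para_id: ParaId,
    kind: AssignmentKind,
}

/// Per-core work state with the claim buffer folded in.
#[derive(Default, Debug)]
struct CoreWorkState {
    /// Claims buffered up to `scheduling_lookahead` entries. Entries stay
    /// put once added, so predictions remain stable across relay parents.
    claims: VecDeque<QueuedClaim>,
}

impl CoreWorkState {
    /// Buffer a newly resolved claim (e.g. when the assigner pulls an
    /// on-demand order or a bulk assignment becomes active).
    fn push_claim(&mut self, claim: QueuedClaim) {
        self.claims.push_back(claim);
    }

    /// The equivalent of `advance_claim_queue`, but living on the assigner:
    /// pop the claim that was just used for a backed candidate.
    fn pop_front_claim(&mut self) -> Option<QueuedClaim> {
        self.claims.pop_front()
    }

    /// What the claim-queue runtime API would read: a stable view of the
    /// next claims for this core, without calling into the on-demand pallet.
    fn peek(&self, lookahead: usize) -> Vec<ParaId> {
        self.claims.iter().take(lookahead).map(|c| c.para_id).collect()
    }
}

fn main() {
    let mut core = CoreWorkState::default();
    core.push_claim(QueuedClaim { para_id: 2000, kind: AssignmentKind::Pool });
    core.push_claim(QueuedClaim { para_id: 2001, kind: AssignmentKind::Bulk });
    assert_eq!(core.peek(2), vec![2000, 2001]);

    // Pop the claim that was just backed, like `advance_claim_queue` would.
    let backed = core.pop_front_claim().expect("claim was buffered");
    assert!(matches!(backed.kind, AssignmentKind::Pool));
    assert_eq!(core.peek(2), vec![2001]);
}
```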
This stability issue can actually also happen with bulk. Assignments can change at any time, so you could arrive at different results. Arguably, getting the fresh information is more correct, but it also means that if you prepared a collation based on the claim queue, that collation will end up being dropped because the claim queue changed.
For bulk this actually seems acceptable ... as those collations had been produced for coretime you did not actually purchase, and it can also be fixed by having a proper `end_hint`.
TL;DR: We might have an issue with bulk too, but for bulk it is definitely easily fixable.
For on-demand it is more tricky, but likely also solvable. This is fixable for both bulk and on-demand by simulating the state of the block the claim queue entry is relevant for (we can look into the future). This should actually not be too hard and would also lead to more correct behavior than what we have with the claim queue as of now.
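A sketch of what "looking into the future" could mean, with made-up types and names (`AssignerState`, `active_at`, `claim_for_depth`): instead of reading a stored queue, the runtime API computes, for lookahead position `depth`, the assignment that would be active at block `now + depth` from the assigner's own schedule.

```rust
type ParaId = u32;
type BlockNumber = u32;

/// Stand-in for whatever state the coretime assigner keeps per core. For the
/// sketch it is just a list of (start_block, para_id) pairs sorted by start.
struct AssignerState {
    assignments: Vec<(BlockNumber, ParaId)>,
}

impl AssignerState {
    /// The para that would occupy the core at `block`, i.e. the latest
    /// assignment whose start block is not in the future of `block`.
    fn active_at(&self, block: BlockNumber) -> Option<ParaId> {
        self.assignments
            .iter()
            .take_while(|(start, _)| *start <= block)
            .last()
            .map(|(_, para)| *para)
    }
}

/// The claim for lookahead position `depth`, computed against the block the
/// claim is actually relevant for rather than snapshotted at the relay parent.
fn claim_for_depth(state: &AssignerState, now: BlockNumber, depth: BlockNumber) -> Option<ParaId> {
    state.active_at(now + depth)
}

fn main() {
    // Para 2000 holds the core from block 10; para 2001 takes over at block 12.
    let state = AssignerState { assignments: vec![(10, 2000), (12, 2001)] };
    assert_eq!(claim_for_depth(&state, 10, 0), Some(2000));
    assert_eq!(claim_for_depth(&state, 10, 2), Some(2001));
}
```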
Keeping the `end_hint` at `None` should actually be fine, as long as we check for incoming assignments that would void the current work state.
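A small sketch of that check, with hypothetical names (`WorkItem`, `resolve`): an open-ended item (`end_hint == None`) keeps running until a new assignment for the core arrives, at which point it is voided regardless of the missing end hint.

```rust
type BlockNumber = u32;
type ParaId = u32;

struct WorkItem {
    para_id: ParaId,
    /// `None`: no known end block; the item runs until explicitly replaced.
    end_hint: Option<BlockNumber>,
}

/// Returns the work item that should be active on the core at `now`, given a
/// possibly open-ended current item and a possibly pending incoming one.
fn resolve(current: Option<WorkItem>, incoming: Option<WorkItem>, now: BlockNumber) -> Option<WorkItem> {
    match (current, incoming) {
        // A fresh assignment always voids the current work state, even if the
        // current item had no end hint.
        (_, Some(new_item)) => Some(new_item),
        // No incoming assignment: keep the current one unless its end hint
        // says it has expired.
        (Some(item), None) => match item.end_hint {
            Some(end) if end <= now => None,
            _ => Some(item),
        },
        (None, None) => None,
    }
}

fn main() {
    let open_ended = WorkItem { para_id: 2000, end_hint: None };

    // The open-ended item stays active while nothing new arrives...
    let kept = resolve(Some(open_ended), None, 50);
    assert_eq!(kept.as_ref().map(|w| w.para_id), Some(2000));

    // ...but an incoming assignment voids it immediately.
    let incoming = WorkItem { para_id: 2001, end_hint: None };
    let replaced = resolve(kept, Some(incoming), 51);
    assert_eq!(replaced.map(|w| w.para_id), Some(2001));
}
```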
If we remove the TTL, the claim queue in the scheduler pallet does not make much sense any more.
The claim queue remains merely an intermediate buffer in front of the coretime assigner pallet, populated with up to `scheduling_lookahead` claims. It also complicates things, as assignments need to be pushed back to the assigner on session changes (if the core count decreases).
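For concreteness, a sketch of that push-back complication with made-up names (`ClaimQueue`, `push_back_to_assigner`, `on_session_change`): when the core count shrinks, the claims buffered for the dropped cores have to be returned to the assigner so they are not lost.

```rust
use std::collections::{BTreeMap, VecDeque};

type CoreIndex = u32;
type ParaId = u32;

/// Claims the scheduler has buffered per core (the claim queue).
type ClaimQueue = BTreeMap<CoreIndex, VecDeque<ParaId>>;

/// Stand-in for the coretime assigner's "take this claim back" entry point.
fn push_back_to_assigner(assigner_backlog: &mut Vec<ParaId>, para: ParaId) {
    assigner_backlog.push(para);
}

/// On session change, drop queues for cores that no longer exist and return
/// their claims to the assigner.
fn on_session_change(
    claim_queue: &mut ClaimQueue,
    new_core_count: CoreIndex,
    assigner_backlog: &mut Vec<ParaId>,
) {
    let removed: Vec<CoreIndex> = claim_queue
        .keys()
        .copied()
        .filter(|core| *core >= new_core_count)
        .collect();
    for core in removed {
        if let Some(claims) = claim_queue.remove(&core) {
            for para in claims {
                push_back_to_assigner(assigner_backlog, para);
            }
        }
    }
}

fn main() {
    let mut claim_queue: ClaimQueue = BTreeMap::new();
    claim_queue.insert(0, VecDeque::from([2000]));
    claim_queue.insert(1, VecDeque::from([2001, 2002]));

    let mut backlog = Vec::new();
    // The core count drops from 2 to 1: core 1's claims go back to the assigner.
    on_session_change(&mut claim_queue, 1, &mut backlog);
    assert_eq!(backlog, vec![2001, 2002]);
    assert!(claim_queue.contains_key(&0) && !claim_queue.contains_key(&1));
}
```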
Backwards compatibility with the claim queue runtime API will be maintained. The logic will be modified to peek up to `scheduling_lookahead` positions into the coretime assigner. This greatly simplifies the scheduler pallet and helps with the separation of concerns.
As @eskimor pointed out, it also has a nice side effect: the latency between ordering an on-demand core and being able to use that core will decrease. This is because the on-demand order extrinsic reaches the on-demand assigner but not the claim queue (the claim queue is filled during inherent data processing, which happens before extrinsics are executed). There was therefore a latency of at least one block between when the order was fulfilled and when the assignment could be used. If we modify the runtime API to look directly into the assigner, the parachain could utilise this claim right at the next block (provided that the core is available).
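A toy model of that ordering argument (not actual runtime code; `BlockState` and `execute_block` are invented for the example): because the inherent runs before extrinsics, an order placed in block N only shows up in the claim-queue snapshot of block N+1, while the assigner already holds it at the end of block N.

```rust
#[derive(Default)]
struct BlockState {
    /// Orders buffered in the on-demand assigner at the end of the block.
    assigner_orders: Vec<u32>,
    /// Snapshot taken by the inherent at the start of the block.
    claim_queue: Vec<u32>,
}

/// Execute one block: the inherent runs first, then extrinsics.
fn execute_block(prev: &BlockState, new_order: Option<u32>) -> BlockState {
    // Inherent phase: the claim queue is filled from what the assigner held
    // before this block's extrinsics run.
    let claim_queue = prev.assigner_orders.clone();

    // Extrinsic phase: an on-demand order placed in this block reaches the
    // assigner only now, after the inherent already ran.
    let mut assigner_orders = prev.assigner_orders.clone();
    if let Some(order) = new_order {
        assigner_orders.push(order);
    }

    BlockState { assigner_orders, claim_queue }
}

fn main() {
    let genesis = BlockState::default();

    // Block N: an on-demand order for para 2000 is placed.
    let block_n = execute_block(&genesis, Some(2000));
    // The claim queue of block N does not see it yet; the assigner does.
    assert!(block_n.claim_queue.is_empty());
    assert_eq!(block_n.assigner_orders, vec![2000]);

    // Only block N+1's claim queue picks it up.
    let block_n1 = execute_block(&block_n, None);
    assert_eq!(block_n1.claim_queue, vec![2000]);
}
```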