paritytech / polkadot-sdk

The Parity Polkadot Blockchain SDK
https://polkadot.network/

On-demand Cumulus Integration #1487

Open eskimor opened 9 months ago

eskimor commented 9 months ago

On-demand functionality is present on the relay chain, which means that on-demand is technically implemented:

  1. Someone places an order.
  2. Parachain sees scheduled core and produces a block.

This works, but leaves something to be desired on (1): who is going to place an order, and when? Very early on this might just be a developer with a laptop trying things out, but at some point people will need this to be automated and decentralized.

Automation

Collators should be able to monitor some condition over time, e.g.:

and then, when that condition becomes true, automatically place an order.

To avoid having hot keys on the collator, once RFC-1 is implemented we will have credits on the relay chain that can only be used for ordering a core. Until then, we have added an additional proxy kind, which restricts the proxy account to placing orders only. We intend to harden this further, so that the proxy can even be restricted to ordering cores for one particular on-demand para.

EDIT: Before proxy hardening, we should likely go for RFC-1 credits directly.

Now with security considerations out of the way, we need to:

  1. Have code to actually monitor block production conditions (as described above).
  2. Have some coordination between collators, so that only one of them places an order at a time.

Block Production Conditions

This is mostly about exposing state that is relevant to the on-demand order placing logic. For actual decentralization, it will also be important to have a condition that can be verified/proven to be true in the para runtime.

Coordination

All collators will see a block production condition become true at approximately the same time, yet only one of them should place an order. To coordinate this, we introduce a mechanism similar to Aura:

  1. Collators will monitor relay chain blocks and their height.
  2. They pick a responsible collator by mapping the relay chain block height at which they saw the condition become true to a collator id, like this:
ordering_collator = collators[(h>>w) % collators.len()]

Where h is the relay chain block height and w is the slot width: w = 0 means an order slot is exactly one relay chain block, w = 1 means a slot is two relay chain blocks, and in general a slot spans 2^w relay chain blocks.
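Here is a minimal Rust sketch of the responsible-collator formula above. It assumes collators are identified by plain strings. `CollatorId` and `ordering_collator` are hypothetical names, not polkadot-sdk APIs.

```rust
/// Hypothetical collator identifier; a real node would use its session key type.
type CollatorId = &'static str;

/// Returns the collator responsible for placing an order at relay chain height `h`,
/// given slot width `w` (each order slot spans 2^w relay chain blocks).
fn ordering_collator(collators: &[CollatorId], h: u32, w: u32) -> Option<CollatorId> {
    if collators.is_empty() {
        return None;
    }
    // `h >> w` is the index of the order slot containing `h`;
    // shifting by 32 or more bits is treated as slot 0 instead of panicking.
    let slot = h.checked_shr(w).unwrap_or(0) as usize;
    Some(collators[slot % collators.len()])
}
```

Take three collators and w = 2. Heights 0 to 3 map to the first collator, 4 to 7 to the second and 8 to 11 to the third. Height 12 wraps around to the first again.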

The slot width has to be picked based on how long we expect it to take for a placed order to actually show up in a block (how long it needs to be gossiped, how long it will linger in the mempool).

The idea is: we have an asynchronous network, and we still need to avoid two collators placing an order. To achieve this, we pick a sensible w that allows orders to get into a block easily most of the time. The order placing collator uses transaction mortality to ensure that the order either ends up in a block that is still within its slot, or is dropped.

The lowest possible transaction mortality is 4. Therefore the collator will always pick the first relay chain block of its slot as the base, with mortality 4, even if the block production condition only became true later. We need to ensure that the order is already invalid when it is the next collator's turn. With a mortality of 4, this requires a slot to span at least 4 relay chain blocks (w >= 2).

If the next collator also sees the block production condition to be true, it will then check the last 4 relay chain blocks for an order. If it finds one, it does nothing; if it does not, it places an order itself.
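The slot anchoring rule can be checked with a little arithmetic. The sketch below is a simplified model, not Substrate's actual era logic. It assumes that an order anchored at relay chain block `base` with mortality `m` can be included in blocks `base` through `base + m - 1`. All names are hypothetical.

```rust
/// Lowest transaction mortality mentioned above.
const MIN_MORTALITY: u32 = 4;

/// First relay chain height of the order slot containing `h`
/// (slots span 2^w blocks; assumes w < 32).
fn slot_start(h: u32, w: u32) -> u32 {
    // Clearing the low `w` bits rounds `h` down to the slot boundary.
    (h >> w) << w
}

/// Simplified mortality model: an order anchored at `base` is includable in
/// relay chain blocks `base..base + mortality`, so this is its last valid height.
/// Assumes mortality >= 1.
fn last_valid_height(base: u32, mortality: u32) -> u32 {
    base + mortality - 1
}

/// True if an order anchored at the start of the slot containing `h` expires
/// before the next collator's slot begins (assumes w < 32).
fn expires_within_slot(h: u32, w: u32, mortality: u32) -> bool {
    let base = slot_start(h, w);
    let next_slot_start = base + (1u32 << w);
    last_valid_height(base, mortality) < next_slot_start
}
```

With w = 1 the check fails. The slot containing height 13 starts at 12 and the next slot starts at 14, but an order anchored at 12 stays valid until 15. With w = 2 the order expires exactly at the slot boundary.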

Summary:

  1. At the beginning of our slot, we record the hash of that relay chain block.
  2. Anywhere within our slot: if we find the block production condition to be true, we check that no order has already been placed in the relay chain blocks of the previous slot (assuming the w and mortality above).
  3. If the conditions in 2 hold, we place an order with mortality 4, using the relay chain block hash recorded in 1 as the base.
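The three summary steps could look roughly like this on the collator side. This is a sketch under simplifying assumptions:

- Relay chain blocks are reduced to a flag saying whether they contain an order for our para.
- The caller passes every block from the start of the previous slot up to the current best block. This covers the whole window in which an earlier order could still have landed.

`RelayBlock`, `OrderTx` and `maybe_place_order` are hypothetical names.

```rust
/// Hypothetical, simplified view of a relay chain block as seen by a collator.
struct RelayBlock {
    /// Whether the block contains an on-demand order for our para.
    has_order: bool,
}

/// Hypothetical parameters of the order transaction we would submit.
#[derive(Debug, PartialEq)]
struct OrderTx {
    /// Hash of the first relay chain block of our slot (recorded in step 1).
    mortality_base: [u8; 32],
    mortality: u32,
}

const MORTALITY: u32 = 4;

/// Steps 2 and 3 of the summary.
/// `since_prev_slot` must contain every relay chain block from the start of the
/// previous slot up to the current best block (an assumption of this sketch).
fn maybe_place_order(
    since_prev_slot: &[RelayBlock],
    slot_start_hash: [u8; 32],
    our_turn: bool,
    condition_true: bool,
) -> Option<OrderTx> {
    // Only the collator responsible for the current slot acts, and only if the
    // block production condition holds.
    if !our_turn || !condition_true {
        return None;
    }
    // Step 2: someone (possibly we ourselves) already ordered, so do nothing.
    if since_prev_slot.iter().any(|b| b.has_order) {
        return None;
    }
    // Step 3: anchor to the start of our slot so the order cannot outlive it.
    Some(OrderTx { mortality_base: slot_start_hash, mortality: MORTALITY })
}
```

The sketch also checks our own slot's blocks so far. That stops a collator from placing a second order when the condition stays true across several blocks of its slot.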

With this, it should not be possible for collators on a single relay chain fork to place redundant orders by accident.

The only caveat is that we limit our block rate to at most 1 block every 4 relay chain blocks, because we don't check for already produced blocks. If the block production condition is true but we have seen an order within the last 4 relay chain blocks, that order might already have been served, and the condition is simply still true or already true again. This should be easy to fix by adjusting the above algorithm to also check whether an order has already been served. I would consider this low priority, as at least in the beginning, on-demand is meant for chains that don't have high throughput requirements. Later, when we might want to use on-demand for boost, this might become more important. Therefore do it if it is easy enough, but certainly don't block a release on it. Also, for boost/elastic scaling the parachain is already producing blocks on a bulk core, which means it can place on-demand orders for elastic scaling by itself, and none of this is needed.

Decentralization

With the above we have achieved automation. Collators now know who should place an order and when. But placing an order incurs costs. Assuming we have decentralization, why would collators bother placing orders? (They are not intrinsically motivated.) The answer is: they would not. Collators need orders to be placed in order to produce blocks, which earns them rewards. But they would also earn those rewards if someone else placed the order and they just produced a block when it is, for example, their Aura slot. Initial ideas for solving this dilemma aimed at ensuring that whoever places an order will also always be the block producer. That creates an incentive to place the order: you will be the block producer and will gain rewards/refunds from the parachain logic. Those initial designs can be found here. The problem with them is that they couple consensus very tightly to the on-demand feature. That is a pity by itself, because we wanted to make on-demand as transparent as possible. It would be especially annoying when switching from an on-demand parachain to a reserved one and back: you would need to swap out consensus!

Fortunately, we can think of order placement as (almost) completely separate from actual block production. We only need to find a way to incentivize order placement. We need someone to do it; it does not matter who. All we need is to make sure collators produce blocks that will refund and reward the nodes placing the order.

So how do we do that? We can prove that an order was placed in a block, but we cannot prove that it did not happen. Thus block producers can choose to omit any such proof. A related problem: how do we check that an order placing machine did not simply ignore a previous order, which would make it illegal for it to place an order as well (see section Automation above)? What if nobody provides the proof of that earlier order?

To mitigate proof omission, we don't pay out refunds and rewards immediately. Instead, each payout waits until at least f+1 collators have produced a block since the payout in question, and thus had a chance to include the proof. Here f is the maximum number of collators assumed to be dishonest. We call this the challenging period.
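Here is a sketch of the challenging period check. It assumes collators are identified by `u32` indices and that `f` is the assumed maximum number of dishonest collators. `PendingPayout` is a hypothetical type, not part of any existing pallet.

```rust
use std::collections::BTreeSet;

/// Hypothetical payout awaiting the end of its challenging period.
struct PendingPayout {
    beneficiary: u32,
    /// Distinct collators that authored a block since the payout was recorded.
    authors_since: BTreeSet<u32>,
    /// Set once a valid proof against the underlying order has been included.
    challenged: bool,
}

impl PendingPayout {
    fn new(beneficiary: u32) -> Self {
        Self { beneficiary, authors_since: BTreeSet::new(), challenged: false }
    }

    /// Record that `author` produced a block, i.e. had a chance to include a proof.
    fn note_block(&mut self, author: u32) {
        self.authors_since.insert(author);
    }

    /// Mark the payout as successfully challenged; it will never be paid.
    fn challenge(&mut self) {
        self.challenged = true;
    }

    /// With at most `f` dishonest collators, f + 1 distinct authors guarantee that
    /// at least one honest collator could have included a challenge.
    fn can_pay_out(&self, f: usize) -> bool {
        !self.challenged && self.authors_since.len() > f
    }
}
```

The sketch counts distinct authors, not blocks, so one collator authoring many blocks in a row does not shorten the period.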

We need two kinds of proofs. Both should be accepted in the block produced for the order, and also in any later block within the challenging period:

  1. The actual slot ordering proof, consisting of two parts: a) a proof that the order was included in a relay chain block that is an ancestor of our current relay parent (or the relay parent itself); b) a state proof for that same block about the current spot price (so the parachain knows how much to refund).
  2. A proof that we were the one eligible to order a core, but someone else also placed an invalid order (too close to ours). Such a proof would prove our order in a block corresponding to our slot. It would also prove that a descendant of that block contained an order with the same sequence number, i.e. an order that ignored ours.

By adding sequence numbers to orders, we can make (2) a self-contained proof that does not require the parachain to keep a lot of state. The parachain advances the expected number whenever an order proof is processed.

Point (2) needs more explanation. We do want the next collator/order placing node to be able to cover for a previous node that failed to place an order. However, it should only be refunded if the previous node indeed failed to place one.
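Once the Merkle and header chain parts have been verified, the logical core of proof (2) reduces to a few comparisons. This sketch assumes collators are identified by their index in the collator set and that eligibility uses the `(h >> w) % collators.len()` rule from the Automation section. `ProvenOrder`, `eligible` and `is_valid_fraud_proof` are hypothetical.

```rust
/// Hypothetical order record, extracted from an already verified inclusion proof.
#[derive(Clone, Copy)]
struct ProvenOrder {
    /// Index of the placing collator in the collator set.
    placer: u32,
    /// Height of the relay chain block that included the order.
    relay_height: u32,
    /// Sequence number the order was placed with.
    seq: u64,
}

/// Whether `placer` is responsible for the slot containing `relay_height`.
/// Mortality keeps an order inside its slot, so the inclusion height identifies the slot.
fn eligible(placer: u32, relay_height: u32, n_collators: u32, w: u32) -> bool {
    n_collators > 0 && placer == relay_height.checked_shr(w).unwrap_or(0) % n_collators
}

/// Logical part of a fraud proof (point 2). Merkle and header chain checks are
/// assumed to have passed already, including that `offending` was included in a
/// descendant of the block that included `ours`.
fn is_valid_fraud_proof(ours: ProvenOrder, offending: ProvenOrder, n_collators: u32, w: u32) -> bool {
    // We must have been eligible in our slot...
    eligible(ours.placer, ours.relay_height, n_collators, w)
        // ...the offending order must come later...
        && offending.relay_height > ours.relay_height
        // ...and reuse our sequence number, i.e. it ignored our order.
        && offending.seq == ours.seq
        && offending.placer != ours.placer
}
```

A later order with the next sequence number is not fraud: it acknowledged our order, so the check rejects it.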

Sequence numbers also help keep block producers honest. If they don't include the proof, their own next order will not be accepted, unless they pick the same sequence number. In that case they risk being challenged and losing their reward/refund.

Orders will only be refunded if a block was produced for that order (ensured via sequence numbers). This is necessary because otherwise collators could just order cores without the block production rule being met. The block producer is incentivized to produce a block. On top of that, we have small order placement rewards. These should cover the risk of the block producer not producing a block, which would make us lose out on our refund.

WIP Implementation steps:

Node:

  1. Query runtime for current next sequence number.
  2. Add sequence numbers to order.

Runtime:

  1. Implement proof checking in a parachain pallet, bumping sequence numbers on each accepted proof.
    • Reject orders with a sequence number not matching the currently expected one.
    • Reject orders if placing collator was not eligible in the slot.
    • Reject block if block production rule was not met.
  2. Keep track of accepted orders (beneficiary, order sequence number), ordered by block number, so that they can be challenged.
  3. In each block, clean up accepted orders whose challenge period has passed and pay out their rewards.
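The runtime steps above could be modelled with plain Rust data structures before turning them into a FRAME pallet. The sketch makes these simplifications:

- Eligibility is passed in as a precomputed flag.
- The challenge period is measured in parachain blocks.
- Payouts are returned as a list of beneficiaries.
- The "reject block if the block production rule was not met" check is left out.

`OrderLedger` and its methods are hypothetical.

```rust
use std::collections::VecDeque;

/// Reasons to reject an order proof (hypothetical error type).
#[derive(Debug, PartialEq)]
enum Reject {
    WrongSequence,
    NotEligible,
}

/// An accepted order that can still be challenged.
struct AcceptedOrder {
    beneficiary: u32,
    seq: u64,
    /// Parachain block number at which the proof was accepted.
    accepted_at: u32,
}

/// Hypothetical in-memory stand-in for the pallet storage described above.
struct OrderLedger {
    next_seq: u64,
    /// Challenge period, measured in parachain blocks.
    challenge_period: u32,
    /// Accepted orders, oldest first (ordered by block number).
    accepted: VecDeque<AcceptedOrder>,
}

impl OrderLedger {
    fn new(challenge_period: u32) -> Self {
        Self { next_seq: 0, challenge_period, accepted: VecDeque::new() }
    }

    /// Step 1: accept an order proof, bumping the expected sequence number.
    fn accept(&mut self, beneficiary: u32, seq: u64, eligible: bool, now: u32) -> Result<(), Reject> {
        if seq != self.next_seq {
            return Err(Reject::WrongSequence);
        }
        if !eligible {
            return Err(Reject::NotEligible);
        }
        self.next_seq += 1;
        // Step 2: remember the order so it can be challenged.
        self.accepted.push_back(AcceptedOrder { beneficiary, seq, accepted_at: now });
        Ok(())
    }

    /// A successful challenge removes the order, so it is never paid out.
    fn challenge(&mut self, seq: u64) -> bool {
        match self.accepted.iter().position(|o| o.seq == seq) {
            Some(pos) => {
                self.accepted.remove(pos);
                true
            }
            None => false,
        }
    }

    /// Step 3: called in each block; drops orders whose challenge period has
    /// passed and returns the beneficiaries to pay out.
    fn on_block(&mut self, now: u32) -> Vec<u32> {
        let mut payouts = Vec::new();
        while self
            .accepted
            .front()
            .map_or(false, |o| now.saturating_sub(o.accepted_at) >= self.challenge_period)
        {
            if let Some(order) = self.accepted.pop_front() {
                payouts.push(order.beneficiary);
            }
        }
        payouts
    }
}
```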

Runtime API

A call for collators telling them whether an order should be placed / will be refunded, returning an Option.
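Here is one possible shape for that call. It is sketched as a plain Rust trait, not an actual `sp_api` runtime API declaration. `OnDemandOrderApi`, `OrderHint`, `ToyRuntime` and `build_order` are hypothetical. The returned hint carries the sequence number the node should attach to its order (node steps 1 and 2 above).

```rust
/// Hypothetical hint returned by the runtime API to a collator.
#[derive(Debug, PartialEq, Clone, Copy)]
struct OrderHint {
    /// Sequence number the order must carry to be refunded.
    seq: u64,
}

/// Hypothetical runtime API: `Some(hint)` if `collator` should place an order at
/// `relay_height` (and would be refunded for it), `None` otherwise.
trait OnDemandOrderApi {
    fn order_hint(&self, collator: u32, relay_height: u32) -> Option<OrderHint>;
}

/// Toy runtime state, for illustration only.
struct ToyRuntime {
    collators: u32,
    slot_width: u32,
    next_seq: u64,
    condition_met: bool,
}

impl OnDemandOrderApi for ToyRuntime {
    fn order_hint(&self, collator: u32, relay_height: u32) -> Option<OrderHint> {
        if self.collators == 0 || !self.condition_met {
            return None;
        }
        let slot = relay_height.checked_shr(self.slot_width).unwrap_or(0);
        // Only the collator responsible for the current slot gets a hint.
        (slot % self.collators == collator).then(|| OrderHint { seq: self.next_seq })
    }
}

/// Node side: query the next sequence number and attach it to the order.
fn build_order(api: &impl OnDemandOrderApi, me: u32, relay_height: u32) -> Option<(u32, u64)> {
    api.order_hint(me, relay_height).map(|hint| (me, hint.seq))
}
```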

Proofs:

Order proof:

A Merkle proof that the order is in a block, plus a proof that the including block is an ancestor of some recent relay parent (the full header chain).

Fraud proof:

This consists of a Merkle proof of our order in a block and a Merkle proof of the offending order in a block. It also includes the full header chain between those two blocks and up to some recent relay parent.
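The header chain part of both proofs amounts to an ancestry check. This sketch uses a simplified header type that trusts the `hash` field. A real implementation would re-hash each encoded header and verify the Merkle proofs as well. `Header`, `is_contiguous` and `fraud_proof_ancestry_ok` are hypothetical.

```rust
/// Hypothetical, simplified relay chain header: only the fields the check needs.
#[derive(Clone, Copy)]
struct Header {
    number: u32,
    hash: [u8; 32],
    parent_hash: [u8; 32],
}

/// True if `chain` (oldest first) links up: each header's parent is the previous one.
/// Trusts the `hash` fields; a real implementation must recompute them.
fn is_contiguous(chain: &[Header]) -> bool {
    chain.windows(2).all(|pair| {
        pair[1].parent_hash == pair[0].hash && pair[0].number.checked_add(1) == Some(pair[1].number)
    })
}

/// Ancestry part of a fraud proof: the block with our order must be an ancestor of
/// the block with the offending order, and the chain must end at a recent relay parent.
fn fraud_proof_ancestry_ok(
    chain: &[Header],
    ours: [u8; 32],
    offending: [u8; 32],
    recent_relay_parents: &[[u8; 32]],
) -> bool {
    if !is_contiguous(chain) {
        return false;
    }
    let position = |hash: [u8; 32]| chain.iter().position(|h| h.hash == hash);
    match (position(ours), position(offending), chain.last()) {
        (Some(a), Some(b), Some(tip)) => a < b && recent_relay_parents.contains(&tip.hash),
        _ => false,
    }
}
```

The order proof needs the same check with a single block instead of two: the including block must appear in a contiguous chain that ends at a recent relay parent.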

Size considerations: we can keep in state not only the current relay parent, but also the relay parents of the n most recent blocks. That way the required header chains should stay relatively small. More sophisticated data structures are likely not needed, but are worth considering.
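A bounded store for the relay parents of the last n parachain blocks could be as simple as a ring buffer. In this sketch `RecentRelayParents` is a hypothetical name and hashes are raw 32-byte arrays.

```rust
use std::collections::VecDeque;

/// Hypothetical bounded store of the relay parents of the last `n` parachain blocks.
struct RecentRelayParents {
    n: usize,
    parents: VecDeque<[u8; 32]>,
}

impl RecentRelayParents {
    fn new(n: usize) -> Self {
        Self { n, parents: VecDeque::with_capacity(n) }
    }

    /// Called once per parachain block with that block's relay parent;
    /// evicts the oldest entry once `n` entries are stored.
    fn note(&mut self, relay_parent: [u8; 32]) {
        if self.n == 0 {
            return;
        }
        if self.parents.len() == self.n {
            self.parents.pop_front();
        }
        self.parents.push_back(relay_parent);
    }

    /// A proof's header chain may end at any stored relay parent.
    fn is_recent(&self, hash: &[u8; 32]) -> bool {
        self.parents.contains(hash)
    }
}
```

A larger n lets proofs end at older relay parents, so header chains get shorter, at the cost of n hashes of storage.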

bkchr commented 9 months ago

In order to avoid having hot keys on the collator, we will with RFC-1 implemented have credits on the relay chain which can only be used for ordering a core. Until then we added an additional proxy kind, which restricts the proxy account to only placing orders. We intend to harden this further, so restricting the proxy to even just ordering cores for a particular on-demand para will be possible.

We should go directly for core time. The pallet is ready and the parachain also almost. So, only the relay chain side is missing.

All we need is to force collators to produce blocks that will refund and reward the nodes placing the order.

If you need to force collators, the protocol is broken. Collators should be incentivized by something to build these blocks, otherwise they will just do nothing.

So how do we do that? We can prove that an order was placed in a block, but we cannot prove that it did not happen. Thus block producers can choose to omit any such proof.

Can we not just record, alongside the latest head of the para in the relay chain, who paid for this core time? This could be a vector, for when multiple core time purchases were needed to make one state transition happen.

eskimor commented 9 months ago

We should go directly for core time. The pallet is ready and the parachain also almost. So, only the relay chain side is missing.

Noted.

If you need to force collators, the protocol is broken. Collators should be incentivized by something to build these blocks, otherwise they will just do nothing.

Poor wording. It needs to be ensured (by incentivization of some kind, or by alignment of interests).

Can we not just record, alongside the latest head of the para in the relay chain, who paid for this core time? This could be a vector, for when multiple core time purchases were needed to make one state transition happen.

Getting there. This ticket is WIP; I should have noted that in the title :grimacing:. Also a good angle, though: we aimed at simply making it possible to deliver a proof at a later point in time. Then a few censoring collators don't matter, as long as there is at least one who is honest, your friend, or yourself. I like your approach though; it might make things more straightforward.

eskimor commented 9 months ago
Can we not just record, alongside the latest head of the para in the relay chain, who paid for this core time? This could be a vector, for when multiple core time purchases were needed to make one state transition happen.

Does not help. While this would prove who ordered the core that was just scheduled, it does not tell you whether the order was legitimate (whether it was your turn). To determine this, you still need to check which block included an order.

eskimor commented 9 months ago

Refunding logic (node + runtime) might even live on a different parachain. A generalized reimbursement chain.

Polkadot-Forum commented 8 months ago

This issue has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/parachain-consensus-updates-coretime-asynchronous-backing-scalability/4396/1

Polkadot-Forum commented 8 months ago

This issue has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/parachain-consensus-updates-coretime-asynchronous-backing-scalability/4396/2

Daanvdplas commented 7 months ago

I have a few questions:

  1. The lowest transaction mortality possible is 4. Therefore the collator will always pick the first relay chain block of its slot as the base, with mortality 4 - even if the block production condition became only true later. We need to ensure that the order is already invalid when it's the next collator's turn.

Do you mean here that a collator should only place an order if all the conditions are met on the first relay chain block of its slot? That would ensure that the order won't end up in a block in the next slot.

  2. We can prove that an order was placed in a block, but we cannot prove that it did not happen. Thus block producers can choose to omit any such proof. A related problem: How do we check that an order placing machine did not just ignore a previous order which would make it illegal for it to place an order as well (see section Automation above): What if nobody provides the proof of that earlier order?

If someone didn't place an order while they should, the next will place an order. Is this not an acceptable solution?

  3. In the situation where someone wants to challenge an order but won't be selected by Aura to author a block in time, how does this work?

  4. If a block is produced on an invalid order that is challenged (e.g. an order is made although an order was already made in the previous slot), does the block producer need to be paid?

Polkadot-Forum commented 7 months ago

This issue has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/r0gue-reimagining-polkadot-development-with-pop-network/5119/1

eskimor commented 7 months ago

Do you mean here that a collator should only place an order if all the conditions are met on the first relay chain block of its slot? That would ensure that the order won't end up in a block in the next slot.

The condition can become true later, but we would still anchor the transaction to the first block of our "slot", so that its lifetime does not extend into the next "slot". Alternatively, the collator could simply not place an order, but then it might miss an opportunity.

In the situation where someone wants to challenge an order but won't be selected by Aura to author a block in time, how does this work?

The challenge period has to be long enough that we can assume at least one "honest" block producer had a chance to produce a block. The challenge period can be based on blocks, so it would not matter how seldom the chain produces blocks. To enhance this even further, a block production rule could be: "I am committing a proof that someone before me cheated."

If a block is produced on an invalid order which is challenged (e.g. order is made while in previous slot an order is made), does the block producer need to be paid?

So in fact two block production opportunities have been ordered... usually only one block can be produced (only one will meet the block production rule). So yes, any valid block that is produced should be rewarded... but there will only be one.

But I am not sure that is what you are really interested in. Isn't it more about the case where the block producer omitted a proof of the earlier order? So basically we have two subsequent orders, and the second is invalid. The block producer only includes the second... it sounds reasonable to scratch the block rewards for this producer. Had it behaved correctly, it would have put in the proof of the earlier order.

So: two orders, only one block is produced, and that block included the wrong/invalid order proof. This is a violation, and there should (at the very least) be no reward.

If someone didn't place an order while they should, the next will place an order. Is this not an acceptable solution?

Absolutely. That is the whole idea of these order slots: we coordinate responsibility, but also make sure that if one collator fails to place an order, someone else will cover.

Polkadot-Forum commented 5 months ago

This issue has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/initial-coretime-pricing/5187/13

eskimor commented 4 months ago

@Daanvdplas any updates?

evilrobot-01 commented 4 months ago

@Daanvdplas any updates?

Daan is currently OOO but due back next week so I will follow up with him then.

eskimor commented 3 months ago

Magnet variant implemented: https://github.com/Magport/Magnet/pull/19/files

Polkadot-Forum commented 2 months ago

This issue has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/on-demand-around-the-corner-what-do-you-want-need/7382/1