paritytech / cumulus

Write Parachains on Substrate
GNU General Public License v3.0
620 stars 380 forks source link

Bidder logic framework for Cumulus #2154

Open rphmeier opened 1 year ago

rphmeier commented 1 year ago

We should provide a pluggable framework in Cumulus for collators to determine when to bid on parathread blockspace. Parathread node authors should be able to use this framework to choose on the basis of:

  1. Relay chain blocks since last parachain block
  2. Block reward when authoring a parachain block (some runtimes may exponentially increase the block reward exponentially based on (1) )
  3. Transaction fees available in the parachain
  4. Other user-defined logic

Design

Requirements

We would want a block production mechanism with the following features:

  1. The one paying for a core should also be able to produce a block for the core (and get rewards).
  2. We need some coordination on who is next, otherwise we would end up in situations where it is always the same collator placing orders and gaining rewards or worse we get multiple orders in at once, racing where only one can win. The others will be served as well, but blocks will be empty.
  3. We need liveness. Therefore if the one who is "next" does not order a core, someone else should be allowed to fill in after a while.

(2) and (3) are elegantly solved with Aura, but that does not work for on-demand as we don't know the slot in advance a block will be produced in, hence we can not preserve (1). (3) is especially tricky, because we don't have regular block production intervals to begin with ... so how long should be a "while" and where does this "while" even start. We would need to have some proof when block production became viable/allowed (depending on strategy, e.g. transaction pool was filled enough), then we can count that "while" from that moment on. No idea how this would work though.

The most promising resolution seems to be to find a way to relax (1) and go with an Aura variant. But even with this, we would still need coordination on who is going to order the core when it is time and we would need others to cover if it does not happen in a timely manner.

Chosen Solution (alternatives below)

A super simple alternative by @burdges : on-demand Aura

We specify that the eligible block producer changes every N relay chain blocks, independently of any block production condition. If N is big enough, a collator can place an order at the beginning of its "slot" and there will be a very high probability that the order is served within those N blocks.

This solution assumes either very high N or not too much pressure on the system (orders can not be delayed too long). If orders are starting to be missed we have a few screws to adjust:

  1. Increase N (the parachain needs to do that)
  2. Increase the number of on-demand cores to handle the increased demand better
  3. Adjust the price function to make on-demand cores expensive earlier

Of the proposed solution, this seems to be a clear winner:

  1. Shines in simplicity
  2. No additional overhead

Covering for a missing collator might take longer in this variant (depending on N), but that should be a non-issue for on-demand.

What we would also like is for the parachain to simply work even if it claims a reserved core at some point. Initial idea by @rphmeier was to have the on-demand consensus run in parallel with classic aura, but unfortunately I don't think this can work. As then we would have the on-demand slots and regular slots overlapping, with the result that we will have multiple eligible (two) block producers most of the time, which enables the possibility of block stealing again either by accident or on purpose.

Instead I would be proposing the following: We only keep the on-demand Aura, as it would work in both cases. The only downside is that liveness is a little worse (a covering collator would only pick up the work after N blocks, instead of in the very next). This seems acceptable though and if the parachain transitions to reserved for a longer period, we should be able to reduce N to 1 via parachain governance for example, arriving at pretty much classic Aura.

Aura slots

Aura gets notified on block import notifications https://github.com/paritytech/cumulus/blob/2df9b9a049b23520a9bbf8fa1894cd629b8d419c/client/consensus/aura/src/collators/lookahead.rs#L148

which are actually already filtered based on whether there is an assignment for our para. Those notifications are triggered indirectly from the collation generation here. So far so good. This means, despite its look, Aura seems to only do something when we actually have a core (not on every relay chain block).

Now the not so good part. For on-demand to work reliably a collator must be able to order a core and know that it is going to be its turn. Otherwise even if refunding was fixed to go back through that collator, we would need to rely on another collator to actually produce the block in order for us to get the refund. Additional block rewards would still go to that other collator, so it does have an incentive to produce, but for the current collator this is still a sad story:

  1. Full risk of not getting refunded.
  2. No additional rewards.

In a nutshell: Only risk, no profit. It could still work in practice, because if nobody orders cores, nobody gets anything, but it would likely lead to inefficiencies.

So what is the problem? Aura is based on slots, which is time based. In particular it takes relay chain slot numbers to determine the current block producer in a simple round robin fashion. Given that we can not predict when exactly an on-demand order will be served, we cannot know for sure who is going to be the block producer for that block.

Finding potential para parents

It looks like we try to find a potential para parent by looking at relay chain ancestry, which is limited in depth. This cannot work with on-demand as the para parent might be way older. But it can also hardly work for any current parachain, as this would mean that any para stalling for more than 10 blocks would die for good. This does not seem to be happening, therefore I must be missing something.

Implementation Steps

*) Actually just some bounded blob of data. It can be a collator id, but the relay chain does not care what it really is.

Obsoleted solutions for the original requirements

Possible Solution

  1. The next block producer is predetermined with the previous block.
  2. Collators wait for the block production condition (e.g. enough transaction fees available) to trigger.
  3. The predetermined collator will place an order - in that case all good and we are done, otherwise we carry on with 4:
  4. All collators monitor the relay chain for an order placed event. If they don't see it within X blocks, then they sign and gossip a statement saying that they are missing an order that is due.
  5. Once a f+1 collators agreed that an order is missing, the next collator becomes eligible in placing an order.
  6. When the block is produced, the statements get included in the block, making the backup block producer eligible. This is required to make the block valid.
  7. In case the backup collator fails to place an order as well after another X blocks, the game repeats.

Considerations: It could happen that the collator re-appears after X blocks and missed the statements. In that case it will now try to place an order as well, racing with the backup block producer. This is undesirable for both, but should be acceptable with X being large enough.

In case an order was placed, but no block was being produced, we will clear the "I have seen the order" status once the order goes out of scope and immediately issue "have not seen order" statements (if X has passed already).

This should work, but also seems to be quite involved. In particular with the f+1 threshold above we should have pretty good liveness. As this would even work with only two collators in the set: f+1 would be 1 (integer arithmetic), hence the signature of the backup collator would suffice to make it eligible. f + 1 should ensure that there was at least one honest collator voting. In case of wrongly voting, all that happens is that we have racing orders, which seems better than risking liveness.

Alternative 1

More observations:

  1. Let's assume that we can prove that a block production rule became true. This seems possible, It is only hard to prove when it happened.
  2. We can assume orders to be placed (not served) very quickly. It is rather likely that two orders sent at approximately the same time will end up in the same relay chain block.
  3. We should be able to bundle up the proof that an order was placed with all the other orders placed for a particular ParaId in a given relay chain block. Hence if one wants to prove that an order x was placed in that block, it can not omit the proof that an order y was placed as well.

With this we can require block authors to include a proof in the block that:

  1. The block production rule was met (e.g. enough transaction fees, enough relay chain blocks passed) ... should be pretty straight forward/given implicitly.
  2. Of the order placed event on the relay chain ... which by above would include any orders placed in that block on the relay chain for that para.

Then we would make it so that a block is invalid if (1) the block production condition was not true and (2) there exists an order from a more eligible collator in the proof.

Caveats:

Most important: An eligible collator could trick other collators into placing orders that they can not consume: They would wait for the backup collator to kick in and then immediately place their order to render that order invalid. As we can not prove when the block production condition became true, there is no straight forward way to prevent such sneaky behavior. Buut we can have proper incentives.

Observation: There is no benefit to the eligible collator doing the above, it would only harm another collator. If such a behavior would lead to personal loss this could be good enough.

Idea: If there are multiple orders, the reimbursement for the order is split among all the orders in that relay chain block. Hence everybody will experience a net loss, if they don't adhere to the off-chain protocol. (Give the eligible block author a reasonable amount of time to place an order.) With enough wait time we should be able to account for any reasonable asynchronicity in the block production rule becoming true on different nodes. With on-demand this should really be a non-issue as wait times can be configured pretty long as we can assume parachains are producing blocks quite infrequent (otherwise they would not be on-demand). A few relay chain blocks of wait time, before the next collator covers should work just fine. Buuut, this would still mean that whales, who don't mind the loss, would be able to drive small fishes out of business. Still not good.

Benefits:

This solution should be a bit simpler to implement and would also involve less overhead.

eskimor commented 1 year ago

Size of transaction pool

bkchr commented 1 year ago

Size of transaction pool

Don't you get this indirectly by "transaction fees available"?

In general for this issue, not sure we need to create some kind of framework. I mean we should probably provide some implementations using different kind of strategies that can be used as examples.

rphmeier commented 1 year ago

Yeah, I mean the "framework" only loosely. The information just needs to be available, as well as some easy hook to trigger orders.

Polkadot-Forum commented 1 year ago

This issue has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/on-demand-parachains/2208/1

burdges commented 1 year ago

I'd forget 3 completely since someone else could pay too. I'd propose this scheme:

At the relay chain side, we've a purchase_blocks call which spends some amount of DOTs on parathread blocks by some minimum larger than the current parathread block price, and provides an opaque value buyer. All it does is grow some buyers: Vec<(dots,buyer)> in the parathread's state on the relay chain. An accept_buyers intrinsic/message from the parachian side can empty buyers and increase some bought: i32 value by the number of blocks this number of dots covers. A parachain can make a block so long as either buyers.len() + bought > 0.

Parathreads have completely self defined runtime logic for determining the next block, including tricks like sassafras or whisk. We want parachains to expose some allowed_connection function for validating the next block producer though, which returns some Resutl<Allowed,NotAllowed>.

A good backing statement subtracts 1 from bought, but the parablock itself could execute accept_buyers so that even if bought=0 then we expect bought >= 0 after inclusion. If this is not true then the parablock is invalid.

If otoh we have bought < 0 after inclusion then instead a backer could (optionally) make a bad backing statement, which executes the block but does not update the state roots. A bad backing statement might just lock the parachain, or deduct form a random element in buyers or simply call buyers.clear(). A bad backing statement is invalid if bought >= 0 after execution, slashing the backer.

A parachain is responsible for ensuring that allowed_connection never permits backers to make bad backing statements. We should discuss when we want allowed_connection to be run and if it could realistically enforce this condition. If allowed_connection can work sanely then this scheme provides rather extreme flexibility.

eskimor commented 1 year ago

I am not sure I get how this maps to the listed requirements. For forgetting (3): Yes someone else could pay too, but you still need coordination: Who else should pay? When do you decide that you should pay? We need some coordination here, some coordination that takes into account that collators can be offline.

eskimor commented 1 year ago

Uh I think I am starting to getting it: You are thinking probabilistic, that orders should stay valid for long periods of time and can be consumed whenever. This is not how on-demand works though. An order placed will be served as soon as possible and then you have to produce a block, otherwise you just wasted a core and money.

burdges commented 1 year ago

Yes, I'm thinking orders should stay valid for a long period or time. It's clear we need restrictions upon opening connections of course, but unclear why this means purchases expire quickly.

As a clarification..

A parachain's own code should only send the accept_buyers message when it also translates those opaque buyer fields into its own restrictions upon block production within its own state.

A parachain's own block production rules plus accept_connection should then obey these restriction.

A prachain could enforce that only the real buyer can make the block, or tax them to open some blocks for everyone, or do probabalistic thing, or whatever.

I'm now dubious this bad backing statement makes sense, maybe allow_connection should just always produce a valid payment, but the backer gets more money if the block becomes available, finalized, etc, via the rewards system which we've never quite finished.

burdges commented 1 year ago

Anyways, we've always wanted to permit parachains to design their own block production strategies. We do imho need an interface that excludes some really stupid ones, for which no allow_connections function makes sense, like proof-of-work. We'll save ourselves lots of trouble though if we've one interface broad enough to permit aura, sassafras, whisk, and some others, in conjunction with flexible economics rules.

I think parathreads should be able to impose a tax upon parablock buys, but the relay chain should know nothing about how their tax works. This is one reason for the opaque buyer field being processed on the parachain side.

At the same time, we need parathreads to make blocks even if they have exhausted their balance, which is why the accept_buyers can come inside the block that spends the buyer here, aka we permit bought=0 both before and after the block.

burdges commented 1 year ago

As an alternative..

We could decide that on-deman parachains should've one or two good block production and economic models, but they need not be anywhere near as flexible as full parachains. If you want your own specialized block production and economic models, then just rent a full parachain slot. It only costs like $25k per year, so we'd be over engineering if we try to make the on-demand option super flexible.

Is this closer to how others view the problem? I do not mind this approach, but we should decide that parathreads should be a less flexible product in some respects.

burdges commented 1 year ago

Alright @eskimor and I discussed this, all this pre-scheduling stuff exists to avoid doing a separate protocol for parathreads. We do more complex code elsewhere but avoid subtle refactoring.