Open kianenigma opened 2 months ago
Acknowledging the issue.
Not sure how much availability I have, but I can def mentor someone. Depends on urgency. If not urgent, I could probably get small pieces of this story done over the weeks.
This falls into the more long term requirements of Asset Hub, not being needed until the very final days. In that sense, I was going to suggest you start working on it after your current project is done and roughly by end of DevCon?
Relies on #5400. @shawntabrizi would you like to work on this after your current work? It seems to fit your aptitude very well.
This is using a binary Merkle tree, while the chain uses a base-16 Patricia Merkle trie. They are not compatible. We already have other code in historical session that checks proofs.
Generally, with the development of JAM, we will not have the luxury of an extra governance sitting on the relay chain. So when all chains stop in JAM, we don't have governance either. So it is a little bit questionable whether we need this pallet at all. Or do you just want it for the period where governance switches over to AH and we are afraid of it not working properly on AH?
Afaik XCMP needs real state proofs into other parachains' state, so one could abstract that somewhat.
We'll avoid starving "true system parachains" ala https://github.com/paritytech/polkadot-sdk/issues/4632#issuecomment-2209188695. We've not concretely defined that term yet, maybe audited like polkadot itself and no flexible execution aka no smart contracts. Also maybe no advanced collator communication, which maybe forbids elastic scaling. We've discussed reverting code upgrades automagically too, but afaik nothing currently in progress, and maybe imposes design restrictions.
We've more ways individual parachains can brick of course. Also JAM should bring much new brickage, but "true system parachain" could forbid non-trivial accumulation, which again maybe forbids elastic scaling.
Anyways, if collectives were kept relatively simple, then maybe collectives alone could provide this? Or maybe some simpler multi-sig derived from collectives? AH doing governance directly may be a design mistake too, because doing so adds tension between different concerns.
@bkchr I actually switched to a compact base 16 trie because the binary tree libraries were unusable in the runtime currently
> Or do you just want it for the period where governance switches over to AH and we are afraid of it not working properly on AH?
Exactly for this period.
> @bkchr I actually switched to a compact base 16 trie because the binary tree libraries were unusable in the runtime currently
I assume with this comment, there is no blocker to implement this, right?
It would be great to get a prototype of a pallet that tightly couples with the parachain pallets (e.g. can only work in RC), and can request to read the state of a parachain based on its latest state root: as in, have an extrinsic where anyone can provide a state proof of a parachain, and it would verify it based on the last known state root of the given para.
/// Provide the state `proof` for `id` at `block`, or the latest block if not provided
fn poc_read_para_state(id: ParaId, proof: Vec<Vec<u8>>, block: Option<BlockNumber>)
@bkchr do you know if this exists anywhere?
If this can be built, I will have no doubts that the rest of this issue can also be done.
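To make the shape of that extrinsic concrete, here is a minimal sketch of the flow: look up the last known state root for the para, then check the supplied proof against it. Everything beyond the `poc_read_para_state` name is an assumption for illustration. In particular, the proof here is a toy binary Merkle path over a 64-bit std hash, used only so the example runs without dependencies; an actual pallet would have to verify base-16 trie proofs with the para's real hasher, and `KnownRoots`, `Proof`, `Step` and `ReadError` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

type ParaId = u32;
type BlockNumber = u32;
type StateRoot = u64; // toy width; real state roots are 32-byte hashes

/// Toy leaf hash (stand-in for the para's real hasher).
fn leaf_hash(key: &[u8], value: &[u8]) -> u64 {
    let mut s = DefaultHasher::new();
    (0u8, key, value).hash(&mut s);
    s.finish()
}

/// Toy inner-node hash over two children.
fn node_hash(left: u64, right: u64) -> u64 {
    let mut s = DefaultHasher::new();
    (1u8, left, right).hash(&mut s);
    s.finish()
}

/// One step of the toy Merkle path: the sibling hash and which side it is on.
struct Step {
    sibling: u64,
    sibling_is_left: bool,
}

/// Hypothetical proof shape: the claimed key/value plus the path to the root.
struct Proof {
    key: Vec<u8>,
    value: Vec<u8>,
    path: Vec<Step>,
}

#[derive(Debug, PartialEq)]
enum ReadError {
    UnknownPara,
    UnknownBlock,
    RootMismatch,
}

/// State roots remembered per (para, block), plus the latest block seen per para.
#[derive(Default)]
struct KnownRoots {
    roots: HashMap<(ParaId, BlockNumber), StateRoot>,
    latest: HashMap<ParaId, BlockNumber>,
}

impl KnownRoots {
    fn note_root(&mut self, id: ParaId, block: BlockNumber, root: StateRoot) {
        self.roots.insert((id, block), root);
        let latest = self.latest.entry(id).or_insert(block);
        *latest = (*latest).max(block);
    }
}

/// Verify `proof` for para `id` at `block`, or at the latest known block if `None`.
/// Returns the proven value only if the recomputed root matches the stored one.
fn poc_read_para_state(
    known: &KnownRoots,
    id: ParaId,
    proof: &Proof,
    block: Option<BlockNumber>,
) -> Result<Vec<u8>, ReadError> {
    let latest = *known.latest.get(&id).ok_or(ReadError::UnknownPara)?;
    let at = block.unwrap_or(latest);
    let expected = *known.roots.get(&(id, at)).ok_or(ReadError::UnknownBlock)?;

    // Fold the path from the leaf up to the root.
    let mut acc = leaf_hash(&proof.key, &proof.value);
    for step in &proof.path {
        acc = if step.sibling_is_left {
            node_hash(step.sibling, acc)
        } else {
            node_hash(acc, step.sibling)
        };
    }

    if acc == expected {
        Ok(proof.value.clone())
    } else {
        Err(ReadError::RootMismatch)
    }
}
```

The point of the sketch is the control flow (root lookup, optional block, reject on mismatch), not the proof format, which would be replaced wholesale.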
It doesn't exist yet. Building it should be straightforward, but it would not support every parachain, because parachains are not required to use any specific state layout. For the system chains we can make it work.
Some code to demonstrate:
https://github.com/paritytech/polkadot-sdk/compare/kiz-dday-demo?expand=1
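The demo is described elsewhere in this thread as covering stall detection for AH. As a rough illustration of one way such a check could work (not necessarily what the branch does), the sketch below treats a para as stalled once its head has not changed for a configurable number of relay blocks. `StallDetector`, `threshold` and the per-block hook are hypothetical names.

```rust
type BlockNumber = u32;
type HeadHash = [u8; 32];

/// Tracks when the para head last changed, measured in relay chain blocks.
struct StallDetector {
    last_head: Option<HeadHash>,
    last_progress_at: BlockNumber,
    /// Assumed parameter: relay blocks without a new head before we call it a stall.
    threshold: BlockNumber,
}

impl StallDetector {
    fn new(threshold: BlockNumber) -> Self {
        Self { last_head: None, last_progress_at: 0, threshold }
    }

    /// Call once per relay block with the para head currently stored on the relay chain.
    fn on_relay_block(&mut self, now: BlockNumber, head: HeadHash) {
        if self.last_head != Some(head) {
            self.last_head = Some(head);
            self.last_progress_at = now;
        }
    }

    /// True once a head has been seen and it has not changed for `threshold` relay blocks.
    fn is_stalled(&self, now: BlockNumber) -> bool {
        self.last_head.is_some()
            && now.saturating_sub(self.last_progress_at) >= self.threshold
    }
}
```

A time-based threshold like this cannot tell a buggy chain from one that is merely being censored or starved, which is part of what the rest of the discussion is about.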
We'll want parachains that never stall for PJR tests and DKGs, but they'd avoid censorship vectors like smart contracts, and never make too many blocks either, aka no elastic scaling.
In principle, relay chain governance could always take place on some non-stallable parachain, so not AssetHub, but using proofs into AssetHub state.
> Some code to demonstrate:
> * How to detect if AH is stalled
> * How to receive proofs for a given key in it
>
> https://github.com/paritytech/polkadot-sdk/compare/kiz-dday-demo?expand=1
I guess what is missing there is maybe a double map:
`balanceAtHead = head hash -> account -> balance info`
Then people call the `frozen_balance_of` call once, and we store that balance for the frozen head.
Then we should be able to do all local operations on that head. We will need pallets designed to do votes while checking that the correct frozen head data is used and has not changed.
We also probably want a way to migrate the total issuance number over for things like the voting curves, so we know when we reach certain levels of voter thresholds.
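A minimal sketch of the double map idea, using plain `HashMap`s in place of runtime storage. It assumes the balance proof has already been verified against the frozen head before `record_proven_balance` is called; that function name, the first-write-wins policy and the type aliases are illustrative assumptions, and only `frozen_balance_of` and the map layout come from the comment above.

```rust
use std::collections::HashMap;

type HeadHash = [u8; 32];
type AccountId = u64;
type Balance = u128;

/// head hash -> account -> balance info, as suggested above.
#[derive(Default)]
struct FrozenBalances {
    balance_at_head: HashMap<HeadHash, HashMap<AccountId, Balance>>,
}

impl FrozenBalances {
    /// Store an already-proven balance for `who` at the frozen `head`.
    /// The first recorded value wins, so later calls cannot change it.
    /// Returns the balance that is stored after the call.
    fn record_proven_balance(
        &mut self,
        head: HeadHash,
        who: AccountId,
        proven: Balance,
    ) -> Balance {
        *self
            .balance_at_head
            .entry(head)
            .or_default()
            .entry(who)
            .or_insert(proven)
    }

    /// Look up the stored balance without needing a new proof.
    fn frozen_balance_of(&self, head: &HeadHash, who: &AccountId) -> Option<Balance> {
        self.balance_at_head.get(head)?.get(who).copied()
    }
}
```

Keying by head hash means balances proven against one frozen head are never reused for another, and the total issuance could be cached per head in the same way.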
> We'll want parachains that never stall for PJR tests and DKGs, but they'd avoid censorship vectors like smart contracts, and never make too many blocks either, aka no elastic scaling.
What is non-stall-able? It has no bugs + gets infinite POV limit? I am not sure if we have such a thing or can build it fast enough.
Although, if this is easier to build, I agree that we should still build the pallet I said above, but instead of RC, put it in this special parachain, and let it work on-demand: it will only start working when it detects AH is in trouble. This is more JAM-compatible. @eskimor any comments from you?
> then people call the frozen_balance_of call once, and we store that balance for the frozen head.
This is only relevant if we want to do multiple votes on the same frozen AH, right? I hadn't thought of this, as I assumed the only vote would be for something that un-blocks AH. It is a good optimization.
> We also probably want a way to migrate the total issuance number over for things like the voting curves, so we know when we reach certain levels of voter thresholds.
Indeed, it can be provided with the same mechanism quite trivially.
> What is non-stall-able? It has no bugs + gets infinite POV limit? I am not sure if we have such a thing or can build it fast enough.
We don't have anything like that. However, if a separate parachain has only the rescue pallet, the failure surface is quite small. The chain also would not really need any kind of state, only for the one proposal that would need to be executed there.
Yes, stall-able is a metric, not a yes or no. At a high level, fewer features means harder to stall.
You could make an almost-impossible-to-stall PJR check chain, by replacing the parachain state root by just the score, and allowing another block that improves the score. This means a staking miner could advance the state of the PJR check chain only by knowing the relay chain state, not the previous PJR check results. This is removing the feature of having state to make the PJR check chain harder to stall. It's harder to make DKG chains similarly hard to stall, but somewhat possible.
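A toy sketch of that "state is just the score" idea, under heavy assumptions: the score is a single integer where higher is better, the relay state is a list of stakes, and a solution is a list of indices into it. None of these shapes come from the thread; `RelayState`, `Solution`, `evaluate` and `try_advance` are hypothetical. What it illustrates is that validating a new block needs only the relay state and the current best score, never the previous block's contents.

```rust
type Score = u64;

/// Toy stand-in for whatever relay chain state the check depends on.
struct RelayState {
    stakes: Vec<u64>,
}

/// Toy stand-in for a submitted solution: indices into `stakes`.
struct Solution {
    picked: Vec<usize>,
}

/// Recompute the score from relay state alone.
/// Returns `None` if the solution references an index that does not exist.
fn evaluate(relay: &RelayState, solution: &Solution) -> Option<Score> {
    solution
        .picked
        .iter()
        .map(|&i| relay.stakes.get(i).copied())
        .sum()
}

/// The chain "head" is only the best score so far.
/// A candidate is accepted only if it is valid and strictly improves on it;
/// the return value is the new head, or `None` if the candidate is rejected.
fn try_advance(
    head: Option<Score>,
    relay: &RelayState,
    solution: &Solution,
) -> Option<Score> {
    let score = evaluate(relay, solution)?;
    match head {
        Some(best) if score <= best => None,
        _ => Some(score),
    }
}
```

Because nothing but the score is carried forward, any miner who can read the relay state can produce the next block, which is the stall-resistance property being argued for.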
Fully utilized chains would permit partial functionality stalls, because being fully utilized means not reserving anything. Smart contracts would typically open attack vectors that partially stall chains, because adversaries could find tricks that consume all the resources. Elastic scaling would often permit chain takeovers by not giving other collators enough sync time. We should expect AH can be stalled more easily because AH shall have all three.
All that is why you're proposing d-day governance, but..
Fallbacks suck. Why not always do RC governance on some parachain that's harder to stall than AH? We could leave treasury on AH, because treasury stalling doesn't break anything, but do system code upgrades and parameters somewhere safer.
> > What is non-stall-able? It has no bugs + gets infinite POV limit? I am not sure if we have such a thing or can build it fast enough.
>
> We don't have anything like that. However, if a separate parachain has only the rescue pallet, the failure surface is quite small. The chain also would not really need any kind of state, only for the one proposal that would need to be executed there.
I see. Let's first discuss the failure-surface. Note, the relay chain will have some code in its runtime that handles parachains (para-runtime). I assume there is in principle the possibility to also have a bug in this, in which case all parachains could stop working, no matter their code.
Putting the rescue pallet in another parachain has the benefit that it is more JAM-compatible, but it does not help with the second failure.
Putting it in the RC is not JAM-compatible, but handles both failures.
I might be paranoid by thinking the second failure is actually a feasible one. @eskimor implied in conversation off-band that I might be wrong to worry about this. In this case, having a similar rescue system in a separate on-demand parachain makes more sense. Also cc @ordian
While we could break the relay chain runtime in a way that only parachain consensus is entirely broken, I would doubt that the risk is much higher than messing up the relay chain runtime in some other way (preventing relay chain governance from working). If this happened, we would need a hardfork to fix it, just as if we messed up a relay chain upgrade right now.
Asset Hub no longer making progress is disastrous enough that we should work hard to make this as unlikely as possible.
Also purely hypothetical: If all of parachain consensus broke, then we would want to have this fixed as quickly as possible and not do some governance dance, but instead indeed likely a hard fork will be demanded by pretty much everybody. Same is likely true if asset hub breaks.
We've fixed bad upgrades before using on-chain governance, and not hardforks, although sometimes only barely, and maybe we no longer make those mistakes.
I'm assuming the RC continues running correctly, including elves/approvals and grandpa. I suppose AH might continue running correctly-ish too. Yet, we have problems backing honest AH parachain blocks, maybe because of malicious actors, or maybe unintentionally like from high or weird usage.
In particular, we'll seemingly want AH to push a high tps for bragging rights, but this requires full AH blocks get used by transactions, meaning no reserved space for the election. That's problematic.
It's not bugs per se, but parachain choices that trade away resilience for throughput and flexibility. In theory, a parachain project could always run "better" infrastructure, and that may be how you land insane tps, but we're the L1 so their "better" might feel centralized to us.
Also..
There may be similar robustness arguments going the opposite way, like the governance chain needing reliable infrastructure. If that's the case, then maybe a separate d-day chain makes sense? It's unclear if AH failures could be detected though, so maybe activating the d-day chain should be the d-day chain's first act?
Anyways I worried mostly that we were going to have a fallback that barely worked, or required double the debugging time, when we should be doing it right in one place, but maybe that's not an easy choice to make right away.
Write a new governance pallet that should reside in the relay chain, while the main governance apparatus resides on Asset Hub.
The main usage of this pallet is when AH and/or Collectives are not producing blocks, and therefore can no longer access `Root` on the relay chain.
The assumption of this pallet is that it can have access to the latest state root of both Collectives and AH, and also has some notion of "soft metadata" of Collectives and AH. As in, it knows that a state proof corresponding to a specific hard-coded key prefix is associated with e.g. the balance of a user in AH.
A few key properties of this pallet:
Proposals creation:
* `origin` sends a proof of `fellowshipCollective::members(who) -> rank`, then ensure that the `origin` was `who`, and `rank` is high enough.
* […] `pallet-referenda`.

Voting
* […] `Tally` and then linked to `pallet-referenda`.
* […] (`aye/nay`).

Option 1: Simple
* […] `type MinimumVotingPower`.

Option 2: Meta Transaction Style
* […] `who` regarding the vote (`signed(aye/nay)`), allowing `origin` to vote on behalf of `who`.
* […] `origin` if it is the first valid vote of `who`.
* […] (`who` changed their mind) must pay `origin` proportional to the claimed voting power. `origin` is slashed if invalid.

Relies on https://github.com/paritytech/polkadot-sdk/pull/5400. @shawntabrizi would you like to work on this after your current work? It seems to fit your aptitude very well.
Demo branch: https://github.com/paritytech/polkadot-sdk/compare/kiz-dday-demo?expand=1
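For the proposal-creation rule in the description (the `origin` must be the `who` from a proven `members(who) -> rank` entry, and `rank` must be high enough), a minimal sketch of the check could look like this. It assumes the membership proof has already been verified against the Collectives state root; `ProvenMembership`, `ProposeError` and the `min_rank` parameter are hypothetical names, not part of the proposal.

```rust
type AccountId = u64;
type Rank = u16;

/// Result of a membership proof that was already verified against Collectives state.
struct ProvenMembership {
    who: AccountId,
    rank: Rank,
}

#[derive(Debug, PartialEq)]
enum ProposeError {
    /// The caller is not the account the proof is about.
    OriginIsNotMember,
    /// The proven rank is below the required minimum.
    RankTooLow,
}

/// Allow a proposal only if `origin` is the proven member and the rank suffices.
fn ensure_can_propose(
    origin: AccountId,
    proven: &ProvenMembership,
    min_rank: Rank,
) -> Result<(), ProposeError> {
    if origin != proven.who {
        return Err(ProposeError::OriginIsNotMember);
    }
    if proven.rank < min_rank {
        return Err(ProposeError::RankTooLow);
    }
    Ok(())
}
```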