D-Day Governance

kianenigma commented 2 months ago

Write a new governance pallet that should reside in the relay chain, while the main governance apparatus resides on Asset Hub.

The main usage of this pallet is when AH and/or Collectives are not producing blocks, and therefore can no longer access Root on the relay chain.

This pallet assumes that it has access to the latest state root of both Collectives and AH, and that it has some notion of "soft metadata" for both chains. That is, it knows that a state proof for a specific hard-coded key prefix corresponds to, e.g., the balance of a user in AH.

A few key properties of this pallet:

Proposal creation:

Either of:
- the Collectives parachain (fellowship) can always create one
- membership proof in the fellowship (should collectives be offline)
  - signed origin sends a proof of fellowshipCollective::members(who) -> rank, then ensure that the origin was who, and rank is high enough.
  - Fellowship members should retain some DOT in RC for this.
AND, the pallet knows that AH has not produced blocks for some period of time.
Should AH/collective resume producing blocks, the pallet should ignore any ongoing proposals.
- Any half-finished voting/proposal data should be removed lazily.
- This makes it important that the pallet cannot be "tricked" into thinking that AH is stalled.
Should be implemented as an instance of pallet-referenda.
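The gating rule above can be sketched as a plain function. This is only one reading of the bullets (either proposer path, AND AH stalled); `Proposer`, `Config`, `min_rank` and `can_create_proposal` are hypothetical names, and the membership proof is assumed to have been verified already against the last known Collectives state root.

```rust
/// Who is trying to open an emergency proposal (illustrative names only).
#[derive(Debug, Clone, PartialEq)]
pub enum Proposer {
    /// The Collectives parachain (fellowship) itself.
    Collectives,
    /// A signed relay-chain account plus the result of an already-verified
    /// membership proof: the member the proof is about and the proven rank.
    ProvenMember { signer: u64, proven_who: u64, rank: u16 },
}

/// Hypothetical pallet configuration.
pub struct Config {
    /// Minimum fellowship rank allowed to open a proposal.
    pub min_rank: u16,
}

/// Returns true if a proposal may be created.
/// `ah_stalled` is whatever the pallet's stall detection reports.
pub fn can_create_proposal(cfg: &Config, proposer: &Proposer, ah_stalled: bool) -> bool {
    let proposer_ok = match proposer {
        Proposer::Collectives => true,
        Proposer::ProvenMember { signer, proven_who, rank } => {
            // The origin must be the member the proof talks about,
            // and the proven rank must be high enough.
            signer == proven_who && *rank >= cfg.min_rank
        }
    };
    ah_stalled && proposer_ok
}
```

Keeping the stall flag as an input makes it explicit that the whole scheme is only as trustworthy as the stall detection feeding it.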

Voting

Should be implemented as a new pallet, implementing Tally and then linked to pallet-referenda.
Voters don't have balance on RC, but we can assume that some "power users", such as foundations, teams and fellowship do.
Voting can be done through submitting a proof of one's balance on AH, and a preference (aye/nay).
Voting info is stored in a child trie per referendum, to enable better lazy cleanup.
Concerns for the RC:
- Not get flooded by (free) transactions.
- first process high-value-bearing votes, and then the smaller ones.

Option 1: Simple

Per referendum, each user gets 1 (fully) free immutable vote.
One simple anti-Sybil mitigation might be to introduce a type MinimumVotingPower.
Prioritizing votes in this case is hard; the only way to do it is to add logic to the transaction pool validation step, which will require at least reading one storage item.
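A minimal sketch of Option 1, assuming the AH balance proof has already been checked and yields `proven_balance`. `SimpleVoting`, `VoteError` and the method names are hypothetical; only `MinimumVotingPower` comes from the text above (modelled here as `min_voting_power`).

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Vote {
    Aye,
    Nay,
}

#[derive(Debug, PartialEq)]
pub enum VoteError {
    /// The voter already used their single immutable vote.
    AlreadyVoted,
    /// The proven balance is below the minimum voting power.
    BelowMinimum,
}

pub struct SimpleVoting {
    /// Stand-in for the `MinimumVotingPower` config type.
    min_voting_power: u128,
    /// (referendum index, voter) -> (vote, proven balance).
    votes: HashMap<(u32, u64), (Vote, u128)>,
}

impl SimpleVoting {
    pub fn new(min_voting_power: u128) -> Self {
        Self { min_voting_power, votes: HashMap::new() }
    }

    /// Record a free vote; a second vote by the same voter is rejected.
    pub fn vote(&mut self, referendum: u32, who: u64, proven_balance: u128, vote: Vote)
        -> Result<(), VoteError>
    {
        if proven_balance < self.min_voting_power {
            return Err(VoteError::BelowMinimum);
        }
        if self.votes.contains_key(&(referendum, who)) {
            return Err(VoteError::AlreadyVoted);
        }
        self.votes.insert((referendum, who), (vote, proven_balance));
        Ok(())
    }

    /// Returns (ayes, nays) weighted by proven balance.
    pub fn tally(&self, referendum: u32) -> (u128, u128) {
        let (mut ayes, mut nays) = (0u128, 0u128);
        for ((r, _), (v, power)) in &self.votes {
            if *r == referendum {
                match v {
                    Vote::Aye => ayes += *power,
                    Vote::Nay => nays += *power,
                }
            }
        }
        (ayes, nays)
    }
}
```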

Option 2: Meta Transaction Style

Allow any signed origin to hand over a signed statement from who regarding the vote (signed(aye/nay)), allowing origin to vote on behalf of who.
- transaction should still be free for origin if it is the first valid vote of who
- subsequent votes (who changed their mind) must pay
- providing invalid proof will lead to slash of a deposit from origin proportional to the claimed voting power.
This allows voting to happen through funded "power users" as a proxy, which we can assume to have balance.
Consequently, we can implement prioritization with less risk; transactions will be sorted based on "claimed voting power", and origin is slashed if invalid.
- slash amount proportional to voting power.
- to submit a vote on behalf of a whale, you need more free balance on RC to pay for the slash deposit.
This might lead to censorship, but that is not a practical issue as long as at least one honest actor is willing to vote on your behalf.
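The rules of Option 2 can be sketched for a single referendum as follows. Everything here is an assumption made for illustration: the type and method names, the per-mille slash ratio, and the `proof_valid` flag, which stands in for verifying both the balance proof and the signed statement from who.

```rust
use std::collections::HashSet;

/// A vote relayed by `origin` on behalf of `who`.
pub struct RelayedVote {
    pub origin: u64,
    pub who: u64,
    pub claimed_power: u128,
    /// Stand-in for "proof and signature verified successfully".
    pub proof_valid: bool,
}

#[derive(Debug, PartialEq)]
pub enum Outcome {
    AcceptedFree,
    AcceptedPaid,
    /// Origin is slashed by this amount.
    Slashed(u128),
    /// Origin cannot cover the slash deposit for the claimed power.
    InsufficientDeposit,
}

pub struct MetaVoting {
    /// Hypothetical slash ratio, in parts per thousand of claimed power.
    slash_per_mille: u128,
    has_voted: HashSet<u64>,
}

impl MetaVoting {
    pub fn new(slash_per_mille: u128) -> Self {
        Self { slash_per_mille, has_voted: HashSet::new() }
    }

    /// Deposit at risk, proportional to the claimed voting power.
    pub fn required_deposit(&self, claimed_power: u128) -> u128 {
        claimed_power.saturating_mul(self.slash_per_mille) / 1_000
    }

    pub fn submit(&mut self, v: &RelayedVote, origin_free_balance: u128) -> Outcome {
        let deposit = self.required_deposit(v.claimed_power);
        if origin_free_balance < deposit {
            return Outcome::InsufficientDeposit;
        }
        if !v.proof_valid {
            return Outcome::Slashed(deposit);
        }
        // First valid vote of `who` is free; later changes must pay.
        if self.has_voted.insert(v.who) { Outcome::AcceptedFree } else { Outcome::AcceptedPaid }
    }
}

/// Highest claimed voting power first, as a transaction-priority stand-in.
pub fn order_by_priority(votes: &mut Vec<RelayedVote>) {
    votes.sort_by(|a, b| b.claimed_power.cmp(&a.claimed_power));
}
```

Because the deposit scales with `claimed_power`, lying about a large balance to jump the queue is exactly the case that costs the most.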

Relies on https://github.com/paritytech/polkadot-sdk/pull/5400. @shawntabrizi would you like to work on this after your current work? It seems to fit your aptitude very well.

Demo branch: https://github.com/paritytech/polkadot-sdk/compare/kiz-dday-demo?expand=1

shawntabrizi commented 2 months ago

Acknowledging the issue.

Not sure how much availability I have, but I can def mentor someone. Depends on urgency. If not urgent, I could probably get small pieces of this story done over the weeks.

kianenigma commented 2 months ago

This falls into the more long term requirements of Asset Hub, not being needed until the very final days. In that sense, I was going to suggest you start working on it after your current project is done and roughly by end of DevCon?

bkchr commented 2 months ago

> Relies on #5400. @shawntabrizi would you like to work on this after your current work? It seems to fit your aptitude very well.

This is using a binary Merkle tree, while the chain uses a base-16 Patricia Merkle trie. They are not compatible. We already have other code, in historical session, that checks proofs.

Generally, with the development of JAM, we will not have this luxury of having an extra governance sitting on the relay chain. So, when in JAM all chains stop, we don't have governance either. So, a little bit questionable if we need this pallet at all. Or do you just want it for the period where governance switches over to AH and we are afraid of it not working properly on AH?

burdges commented 2 months ago

Afaik XCMP needs real state proofs into other parachains' state, so one could abstract that somewhat.

We'll avoid starving "true system parachains" ala https://github.com/paritytech/polkadot-sdk/issues/4632#issuecomment-2209188695. We've not concretely defined that term yet, maybe audited like polkadot itself and no flexible execution aka no smart contracts. Also maybe no advanced collator communication, which maybe forbids elastic scaling. We've discussed reverting code upgrades automagically too, but afaik nothing currently in progress, and maybe imposes design restrictions.

We've more ways individual parachains can brick of course. Also JAM should bring much new brickage, but "true system parachain" could forbid non-trivial accumulation, which again maybe forbids elastic scaling.

Anyways, if collectives were kept relatively simple, then maybe collectives alone could provide this? Or maybe some simpler multi-sig derived from collectives? AH doing governance directly may be a design mistake too, because doing so adds tension between different concerns.

shawntabrizi commented 2 months ago

@bkchr I actually switched to a compact base 16 trie because the binary tree libraries were unusable in the runtime currently

kianenigma commented 2 months ago

> Or do you just want it for the period where governance switches over to AH and we are afraid of it not working properly on AH?

Exactly for this period.

kianenigma commented 2 months ago

> @bkchr I actually switched to a compact base 16 trie because the binary tree libraries were unusable in the runtime currently

I assume with this comment, there is no blocker to implement this, right?

It would be great to get a prototype of a pallet that tightly couples with the parachain pallets (e.g. can only work in RC), and can request to read the state of a parachain based on its latest state root: as in, have an extrinsic where anyone can provide a state proof of a parachain, and it would verify it based on the last known state root of the given para.

/// Provide the state `proof` for `id` at `block`, or the latest block if not provided
fn poc_read_para_state(id: ParaId, proof: Vec<Vec<u8>>, block: Option<BlockNumber>)

@bkchr do you know if this exists anywhere?

If this can be built, I will have no doubts that the rest of this issue can also be done.
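A rough sketch of the shape such a call could take, under several assumptions: `ParaRoots` and `ProofChecker` are hypothetical, the extra `key` parameter is not part of the signature above, and the actual trie proof verification is deliberately left behind the trait rather than implemented. Only the "use the given block, else the latest known root" logic is shown concretely.

```rust
use std::collections::BTreeMap;

pub type ParaId = u32;
pub type BlockNumber = u32;
pub type Hash = [u8; 32];

/// Hypothetical abstraction over the real state-proof verification.
pub trait ProofChecker {
    /// Returns the proven value for `key`, or `None` if the proof
    /// does not verify against `root`.
    fn check(&self, root: &Hash, proof: &[Vec<u8>], key: &[u8]) -> Option<Vec<u8>>;
}

/// State roots the relay chain has seen, keyed by (para, para block).
pub struct ParaRoots {
    roots: BTreeMap<(ParaId, BlockNumber), Hash>,
}

impl ParaRoots {
    pub fn new() -> Self {
        Self { roots: BTreeMap::new() }
    }

    pub fn note_root(&mut self, id: ParaId, block: BlockNumber, root: Hash) {
        self.roots.insert((id, block), root);
    }

    /// Root at `block`, or the latest known root of `id` if `block` is `None`.
    pub fn root_for(&self, id: ParaId, block: Option<BlockNumber>) -> Option<Hash> {
        match block {
            Some(n) => self.roots.get(&(id, n)).copied(),
            None => self
                .roots
                .range((id, BlockNumber::MIN)..=(id, BlockNumber::MAX))
                .next_back()
                .map(|(_, root)| *root),
        }
    }
}

/// Verify `proof` for `key` against the selected state root of para `id`.
pub fn poc_read_para_state<C: ProofChecker>(
    roots: &ParaRoots,
    checker: &C,
    id: ParaId,
    proof: Vec<Vec<u8>>,
    block: Option<BlockNumber>,
    key: &[u8],
) -> Option<Vec<u8>> {
    let root = roots.root_for(id, block)?;
    checker.check(&root, &proof, key)
}
```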

bkchr commented 2 months ago

It doesn't exist yet. However, building it should be straightforward, but it would also not support every parachain. Parachains are not required to use any specific state layout. But for the system chains we can make it work.

kianenigma commented 1 month ago

Some code to demonstrate:

- How to detect if AH is stalled
- How to receive proofs for a given key in it

https://github.com/paritytech/polkadot-sdk/compare/kiz-dday-demo?expand=1
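Independently of what the demo branch does (this is not taken from it), one simple way to think about stall detection is to remember the relay-chain block at which the para head last changed and compare the gap against a threshold. `ParaHeadTracker` and the threshold parameter are hypothetical.

```rust
/// Tracks when a parachain head last changed, in relay-chain blocks.
pub struct ParaHeadTracker {
    last_head: Option<Vec<u8>>,
    last_change_at: u32,
}

impl ParaHeadTracker {
    /// Start tracking at relay block `now`, with no head seen yet.
    pub fn new(now: u32) -> Self {
        Self { last_head: None, last_change_at: now }
    }

    /// Call once per relay block with the currently registered para head.
    pub fn on_relay_block(&mut self, now: u32, current_head: &[u8]) {
        if self.last_head.as_deref() != Some(current_head) {
            self.last_head = Some(current_head.to_vec());
            self.last_change_at = now;
        }
    }

    /// The para counts as stalled once its head has not changed
    /// for at least `threshold` relay blocks.
    pub fn is_stalled(&self, now: u32, threshold: u32) -> bool {
        now.saturating_sub(self.last_change_at) >= threshold
    }
}
```

The threshold has to be long enough that ordinary hiccups in block production do not open the emergency path.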

burdges commented 1 month ago

We'll want parachains that never stall for PJR tests and DKGs, but they'd avoid censorship vectors like smart contracts, and never make too many blocks either, aka no elastic scaling.

In principle, relay chain governance could always take place on some non-stallable parachain, so not AssetHub, but using proofs into AssetHub state.

shawntabrizi commented 3 weeks ago

> Some code to demonstrate:
> * How to detect if AH is stalled
> * How to receive proofs for a given key in it
>
> https://github.com/paritytech/polkadot-sdk/compare/kiz-dday-demo?expand=1

I guess what is missing there maybe is a double map:

balanceAtHead = head hash -> account -> balance info

then people call the frozen_balance_of call once, and we store that balance for the frozen head.

Then we should be able to do all local operations on that Head. We will need pallets designed to do votes while checking that the correct frozen head data is used, that it has not changed, and so on.

We also probably want a way to migrate the total issuance number over for things like the voting curves, so we know when we reach certain levels of voter thresholds.
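A small sketch of that double map, with the proof check abstracted away: `proven_balance` is assumed to come from a proof already verified against the frozen head. `FrozenBalances`, `cached_balance`, `set_total_issuance` and `turnout_per_mille` are hypothetical names; only `balanceAtHead` and `frozen_balance_of` are taken from the comment.

```rust
use std::collections::HashMap;

pub type HeadHash = [u8; 32];
pub type AccountId = u64;
pub type Balance = u128;

pub struct FrozenBalances {
    /// head hash -> account -> balance info
    balance_at_head: HashMap<HeadHash, HashMap<AccountId, Balance>>,
    /// head hash -> total issuance proven against that head
    issuance_at_head: HashMap<HeadHash, Balance>,
}

impl FrozenBalances {
    pub fn new() -> Self {
        Self { balance_at_head: HashMap::new(), issuance_at_head: HashMap::new() }
    }

    /// Store the proven balance for `who` at `head` once. Later calls return
    /// the stored value, which is safe because a frozen head cannot change.
    pub fn frozen_balance_of(&mut self, head: HeadHash, who: AccountId, proven_balance: Balance) -> Balance {
        *self.balance_at_head.entry(head).or_default().entry(who).or_insert(proven_balance)
    }

    /// Read a previously stored balance without supplying a proof again.
    pub fn cached_balance(&self, head: &HeadHash, who: AccountId) -> Option<Balance> {
        self.balance_at_head.get(head)?.get(&who).copied()
    }

    /// Total issuance can be carried over with the same proof mechanism.
    pub fn set_total_issuance(&mut self, head: HeadHash, issuance: Balance) {
        self.issuance_at_head.insert(head, issuance);
    }

    /// Share of issuance that has voted, in parts per thousand.
    pub fn turnout_per_mille(&self, head: &HeadHash, voted: Balance) -> Option<Balance> {
        let issuance = self.issuance_at_head.get(head)?;
        voted.saturating_mul(1_000).checked_div(*issuance)
    }
}
```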

kianenigma commented 3 weeks ago

> We'll want parachains that never stall for PJR tests and DKGs, but they'd avoid censorship vectors like smart contracts, and never make too many blocks either, aka no elastic scaling.

What is non-stall-able? It has no bugs + gets infinite POV limit? I am not sure if we have such a thing or can build it fast enough.

Although, if this is easier to build, I agree that we should still build the pallet I said above, but instead of RC, put it in this special parachain, and let it work on-demand: it will only start working when it detects AH is in trouble. This is more JAM-compatible. @eskimor any comments from you?

kianenigma commented 3 weeks ago

> then people call the frozen_balance_of call once, and we store that balance for the frozen head.

This is only relevant if we want to do multiple votes on the same frozen AH, right? I hadn't thought of this, as I assumed the only voting will be for something that will un-block AH. It is a good optimization.

> We also probably want a way to migrate the total issuance number over for things like the voting curves, so we know when we reach certain levels of voter thresholds.

Indeed, it can be provided with the same mechanism quite trivially.

bkchr commented 2 weeks ago

> What is non-stall-able? It has no bugs + gets infinite POV limit? I am not sure if we have such a thing or can build it fast enough.

We don't have anything like that. However, if a separate parachain has only the rescue pallet, the failure surface is quite small. The chain also would not really need any kind of state, only the one proposal that would need to be executed there.

burdges commented 2 weeks ago

Yes, stall-able is a metric, not a yes or no. At a high level, fewer features means harder to stall.

You could make an almost-impossible-to-stall PJR check chain, by replacing the parachain state root by just the score, and allowing another block that improves the score. This means a staking miner could advance the state of the PJR check chain only by knowing the relay chain state, not the previous PJR check results. This is removing the feature of having state to make the PJR check chain harder to stall. It's harder to make DKG chains similarly hard to stall, but somewhat possible.

Fully utilized chains would permit partial functionality stalls, because being fully utilized means not reserving anything. Smart contracts would typically open attack vectors that partially stall chains, because adversaries could find tricks that consume all the resources. Elastic scaling would often permit chain takeovers by not giving other collators enough sync time. We should expect AH can be stalled more easily because AH shall have all three.

All that is why you're proposing d-day governance, but..

Fallbacks suck. Why not always do RC governance on some parachain that's harder to stall than AH? We could leave treasury on AH, because treasury stalling doesn't break anything, but do system code upgrades and parameters somewhere safer.

kianenigma commented 2 weeks ago

> What is non-stall-able? It has no bugs + gets infinite POV limit? I am not sure if we have such a thing or can build it fast enough.
>
> We don't have anything like that. However, if a separate parachain has only the rescue pallet, the failure surface is quite small. The chain also would not really need any kind of state, only the one proposal that would need to be executed there.

I see. Let's first discuss the failure-surface. Note, the relay chain will have some code in its runtime that handles parachains (para-runtime). I assume there is in principle the possibility to also have a bug in this, in which case all parachains could stop working, no matter their code.

1. AH itself is buggy, but the relay chain is fine.
2. para-runtime is buggy.

Putting the rescue pallet in another parachain has the benefit that it is more JAM-compatible, but it does not help with the second failure.

Putting it in the RC is not JAM-compatible, but handles both failures.

I might be paranoid in thinking the second failure is actually a plausible one. @eskimor implied in an out-of-band conversation that I might be wrong to worry about this. In that case, having a similar rescue system in a separate on-demand parachain makes more sense. Also cc @ordian

eskimor commented 2 weeks ago

While we could break the relay chain runtime in a way that only parachain consensus is entirely broken, I would doubt that the risk is much higher than messing up the relay chain runtime in some other way (preventing relay chain governance from working). If this happened, we would need a hardfork to fix it, just as if we messed up a relay chain upgrade right now.

Asset Hub no longer making progress is disastrous enough that we should work hard to make this as unlikely as possible.

Also purely hypothetical: If all of parachain consensus broke, then we would want to have this fixed as quickly as possible and not do some governance dance, but instead indeed likely a hard fork will be demanded by pretty much everybody. Same is likely true if asset hub breaks.

burdges commented 2 weeks ago

We've fixed bad upgrades before using on-chain governance, and not hardforks, although sometimes only barely, and maybe we no longer make those mistakes.

I'm assuming the RC continues running correctly, including elves/approvals and grandpa. I suppose AH might continue running correctly-ish too. Yet, we have problems backing honest AH parachain blocks, maybe because of malicious actors, or maybe unintentionally like from high or weird usage.

In particular, we'll seemingly want AH to push a high tps for bragging rights, but this requires full AH blocks get used by transactions, meaning no reserved space for the election. That's problematic.

It's not bugs per se, but parachain choices that trade away resilience for throughput and flexibility. In theory, a parachain project could always run "better" infrastructure, and that may be how you land insane tps, but we're the L1 so their "better" might feel centralized to us.

Also..

There may be similar robustness arguments going the opposite way, like the governance chain needing reliable infrastructure. If that's the case, then maybe a separate d-day chain makes sense? It's unclear if AH failures could be detected though, so maybe activating the d-day chain should be the d-day chain's first act?

Anyways I worried mostly that we were going to have a fallback that barely worked, or required double the debugging time, when we should be doing it right in one place, but maybe that's not an easy choice to make right away.

paritytech / polkadot-sdk

D-Day Governance #5588
