polkadot-fellows / RFCs

Proposals for change to standards administered by the Fellowship.
https://polkadot-fellows.github.io/RFCs/
Creative Commons Zero v1.0 Universal

Proposal for Minimal Relay #32

Closed joepetrowski closed 9 months ago

joepetrowski commented 10 months ago

This RFC proposes a direction and prioritization for migrating core functionality off the Relay Chain. The focus is on Identity, Staking, and Governance. The RFC includes a brief discussion on the challenges associated with each one and the most probable migration/implementation path.

xlc commented 10 months ago

I think it is best to do the simple things first: migrate identity and governance, and leave the tricky parts to a future RFC. Otherwise we won't be able to accept this RFC until all the hard questions are answered, and that will unnecessarily block the straightforward actions.

joepetrowski commented 10 months ago

I think it is best to do the simple things first: migrate identity and governance, and leave the tricky parts to a future RFC. Otherwise we won't be able to accept this RFC until all the hard questions are answered, and that will unnecessarily block the straightforward actions.

The goal of this RFC is not to answer all the "hard questions" and implementation details. It is to state a direction and objective: Identity, Staking, and Governance off the Relay Chain. Of course, for each one there will be different "hard questions" and implementation challenges. Part of the process will be figuring them out, but it's better for parachain and UI developers to know now that the architecture of this state/logic that they interact with is being rebuilt.

xlc commented 10 months ago

I guess we can signal the intention to move those features to a system parachain. But without more feasibility study, we cannot say for sure we are going to move staking to a parachain. There is a good chance that it cannot happen without some major refactorings, and I would say it is too early to make such a decision at this stage.

joepetrowski commented 10 months ago

There is a good chance that it cannot happen without some major refactorings, and I would say it is too early to make such a decision at this stage.

Well this is Polkadot eating its own dog food. Yes it might (probably will) take some big refactorings. But we should not say, "it's hard to do X on a parachain but we're special so let's just put it on the Relay", and then expect others to do complicated things in parachains. If it requires refactorings, let's do them.

gavofyork commented 9 months ago

I think it is best to do the simple things first: migrate identity and governance, and leave the tricky parts to a future RFC. Otherwise we won't be able to accept this RFC until all the hard questions are answered, and that will unnecessarily block the straightforward actions.

The point of the RFC is to get everyone on the same page regarding the future of the Relay-chain. This is nothing new and I gave several presentations mentioning all of this in 2019.

sourabhniyogi commented 9 months ago

I think it is best to do the simple things first: migrate identity and governance, and leave the tricky parts to a future RFC. Otherwise we won't be able to accept this RFC until all the hard questions are answered, and that will unnecessarily block the straightforward actions.

The point of the RFC is to get everyone on the same page regarding the future of the Relay-chain. This is nothing new and I gave several presentations mentioning all of this in 2019.

Here is some data from Q3 2023 on users and weights that might help you prioritize what should get moved off the relay chain and in what order:

SELECT
  call_section,
  COUNT(DISTINCT signer_pub_key) activeAddresses,
  ROUND(SUM(weight)/1000000000000, 1) weight,
  COUNT(*) numCalls,
  COUNT(DISTINCT extrinsic_id) numExtrinsic
FROM `bigquery-public-data.crypto_polkadot.calls0`
WHERE TIMESTAMP_TRUNC(block_time, DAY) >= TIMESTAMP("2023-07-01")
  AND TIMESTAMP_TRUNC(block_time, DAY) <= TIMESTAMP("2023-09-30")
GROUP BY call_section
HAVING call_section NOT IN ("utility", "proxy")
ORDER BY weight DESC, activeAddresses DESC
LIMIT 100

https://docs.google.com/spreadsheets/d/1Hwt_z2_WrfrswbDpho4LxghctSsnTRbl0jhHhWPhKQk/edit#gid=1210313544

The above says very clearly: staking and balances! Everything else (identity, governance) looks like premature scaling and spinning up chains for the hell of it, to disturbing levels.

Zeroing in on those 2, there are many ways you could proceed:

  (A) Staking chain
  (B) Balances chain (AssetHub)
  (C) 1 staking chain, 1 balances chain
  (D) 2 chains both doing staking + balances
  (E) 4 chains: 2 doing staking and 2 doing balances

To improve XCM's async patterns, I would pick (A) or (B). This is sorely needed, and I would try to make this guy happy. Given a forced choice, pick (A) over (B) -- it has fewer and more sophisticated stakeholders.

To focus on substantive innovation on CoreJam/Coreplay's async innovations ALONE, I would go straight to (C).

To focus on substantive innovation on CoreJam/Coreplay's sync+async innovations, I would go straight to (D).

(E) is a hidden IQ test. Think about why.

I don't think Polkadot engineering has gone through the "heterogenous app chains => homogenous shards" conceptual change required to even choose (D). This is an important prereq.

I suspect from many years of engineers here being trained on "heterogenous app chains" and specifically against "homogenous shards", a lot of you will reflexively think (C)=>(E) is a great idea, but actually (E) is a test of whether you have any regard for locality considerations, reducing unnecessary messaging, and to put it bluntly, just want to spin up new chains (governance! identity!), like, for the hell of it. It seems pretty clear the 1.0 heuristic of

Every use case should get its own chain!

has to be discarded in favor of something more intelligent in 2.0, recognizing that

1.0 chains as a primitive were not efficient! RIP Chains!

and

in 2.0, we do more resource allocation enabled by CRJA: Actors, Work Classes, Slabs, ... in ALL use cases

The kind of solution hinted at here should be extended to the visibly tiny (in terms of weight) use cases of governance and identity -- which would NOT result in whole governance and identity chains -- but instead address locality in sync/async composability much more deliberately in a new Actor model (Coreplay), or in Work Classes (CoreJam) and end up using resources efficiently. This is the future.

For the most pragmatic and high impact path, (A)=>(D) is the most useful path to pursue, along CRJA lines.

joepetrowski commented 9 months ago

The above says very clearly: staking and balances! Everything else (identity, governance) looks like premature scaling and spinning up chains for the hell of it, to disturbing levels.

Thanks for collecting the info. However, it only seems to look at user-dispatch transactions and not count things like on-initialize? Governance for example occasionally does some heavier things like vote tallying. So all the weight of a block doesn't necessarily come from user-interaction.

It may look like premature scaling, but like I said here, core allocation is much more agile than state/logic allocation. These subsystems can be in separate chains but still share a minimal number of cores. When they surge, core allocation can happen dynamically. And like Jeff said, we can even spin off child chains for ephemeral heavy tasks.

The RFC also mentions very little about "spinning up chains for the hell of it", and is focused on getting functionality off the Relay Chain. In fact, rather than a "new" governance chain, I'd expect to just migrate governance into Collectives. This also comes back to your data about current weight consumption and future plans: We expect many more collectives and sub-treasuries. Therefore, we want the agility of being able to move to a dedicated core if need be, but can share a core with other system chains when appropriate. Just like a staking system that can support thousands of validators, better to functionally isolate these components and do proper resource allocation with Agile Coretime than to limit the ability of one subsystem.

And to get to balances, well we need staking, governance, and identity off the Relay Chain first. Again, balances to Asset Hub, governance to Collectives. No chains "for the hell of it".

in 2.0, we do more resource allocation enabled by CRJA: Actors, Work Classes, Slabs, ... in ALL use cases

The kind of solution hinted at here should be extended to the visibly tiny (in terms of weight) use cases of governance and identity -- which would NOT result in whole governance and identity chains -- but instead address locality in sync/async composability much more deliberately in a new Actor model (Coreplay), or in Work Classes (CoreJam) and end up using resources efficiently. This is the future.

Yup, and to make CoreJam happen, we need to take these functions off the Relay Chain.

gavofyork commented 9 months ago

Furthermore, the Relay-chain's runtime is the closest Polkadot gets to a single point of failure. We can't avoid upgrading it in general, but the simpler and more minimal we can keep it the better, especially regarding pallet code which necessarily executes with system-level privileges.

gavofyork commented 9 months ago

Agglomerating the pallets of balances, governance, identity and staking into one "blob" chain separate from Assets Hub, Bridges Hub and Collectives seems to needlessly presume that merely because they happened to begin life on the same chain (owing simply to the fact their functions were needed prior to the existence of the parachains logic) they should remain so.

Migrating production pallets between chains is a difficult and time-consuming job at the best of times. Placing them all on one chain presupposes that not a single one of them nor any yet-to-be-devised functional-companion code which does actually need to be within the same environment will ever grow beyond the needs of a single chain. This appears unfounded to me. Unless there is a clear need for pallets to live together, I would err on the side of future-proofing.

sourabhniyogi commented 9 months ago

However, it only seems to look at user-dispatch transactions and not count things like on-initialize?

You're right -- I added in an extrinsics sheet to get at the on-initialize, where beloved paraInherent totally dominates over everything else [even timestamp does]. This might inform you about just how little you might be gaining through this, using weight as a quick guide (I understand it's not enough). Intuition as a total outsider is to future proof not by sticking to 1.0 heterogenous system chains but by moving to 2.0 homogenous shards in CRJA.

Specifically, instead of moving the pallets of balances, governance, identity and staking into one "blob" chain + AssetHub, just jump straight to 2 => 4 => 8 => ... (or 2 => 3 => 4) homogenous shards coordinating as many use cases as you have to, and do it all at once with CRJA. This is "2.0 CRJA FTW". A good future proof solution would address mitosis-like or linear growth of how new shards get added and wouldn't have engineers doing scheduled migrations. Ideally, the migration problem would be solved for the 1=>2 case and all those following cases N => 2N or N => N+1 gracefully. All of you have been there and done that I think!

I'm not a superengineer doing substrate heroic migrations like you guys, but if I were, I'd sure like to know step 2 will work in detail (how homogenous shards ala CRJA/CP on 2 chains work) and step 3 at a high level (how more shards come online in a future-proof way) before doing step 1 (move {X, Y, ...} off the relay chain). Doing step 1 (moving OFF the relaychain) and having ANOTHER migration later looks quite short sighted! If your end goal is 1 chain to do X and 1 chain to do Y, even non-substrate engineers can see it's a temporary solution with an IOU baked into the number "1".

rphmeier commented 9 months ago

Placing them all on one chain presupposes that not a single one of them nor any yet-to-be-devised functional-companion code which does actually need to be within the same environment will ever grow beyond the needs of a single chain. [...] I would err on the side of future-proofing.

Starting from the practical considerations that cross-chain UX is quite poor at the moment and that locality is preferable until sharding is necessary, I would prefer to err on the side of ergonomics and not presuppose that all of these components together will require substantially more resources than a single core can provide. The ergonomics are poor.

This is both empirically true at the moment and there is little indication that it would change in the next few years.

xlc commented 9 months ago

There are overheads to a new chain in pretty much all directions: dev, ops, UX, performance, governance, etc. We need to take all of those overheads into consideration, in both the short term and the long term, when deciding on a new system parachain.

arrudagates commented 9 months ago

I believe a good compromise can be made here: we can meet in the middle of ergonomics and future proofing by bundling functionality that fits within a narrow category, for example staking + balances, or identity + collectives.

sourabhniyogi commented 9 months ago

I believe a good compromise can be made here: we can meet in the middle of ergonomics and future proofing by bundling functionality that fits within a narrow category, for example staking + balances, or identity + collectives.

Here is a neat vs scruffy way to compromise:

  1. Kusama be Team Scruffy, roughly pursuing UX/ergonomic + experimentation first (supporting hybrid chains, smart contracts on the relay chain) and not pursuing #32 (or any migrations) unless absolutely necessary for CoreJam/CorePlay
  2. Polkadot be "Team Neat", roughly pursuing scalability-at-all-costs (contra any UX/ergonomic considerations), pursuing #32 at all costs

The belief of Team Neat is that Kusama will die a horrible death at the hands of long-term scalability considerations. The belief of Team Scruffy is that Polkadot will die a horrible death at the hands of long-term UX/ergonomic considerations.

Competition between the two efforts will be done healthily, however, like within 2 branches of the same car company appealing to different market segments (Tesla S vs Tesla Model 3) who want to see the car company (Tesla) succeed and recognize their customers (developers, users) actually want different kinds of things.

Then, parachain BCC customers and CorePlay/CoreJam ICC customers of DOT+KSM CoreTime will report back on the amazing scalability of Polkadot (with more cores, more efficiency, higher security, blob chains splitting into ever more blob chains, etc.) and the amazing usability of Kusama (with fewer or maybe even NO system chains).

Wdyt?

joepetrowski commented 9 months ago

I will stay out of the "smart contracts / EVM on Relay Chain" discussion, at least in this thread. But I'm generally OK with the "Polkadot goes neat" approach. However, I want to highlight that the differences in this thread are rather small now and there is generally a lot of agreement. I think it all comes down to Identity:

So we are talking about at least one new parachain (Staking), maybe two (Identity). My opinion stands that it'd be better to functionally isolate it and just schedule it rarely. Yes it's an ergonomic cost, but having 3 system paras instead of 4 doesn't obviate the need for better tooling and UX around multi-chain applications. And if these 4 can still be run on the same core most of the time (even with Identity say once per hour), then it's still using only one core for system operations.
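
As a rough illustration of the "same core most of the time" point, here is a toy Python sketch. It assumes 6-second relay chain blocks (so 600 slots on one core per hour), gives Identity one slot per hour, and rotates the remaining slots among three placeholder system chains. The chain names `SystemA`/`SystemB`/`SystemC`, the helper `hourly_schedule`, and the round-robin rule are all assumptions made for illustration; this is not how Agile Coretime actually assigns cores.

```python
from collections import Counter

RELAY_BLOCK_SECONDS = 6                       # relay chain block time
SLOTS_PER_HOUR = 3600 // RELAY_BLOCK_SECONDS  # 600 slots on one core per hour


def hourly_schedule(frequent, rare):
    """Toy schedule for one core: `rare` gets slot 0 of the hour and the
    `frequent` chains take the remaining slots in round-robin order."""
    schedule = [rare]
    for slot in range(1, SLOTS_PER_HOUR):
        schedule.append(frequent[(slot - 1) % len(frequent)])
    return schedule


# Placeholder names, not a statement of which chains would share a core.
schedule = hourly_schedule(["SystemA", "SystemB", "SystemC"], "Identity")
slots = Counter(schedule)
print(slots)
```

Under these assumptions the rarely scheduled chain takes 1 of the 600 slots in the hour, which is the sense in which it costs the other chains very little core time.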

rphmeier commented 9 months ago

Staking is heavily used and we have plans to grow the validator set well beyond 1,000. It probably deserves its own chain

The only heavy things here are elections and rewards. If there is no decoupling of staking intentions / nominations from elections and rewards, I agree it needs its own chain. There are definitely merits to having staking intentions / nominations / reward destinations localized with other functionality.

Governance going to collectives and balances going to assethub makes sense given that these chains are already live and it adds no additional burden.

Identity being added to governance/collectives would make sense here.

However, none of these details are actually reflected in the RFC text - RFCs require sufficient detail for implementation.

joepetrowski commented 9 months ago

The only heavy things here are elections and rewards. If there is no decoupling of staking intentions / nominations from elections and rewards, I agree it needs its own chain. There are definitely merits to having staking intentions / nominations / reward destinations localized with other functionality.

Sure, there are tradeoffs to everything. For systems that consume high amounts of blockspace, it's better to be agile with Coretime scheduling so that critical functionality (especially elections and slashing) can execute when needed. Of course there are merits to having things localized, every engineering decision has tradeoffs. There are also merits to it having its own chain.

However, none of these details are actually reflected in the RFC text - RFCs require sufficient detail for implementation.

I will add more detail as a result of the discussion. But like Gav said here, this is to get everyone on the same page regarding the Relay Chain. I have a rather concrete and partially implemented plan for Identity migration (at least to its own chain), it will take some mods if this goes to the (already-running) Governance chain. @muharem is planning an RFC for the plan to move governance off the Relay Chain. And @gpestana / @Ank4n have done quite a lot of work on Staking plans, which I'm sure they can convert to an RFC once ready.

could we use some counting of 7s 6s 5s 4s and 3s in the "Process" to decide a bunch of Y/N questions on what should go into various blob chains according to some algorithm that uses rank?

Isn't that the approval referendum of an RFC / whitelisted runtime?

Shouldn't a proposal be put in front of the community for something as important as UX/ergonomics?

Yes, a runtime upgrade / new parachain proposal. And I do believe UX/ergonomics are important, when arguing for a separate chain for Identity I am not arguing against UX. My point is that the marginal cost of another chain is small. Whether we have 5 chains or 6 (AssetHub, BridgeHub, Coretime, Staking, Governance, (Identity)), we still need tools that make multi-chain application development ergonomic. Coretime scheduling should manage appropriate resource allocation because that's what it's designed to do.

xlc commented 9 months ago

My point is that the marginal cost of another chain is small.

I don't believe that's true at this stage. We want the cost to be small, but unless someone makes a proposal explaining how we can make it small in the future, I won't accept it as an argument.

gavofyork commented 9 months ago

My point is that the marginal cost of another chain is small.

I don't believe that's true at this stage. We want the cost to be small, but unless someone makes a proposal explaining how we can make it small in the future, I won't accept it as an argument.

Care to expand on why exactly you believe having 4 system chains rather than 3 has such a large additional cost?

xlc commented 9 months ago

None of us can just state something without providing explanations, so here are some areas to check for the cost of operating a new chain:

In this specific case, the difference between 3 and 4 chains may not be big. However if we choose one feature per system parachain, then we could have like 8 of them. So we should evaluate the difference between 3 and 8 to help guide the future decisions.

I will say there is a lot more work required to allow multichain dApps to scale. We should also ask wallet teams, as they will have a good idea of the cost/overhead to support one more chain in their wallet.

joepetrowski commented 9 months ago

Initial development time. That includes copying the template, ensuring deps are aligned with other chains, updating the name and configurations, generating genesis, and testing to ensure it actually works. Maybe someone can check the time used to set up the Collectives parachain?

I already have a branch largely configured, so close to zero. Anyway, the initial configuration takes a few hours. For testing, we have infrastructure ready to get this started on Rococo and Westend.

Maintenance time. This is relatively small as it is mostly duplicating the changes.

Yup, very little.

Integration tests & e2e tests & manual tests. Not terribly bad but does require more work. Also keep in mind the number of XCM tests can be O(n^2), as we need to ensure every system parachain can interact with every other via XCM. At least for the ones with the balances pallet.

Testing is not a resource-constrained environment. Testing n^2 cases is not a big deal, still takes less time than running benchmarks. And they are all automated.
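
To put rough numbers on the O(n^2) point, here is a minimal Python sketch. It assumes one test per ordered (origin, destination) pair of distinct chains, which is a simplification, and the helper `directed_pairs` and the chain names are placeholders.

```python
from itertools import permutations


def directed_pairs(chains):
    """Every ordered (origin, destination) pair of distinct chains."""
    return list(permutations(chains, 2))


for n in (3, 4, 8):
    chains = [f"chain{i}" for i in range(n)]  # placeholder names
    print(n, "chains ->", len(directed_pairs(chains)), "directed pairs")
```

The count is n * (n - 1), so under this assumption going from 3 to 4 chains doubles the pairs to cover, while 8 chains is roughly nine times the 3-chain case.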

Audit

Very little, it's a pretty standard system parachain configuration.

Infra. RPC nodes, Collator nodes, monitoring etc. We can't share the nodes (yet) and they are not cheap to run.

Collator nodes are not expensive. I think the Infra bounty for them is something like 100-200 USD / month / collator.

Governance runtime upgrade overhead. The overhead can be reduced with a batch call, but still, we don't yet have the ability to run some script that just tells me the runtime upgrade proposal is good.

Yes we do. This is easy. Few lines of code in the script. Almost zero overhead.

dApps overhead. One more chain to connect. One more set of RPC nodes to deal with, OR one more chain spec in the light client. The chain spec files are big because they include wasm and use hex format for it. Adding a 1 MB payload to a web app is always bad UX.

The goal of Polkadot (and the reason for shared security) was precisely that applications could span multiple chains. Apps need a solution for this and it has nothing to do with system chains.
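
On the hex-format remark quoted above: hex encoding spends two characters per byte, so an embedded blob takes about twice its raw size as text. A minimal Python sketch follows; the 1 KiB `blob` is a stand-in rather than a real runtime, and `hex_encoded_size` is a hypothetical helper.

```python
def hex_encoded_size(raw_bytes: int) -> int:
    """Characters needed for a 0x-prefixed hex string of `raw_bytes` bytes."""
    return 2 + 2 * raw_bytes


blob = bytes(range(256)) * 4   # 1 KiB stand-in for a wasm blob
encoded = "0x" + blob.hex()    # hex text, as it would sit in a JSON file

print(len(blob), "raw bytes ->", len(encoded), "hex characters")
```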

Devs overhead. We simply don't have a good library to deal with multiple APIs. This overhead can be considered none once we have it but we don't have it.

I've mentioned this several times on the forum and I think there's a W3F Grants RFP for it (@Noc2 ?). Like the point above, yes it's needed but it's needed in general, for all 50+ paras on Polkadot, and really has not much to do with a single system chain. Hopefully this is more incentive to make it.

joepetrowski commented 9 months ago

/rfc propose

github-actions[bot] commented 9 months ago

Hey @joepetrowski, here is a link you can use to create the referendum aiming to approve this RFC number 0032.

Instructions

  1. Open the [link](https://polkadot.js.org/apps/?rpc=wss%3A%2F%2Fpolkadot-collectives-rpc.polkadot.io#/extrinsics/decode/0x3d003e01015901000049015246435f415050524f564528303033322c31373933366562363966333039356433373261666163653166653262363737643239656132363830343739343336313535386363636131626565376231656235290100000000).
  2. Switch to the `Submission` tab.
  3. Adjust the transaction if needed (for example, the proposal Origin).
  4. Submit the Transaction.

It is based on commit hash 87ab0f1be5bafd404e7a1e6f465db027b5271ecd.

The proposed remark text is: RFC_APPROVE(0032,17936eb69f3095d372aface1fe2b677d29ea26804794361558ccca1bee7b1eb5).
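
The remark text can be cross-checked against the call data in the bot's link, because the remark is embedded there as plain ASCII bytes. The Python sketch below simply searches for the `RFC_APPROVE(` prefix instead of decoding the surrounding SCALE structure; `extract_remark` is a hypothetical helper, not part of any Polkadot tooling.

```python
# Call data copied from the decode link in the bot's comment.
CALL_DATA = (
    "0x3d003e0101590100004901"
    "5246435f415050524f564528303033322c"
    "31373933366562363966333039356433"
    "37326166616365316665326236373764"
    "32396561323638303437393433363135"
    "35386363636131626565376231656235"
    "29"
    "0100000000"
)


def extract_remark(call_data_hex: str) -> str:
    """Locate the ASCII remark inside the call data by its known prefix.
    This is a byte search only; it does not parse the SCALE encoding."""
    digits = call_data_hex[2:] if call_data_hex.startswith("0x") else call_data_hex
    raw = bytes.fromhex(digits)
    start = raw.index(b"RFC_APPROVE(")
    end = raw.index(b")", start) + 1
    return raw[start:end].decode("ascii")


remark = extract_remark(CALL_DATA)
print(remark)
```

If the printed string differs from the remark text the bot reports, the link and the remark do not belong together.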

joepetrowski commented 9 months ago

https://collectives.polkassembly.io/member-referenda/28

sourabhniyogi commented 9 months ago

@joepetrowski @gavofyork I see that you added in a new section on Kusama, to practice migration. I was hoping Kusama could be a place for CoreJam + Coreplay experimentation with fewer system chains, a place for Team Scruffy, but it's more like Team Neat needs to abort Team Scruffy at birth! I think your coercing #32 into Kusama will deal a blow to Polkadot/Kusama ecosystem, or at least damn Kusama to be nothing more than a "testnet" -- it's too big of a bet on the ergonomics / UX all coming together too quickly.

How can we have separate #32 referendums for Polkadot (for scalability) and for Kusama (for usability)? I'd like to have a situation where Kusama can live to be MORE than a testnet, where developers have a nice "simple" place to work with the new 2.0 CoreJam + Coreplay programming model, including Availability patterns.

If having Kusama be more than a testnet is impossible (because Rococo + Chopsticks "practice" migration is insufficient), I would like to see a THIRD production network, say the "CoreTime network", a production network (relay chain) dedicated to improving 2.0 CoreJam + Coreplay usability for new devs with lower complexity than its Big Brother Polkadot.

Can you chart a course?

joepetrowski commented 9 months ago

The section on Kusama was added insofar as to state how it can be useful in its capacity as canary net in achieving the goals set out in this RFC. Adding a vision and roadmap for Kusama development is way out of scope for this RFC.

When CoreJam comes, and it's probably approximately a year to have it in a "Kusama ready" state, I'm sure Kusama will again play a leading role on the frontier.

shawntabrizi commented 9 months ago

I will just speak my thoughts here, hoping to come from a constructive place.

I spiritually agree with this proposal, and I am inclined just to vote AYE.

At the same time, I certainly know that a single document with < 300 lines cannot capture the underlying complexity, challenges, and decisions that will need to be made once this is actually underway.

So I kind of want to understand what we are voting for here. If the vote is to establish direction, then I am a strong AYE.

However, I guess I would also expect that different parts of the actual migration process will also manifest as RFCs to be approved and so on. Perhaps one system chain at a time, where even chains like Staking may take multiple RFCs and upgrades to actually get to a final envisioned state.

Is that an accurate picture of what this vote is about, or is there some leniency we are supposed to give on how exactly this is all accomplished?

burdges commented 9 months ago

I'd think the vote says "kusama should eventually do this unless we hit real obstructions". If we later discover it sucks on kusama despite our best efforts, then yeah we might do something slightly different, although more likely we tweak functionality.

The vote should not say "we're going to push this through quickly". In particular, we want whatever pattern of migrations happens here, like parachain forking, to be useful for customer parachains when they want to buy multiple slots, without using elastic scaling. We should probably not do this until we can explain what we're doing to customer parachain teams.

joepetrowski commented 9 months ago

@shawntabrizi yes this is to establish direction. Regarding each subsystem:

I don't think it's entirely clear yet what should be an RFC and what not, though. Of course, changes to the core protocol should be in RFC format. A step-by-step migration plan for Identity? I'm not sure an RFC is more valuable than a detailed GitHub tracking issue.

sourabhniyogi commented 9 months ago

The section on Kusama was added insofar as to state how it can be useful in its capacity as canary net in achieving the goals set out in this RFC. Adding a vision and roadmap for Kusama development is way out of scope for this RFC.

When CoreJam comes, and it's probably approximately a year to have it in a "Kusama ready" state, I'm sure Kusama will again play a leading role on the frontier.

@joepetrowski Got it! I can see the level of depth of thought given to the ergonomic/UX considerations voiced by @rphmeier and engineering concerns raised by @xlc, my own RFC #33. I closed #33 in favor of this stand in of a roadmap -- there is no path for Kusama worth developing. Thank you for setting up the foundations so clearly.

xlc commented 9 months ago

I share a similar feeling with @shawntabrizi and that's why I haven't voted.

I can see this established a general direction for relaychain and system parachains work, which is indeed useful to have and I agree with this. However, there are not enough technical details to follow so we need additional RFCs to discuss the exact actions. And we are likely not to follow this RFC if some unexpected obstacle is discovered in the future.

Quoting myself from a comment from another proposal

https://github.com/polkadot-fellows/RFCs/pull/35#pullrequestreview-1670004354

I see this is more of a feature request instead of RFC. There are not enough technical details for reviewers to evaluate the full impact of this proposal and for developers to implement if accepted.

and Gav's one

I'll underline @xlc 's remarks. As it stands, this is just a wishlist. For it to be taken seriously it must include implementation specifics, including how to achieve bounded-complexity compute and state-changes on all operations.

So my real question is, how much technical detail is required for an RFC? Are we ok with a high-level proposal like this one, or something more generic, like "we should optimize the performance of the relaychain by trying to do X, Y and Z"?

shawntabrizi commented 9 months ago

Perhaps we should have different kinds of RFCs like: [Direction], [Feature Request], [Implementation].

In any case, since this is a "direction" based RFC, and I agree with the direction, I will vote AYE.

joepetrowski commented 9 months ago

/rfc process

github-actions[bot] commented 9 months ago

Please provide a block hash where the referendum confirmation event is to be found. For example:

/rfc process 0x39fbc57d047c71f553aa42824599a7686aea5c9aab4111f6b836d35d3d058162

Instructions to find the block hash

Here is one way to find the corresponding block hash.

  1. Open the referendum on Subsquare.
  2. Switch to the `Timeline` tab.
  3. Go to the details of the `Confirmed` event.
  4. Go to the details of the block containing that event.
  5. Here you can find the block hash.

joepetrowski commented 9 months ago

/rfc process 0xa29d08c3030bf527de0fadef3f658ea1e1d3200c4a7530038d693cbe97cab7f3
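
The command above has the shape the bot asks for: `/rfc process` followed by a block hash. Here is a minimal Python sketch of checking that shape before posting. It assumes the hash is a 0x-prefixed 32-byte hex string, as in the two examples in this thread; `parse_process_command` is a hypothetical helper and not the bot's actual parser.

```python
import re

# Assumed shape: "/rfc process " followed by a 0x-prefixed 32-byte hash.
COMMAND = re.compile(r"^/rfc process (0x[0-9a-fA-F]{64})$")


def parse_process_command(comment: str):
    """Return the block hash from a '/rfc process <hash>' comment, or None
    if the comment does not match the assumed shape."""
    match = COMMAND.match(comment.strip())
    return match.group(1) if match else None


block_hash = parse_process_command(
    "/rfc process 0xa29d08c3030bf527de0fadef3f658ea1e1d3200c4a7530038d693cbe97cab7f3"
)
print(block_hash)
```

A bare `/rfc process` with no hash returns `None` here, which corresponds to the case where the bot replied asking for a block hash.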

github-actions[bot] commented 9 months ago

The on-chain referendum has approved the RFC.