`GenesisConfig` in a native runtime free world

bkchr commented 1 year ago

The GenesisConfig is currently one of the only things for that we still require the native runtime. It isn't compiled for wasm right now. We also require the native runtime to load a GenesisConfig from a JSON file and as the GenesisConfig can change with newer native runtimes we got the raw chain spec that only contains the key/value storage pairs of the genesis block. When we remove the native runtime we need a new way to handle the GenesisConfig.

I propose that we put the GenesisConfig into the genesis wasm. This means that we start compiling the GenesisConfig for wasm, but I would put this behind some rust feature. So, the normal production builds should not include the GenesisConfig anymore. This is done to reduce the size of the wasm binary. I would also keep supporting the serialization of the GenesisConfig into JSON. If the GenesisConfig is part of the genesis wasm, we have some nice advantages. The first one being that we could keep the JSON serialized GenesisConfig in the chain spec and even distribute this chain spec as we would call into the genesis wasm to parse and generate the genesis block. This would require that we move out the genesis wasm from the actual genesis field in the chain spec to some dedicated field from where we can load the wasm binary. Another advantage would be that we could finally make the chain-spec-builder work properly. Currently this binary isn't really working for chain specs that aren't build from the kitchensink-runtime. The chain-spec-builder could be passed some crate name that belongs to some runtime and then it would generate the chain spec for this runtime. We could also enable certain features in the runtime etc.

To build what I outlined above, we would need the following:

Make GensisConfig build in wasm. This will require supporting serde in the wasm build to serialize/deserialize the GenesisConfig to/from JSON.
Add a new runtime api GenesisBuilder (better name should be found) that can be used to build the genesis config, load it etc.
Change the chain spec to have a dedicated field for the code.
Conditionally compile GenesisConfig into the wasm binary. By default the GenesisConfig should probably be compiled into the wasm binary. When doing a on-chain build there should exist a feature to disable the GenesisConfig.

Part of: https://github.com/paritytech/polkadot-sdk/issues/62

michalkucharczyk commented 1 year ago

Required support for serde addressed in https://github.com/paritytech/substrate/pull/13027

arkpar commented 1 year ago

Make GensisConfig build in wasm. This will require supporting serde in the wasm build to serialize/deserialize the GenesisConfig to/from JSON.

Is anyone really using runtime-formatted JSON specs? All we really need is raw spec which does not require serde.

ggwpez commented 1 year ago

Is anyone really using runtime-formatted JSON specs? All we really need is raw spec which does not require serde.

I think zombienet relies on meddling with the JSON chain spec and other testing tools probably as well.
Also it is very convenient for debugging…

bkchr commented 1 year ago

Yeah and this will open a lot of possibilities. We will be able to ship json chain specs without them breaking, as the wasm file that loads them stays the same. We will be able to write tools that can modify chain specs very easily etc. It improves the user experience quite a lot.

xlc commented 1 year ago

I see it is a nice to have but not essential. We can always load metadata from wasm and use it to convert between the raw values and JSON. polkadot.js makes this relatively easy.

If this feature doesn't need to be part of Substrate, maybe it doesn't need to be part of Substrate?

bkchr commented 1 year ago

I see it is a nice to have but not essential. We can always load metadata from wasm and use it to convert between the raw values and JSON. polkadot.js makes this relatively easy.

Good point, if that works good enough we don't need serde/json support.

xlc commented 1 year ago

Chopsticks already implemented this feature and it works well.

Custom storage override: https://github.com/AcalaNetwork/chopsticks/blob/master/configs/kusama.yml

Code that convert yaml/json to hex: https://github.com/AcalaNetwork/chopsticks/blob/master/packages/chopsticks/src/utils/set-storage.ts

bkchr commented 1 year ago

Yeah to hex is easy, the other way around is more the problem. GenesisConfig fields are also not directly the storage items and can be from a different layout and then use custom logic to fill the storage. All in all still something that sounds useful to me. Especially as it is really just much easier for people to interact with it.

We would also then "disable" the GenesisConfig for builds that are going on chain as runtime upgrade to remove serde etc. So, there isn't really any downside.

michalkucharczyk commented 1 year ago

edited: question moved to: https://github.com/paritytech/substrate/pull/14131/files#r1192701289

Polkadot-Forum commented 1 year ago

This issue has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/genesisconfig-in-wasm-runtime-a-step-towards-a-non-native-future/2887/1

xlc commented 1 year ago

Conditionally compile GenesisConfig into the wasm binary. By default the GenesisConfig should probably be compiled into the wasm binary. When doing a on-chain build there should exist a feature to disable the GenesisConfig

Without this, I don't really want to enable Serialize/Deserialize.

This is a big regression. We only enable Serialize/Deserialize in std for good reasons and now we are forced to have then included wasm.

bkchr commented 1 year ago

But the linker will remove it when not used, so serialize/deserialize will not be part of the wasm. But yes, you need it at compile time.

xlc commented 1 year ago

But the linker will remove it when not used

It is used if the wasm is able to build genesis? So without a feature flag to disable the ability to build genesis, it will be included on the onchain wasm.

AurevoirXavier commented 1 year ago

Our parachain v2 is built upon the previous solo chain and parachain. Its genesis state includes all the previous states. Will this make the genesis WASM hundreds of megabytes?

xlc commented 1 year ago

@AurevoirXavier the code to generate genesis is in wasm but the genesis state is not in wasm.

bkchr commented 1 year ago

But the linker will remove it when not used

It is used if the wasm is able to build genesis? So without a feature flag to disable the ability to build genesis, it will be included on the onchain wasm.

Yes, but this already explained in the issue that the GenesisConfig needs to be build conditionally. There needs to be a guide, but this is something that downstream needs to do on its own as we can not control the feature in your runtime. I think up to this day no one besides us uses a on-chain-build feature to disable logging for example.

michalkucharczyk commented 1 year ago

Support for genesis building in runtime can be implemented as follows (see this PR https://github.com/paritytech/substrate/pull/14310 for full code):

use frame_support::genesis_builder_helper::{build_config, create_default_config};
sp_api::impl_runtime_apis! {
        ...
        #[cfg(feature = "genesis-builder")]
        impl sp_genesis_builder::GenesisBuilder<Block> for Runtime {
                fn create_default_config() -> Vec<u8> {
                        create_default_config::<RuntimeGenesisConfig>()
                }

                fn build_config(config: Vec<u8>) -> sp_genesis_builder::Result {
                        build_config::<RuntimeGenesisConfig>(config)
                }
        }

The implementation that actually uses all serde staff can be hidden behind dedicated feature.

When the feature is enabled depends on use-case. As Basti said, for on-chain builds the feature can be disabled and the resulting wasm blob will not contain any serialization code.

Having this approach implemented for polkadot runtime (PR is under construction), the numbers are as follows:

release profile:

type	size	num of funcs
with genesis-builder	8530347	906
no genesis-builder	7868841	0

production profile:

type	size	num of funcs
with genesis-builder	6436520	519
no genesis-builder	5896270	0

where num of funcs is the number of *serde* symbols in wasm binary: wasm-objdump -x polkadot_runtime.wasm | grep -e "- func.*serde" | wc -l

Just a way to check if serde was actually optimized by LTO.

As you can see all the serde-related symbols were stripped away.

kianenigma commented 1 year ago

Once this whole thing is done, please consider reflecting your changes to https://docs.substrate.io/build/genesis-configuration/ cc @wentelteefje @Rashmirreddy

bkchr commented 1 year ago

Yes I already talked with @michalkucharczyk to help updating the guides after these changes are merged!

Polkadot-Forum commented 1 year ago

This issue has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/polkadot-release-analysis-v1-2-0/4451/1

JuaniRios commented 10 months ago

@bkchr I need the native runtime for the GenesisBuilder because I'm using async rust + many other features behind std to instantiate my pallet since it's so complex.

bkchr commented 10 months ago

@bkchr I need the native runtime for the GenesisBuilder because I'm using async rust + many other features behind std to instantiate my pallet since it's so complex.

Can you give an example on what exactly you are doing there? Especially as GenesisBuilder doesn't support async.

JuaniRios commented 10 months ago

@bkchr I'm using the current_thread runtime of Tokio to instantiate several tasks inside the GenesisBuilder.

This is because my pallet creates projects that go through several stages as time (measured in blocks) moves forward.

Now to test the UI, we need to start a local parachain with one project in each stage, and for this, every time I go forward in time, I need to see which project needs to resume execution.

Normal way we create projects in mock: Project A execution (@ State 1) ---move blocks--> Project A execution (state 2) ---move blocks--> Project A execution(state 3)

The previous example is run through a function, such that in testing if we want to create a project at state 3 and then do some tests, we call new_state_3_project().

Each state has a certain time it can remain alive. After that, it moves into either the success or failed state.

If we use the normal synchronous way of instantiating our test projects, then by the time we advance blocks to create a new project in another state, the previous instantiated project states become invalid.

The current way which is already working, is by creating an async project creation function, which blocks execution whenever it wants to advance blocks. Before blocking though, it notifies an orchestrator at which block it wants to be awakened at.

The orchestrator then starts advancing blocks until it reaches the lowest block required by one of the instantiating project functions and hands over execution to it.

A simple way for one test would be to hardcode each transition of each project instead of relying on calling functions and expecting them to hand over execution between themselves. But then I wouldn't be able to pass in any info at Genesis to start any number of projects at any state.

If this is too confusing, I can give you access to our repo at Polimec and you can look through the code yourself. In any case just by spawning Tokio tasks inside GenesisBuilder, then we couldn't use the wasm runtime no?

I couldn't get the Genesis builder working using the multithreaded Tokio runtime since each task wasn't executing under the context of the substrate storage. But using the single-threaded runtime did the trick.

JuaniRios commented 9 months ago

@bkchr We just open-sourced our code, here is where we are using async in the builder: https://github.com/Polimec/polimec-node/blob/da9d1ee0062ead7a62f815647813ada48e4c2250/pallets/funding/src/lib.rs#L1386

As mentioned before, we're using the single-threaded Tokio runtime and it works just fine.

bkchr commented 9 months ago

@JuaniRios why don't you run this async code in the node and then pass the processed data into the runtime?

JuaniRios commented 9 months ago

@bkchr the purpose of the async code is to create genesis storage to use in testing. Since this is the main use case of the genesis builder, I believe it makes sense to be inside the runtime code.

How would you see us achieving the same thing with the code moved to the node?

Also I heard there that soon we would move to an "omninode", so we would stop being able to put our own code into the substrate node, and just use a generic one imported as a crate

michalkucharczyk commented 9 months ago

If this is only for testing - maybe you could use TestExternalities to build a genesis storage? You could call your build function from there.

Once you have a genesis storage it should be easy to convert it to runtime GensisConfig / chainspec.

JuaniRios commented 9 months ago

@michalkucharczyk thanks for the reply!

So basically do what the genesis builder is already doing, but hardcoding the logic in the node instead of calling the code from the runtime?

What would be the advantage of this?

If polkadot-sdk removes the native runtime, then calling TestExternalities on the node will still make it impossible to use std no? Or will TestExternalities always use std?

EDIT: Is it bad to have code used for testing inside the runtime? From what I've seen, the GenesisBuilder is mostly used in tests, since most parachains will launch with a minimal runtime and add pallets through runtime upgrades, which makes the genesis configs rather useless

JuaniRios commented 9 months ago

Stupid question, could we use async to run multiple extrinsics simultaneously if they don't read/write from the same storage? The async would let us run another extrinsic's code while the storage reads/writes of another are blocking.

Would something like this need to be programmed in the runtime itself, or could the node know each extrinsic which storage items it will touch and so know which other extrinsics it can batch concurrently?

bkchr commented 9 months ago

Also I heard there that soon we would move to an "omninode", so we would stop being able to put our own code into the substrate node, and just use a generic one imported as a crate

There will probably be such a node, but no one forces you to use this. This is more to make it easier to create parachains. However, with the omninode you would not have the native runtime either and so your "hack" wouldn't work.

I honestly still don't get why you need to use async there and what the code is doing. I also don't want to dig into this. As I already told you, you can move this code somewhere else and then add whatever data this async code is generating to your GenesisConfig and then in the runtime put it into the genesis storage. Your you do what @michalkucharczyk proposed with the TestExternalities.

Stupid question, could we use async to run multiple extrinsics simultaneously if they don't read/write from the same storage? The async would let us run another extrinsic's code while the storage reads/writes of another are blocking.

This is a complete different topic and we don't support this right now. I know that there exists some frameworks supporting "parallel" execution of transactions. However, it is questionable if you really need this right now.

JuaniRios commented 9 months ago

This is a complete different topic and we don't support this right now. I know that there exists some frameworks supporting "parallel" execution of transactions. However, it is questionable if you really need this right now.

Yea I don't need it right now, but thought it might be an interesting idea for future optimizations

michalkucharczyk commented 9 months ago

If polkadot-sdk removes the native runtime, then calling TestExternalities on the node will still make it impossible to use std no? Or will TestExternalities always use std?

Removal of native runtime in node means that node is no longer coupled with runtime and is able to do all the work with just wasm runtime blob. No types or native calls to runtime are hard-coded in node. It also means that generic node has no deps on runtime crate. All node requires is a wasm blob.

But your runtime is still just a crate, which maybe used in whatever-you-want environment (e.g. wasm runtime or tests). You can write tools or tests (or even your flavor of node) that are using your runtime crate.

Hope this addresses your doubts.

JuaniRios commented 9 months ago

@michalkucharczyk thanks for your detailed answer :)

Last question, will the "std" feature config be removed from the TestExternalities type as part of the native-runtime removal?

michalkucharczyk commented 9 months ago

Last question, will the "std" feature config be removed from the TestExternalities type as part of the native-runtime removal?

I don't think so, however I am not an authority in this area :)

I looked into code again, you perhaps don't even need to use TestExternatlities. Your RuntimeGenesisConfig implements sp_runtime::BuildStorage trait, so you can simply get the genesis storage by calling build_storage method. Start from here

JuaniRios commented 9 months ago

I thought part of the move to native-runtime free was to remove "std" from the macro expansion of the GenesisBuilder struct of each pallet. That is the main pain point which started the discussion, because if std is not passed into that struct by the macro, then I cannot use tokio.

This is how the current expansion looks:

    #[cfg(feature = "std")]
    impl<T: Config> frame_support::sp_runtime::BuildStorage for GenesisConfig<T> where
        T: Config + pallet_balances::Config<Balance=BalanceOf<T>>,
        <T as Config>::AllPalletsWithoutSystem:
        OnFinalize<BlockNumberFor<T>> + OnIdle<BlockNumberFor<T>> + OnInitialize<BlockNumberFor<T>>,
        <T as Config>::RuntimeEvent: From<Event<T>> + TryInto<Event<T>> + Parameter + Member,
        <T as pallet_balances::Config>::Balance: Into<BalanceOf<T>>,
    {
        fn assimilate_storage(&self, storage: &mut sp_runtime::Storage) -> std::result::Result<(), std::string::String> {
            frame_support::BasicExternalities::execute_with_storage(storage, || {
                self.build();
                Ok(())
            })
        }
    }

And also the RuntimeGenesisConfig generated by the construct_runtime! macro passes "std" to it, which will be removed after the native-runtime removal:

#[cfg(any(feature = "std", test))]
impl self::sp_api_hidden_includes_construct_runtime::hidden_include::sp_runtime::BuildStorage for RuntimeGenesisConfig {
    fn assimilate_storage(&self, storage: &mut self::sp_api_hidden_includes_construct_runtime::hidden_include::sp_runtime::Storage ) -> std::result::Result<(), String> {
        self::sp_api_hidden_includes_construct_runtime::hidden_include::BasicExternalities::execute_with_storage(storage, || {
            <Self as self::sp_api_hidden_includes_construct_runtime::hidden_include::traits::BuildGenesisConfig>::build(&self);
            Ok(())
        })
    }
}

The default node uses this struct with "std" enabled to build the genesis storage. So when std is removed, my code won't work anymore.

michalkucharczyk commented 9 months ago

I thought part of the move to native-runtime free was to remove "std" from the macro expansion of the GenesisBuilder struct of each pallet.

edited: Yeah, we need BuildGenesisConfig inside wasm blob, and there is special API (GenesisBuilder) so the node can call into wasm blob

That is the main pain point which started the discussion, because if std is not passed into that struct by the macro, then I cannot use tokio.

I don't see a problem here. Features are not passed by macro (which is just putting a #cfg guard into generated code), they are defined at compilation level (e.g. in Cargo.toml or via --features cargo flag).

JuaniRios commented 9 months ago

I understand what you are saying, but if you remove the "std" cfg from theGenesisBuilder, it is also saying implicitly that you will now never call it with "std" by default.

I guess one solution would be like you suggested to have another genesis builder logic in my node, where I am calling it with "std" compiled.

If we generate the genesis storage by calling the code directly, does that mean that it will always be a "native runtime"? It's not so clear to me what it means to run the runtime as wasm, since that was always abstracted away from me by the node.

michalkucharczyk commented 9 months ago

I messed with naming so I edited my previous comment.

I guess one solution would be like you suggested to have another genesis builder logic in my node, where I am calling it with "std" compiled.

FWIS you already have this logic implemented. All you have to do is to call build_storage on the instance of your RuntimeGenesisConfig in std-enabled environment. This call will internally execute your build function (which uses async staff) and generate storage which should be converted into entries in raw genesis config.

If we generate the genesis storage by calling the code directly, does that mean that it will always be a "native runtime"? It's not so clear to me what it means to run the runtime as wasm, since that was always abstracted away from me by the node.

If you need this only for testing, then I would not call it native runtime. You could think of it as some kind of util generating test chain-specs.

Think of native runtime as runtime code which is linked with node (thus name native), and is also a crate dependency for node which makes all internal (public) types and functions available in rust.

JuaniRios commented 9 months ago

@michalkucharczyk Thank you for the answers, much more clear now!

Yes I am already calling build_storage on my RuntimeGenesisConfig by using the default node implementation, and running it with the command build-spec.

If I never move to the "omninode", does this mean that my "std" code will still run, even if the native-runtime is "removed"?

michalkucharczyk commented 9 months ago

If I never move to the "omninode", does this mean that my "std" code will still run, even if the native-runtime is "removed"?

Shall be fine.

paritytech / polkadot-sdk

`GenesisConfig` in a native runtime free world #25