paritytech / polkadot-sdk

The Parity Polkadot Blockchain SDK
https://polkadot.network/
1.62k stars 564 forks source link

Allow runtime-wide `TryState` checks #235

Open ggwpez opened 1 year ago

ggwpez commented 1 year ago

Currently the TryState can only be utilized on a per-pallet level by implementing the try_state hook.
This limits the usage tor multi-pallet try_state functionality. Executive could be extended to accept an optional TryState which will then be executed by the try-runtime CLI on a runtime-wide level.

cc @kianenigma

kianenigma commented 1 year ago

Yup, should be an easy one relatively (cc-ing @gpestana @Ank4n).

bkchr commented 1 year ago

Executive could be extended to accept an optional TryState which will then be executed by the try-runtime CLI on a runtime-wide level.

We don't really need to forward this into Executive. We have all pallets as tuple in the runtime and just calling AllPalletsWithSystem::try_state should be enough. Before that we would need to change: https://github.com/paritytech/substrate/blob/05d11c213b8e097a5368a887b6d055db61ce05db/frame/support/procedural/src/pallet/expand/hooks.rs#L210-L226

to:

        #frame_support::try_runtime_enabled! {
            impl<#type_impl_gen>
                #frame_support::traits::TryState<<T as #frame_system::Config>::BlockNumber>
                for #pallet_ident<#type_use_gen> #where_clause
            {
                fn try_state(
                    n: <T as #frame_system::Config>::BlockNumber,
                    _s: #frame_support::traits::TryStateSelect
                ) -> Result<(), &'static str> {
                    #log_try_state
                    <
                        Self as #frame_support::traits::Hooks<
                            <T as #frame_system::Config>::BlockNumber
                        >
                    >::try_state(n)
                }
            }
        }
kianenigma commented 1 year ago

@bkchr I am not sure why you are against this. The main goal is to enable a try-state check that will look into multiple pallets. The only way to achieve this would be through doing it at the runtime (probably via executive)

bkchr commented 1 year ago

I did not say that I'm against this. I just said that we don't need to move this to frame-executive, because we just can call AllPalletsWithSystem::try_state in the runtime. If you now call Executive::try_state or what I have written before, there would be no difference. (just less code in frame-executive)

liamaharon commented 10 months ago

So we can already write and execute TryState checks in Executive? Then we can close this issue?

liamaharon commented 10 months ago

So we can already write and execute TryState checks in Executive? Then we can close this issue?

On further thought I don't this is sufficient. Most runtimes use the Executive pallet from FRAME rather than defining their own. I think we'd want some way to pass try-state hooks into the executive pallet, like we do with migrations, so that chain-specific try-state checks can be written

ggwpez commented 4 months ago

I just said that we don't need to move this to frame-executive, because we just can call AllPalletsWithSystem::try_state in the runtime

You mean we copy&paste the same try-state check into each runtime where we want to use it?
I think we could for example have a multi-pallet check for Staking that takes a few generics and then checks that all the staking pallets have consistent overall state.
This can then be re-used, but yes it not necessarily be in Executive.

kianenigma commented 3 months ago

Sensing some confusion while reading this.

So, looking at the issue again afresh, we need to do exactly what we do with 'COnRuntimeUpgrade': Pass a new 'CTryState = ()' to executive, and run all checks on '(AllPallets, CTryState)' instead. Sounds right?

ntn-x2 commented 2 months ago

Any update on this issue? I just stumbled across a similar need, and the only way around it that I found is feature-gating an additional type parameter on each pallet's Config, to allow the runtime to be injected in there.

ntn-x2 commented 1 month ago

@kianenigma @ggwpez I've been going over this issue and this code again, and I'm willing to put together a PR if we can agree on a design for the solution.

As far as I can see, the same checks are repeated both in try_runtime_upgrade and in try_execute_block for the frame Executive. Currently, the frame_try_runtime::TryRuntime trait implementation for a runtime, or at least our runtime, simply calls Executive::try_runtime_upgrade and Executive::try_execute_block, which in turns, as I said, delegates those calls to various sub-components, including the specified AllPalletsWithSystem, which right now it's an implementation details and cannot be changed.

The easiest option would be to include a new generic that implements the TryState trait and simply call (AllPalletsWithSystem, TryStateType)::try_state... everywhere where needed. I don't really understand the solution proposed by @bkchr, on which maybe you can shed a bit more light? 😊 From what I've found, as long as we're not able to "override" the implementation details of the executive, we can't implement custom runtime-wide try-state checks in a reliable way, but I'm not very deep into the technicalities. On a side note, I also find it "weird" that the OnRuntimeUpgrade generic for the executive has to only implement the OnRuntimeUpgrade logic and not also the TryState logic, which would fix this issue.

ntn-x2 commented 1 month ago

So we could replace

if checks.try_state() {
    let _guard = frame_support::StorageNoopGuard::default();
    <AllPalletsWithSystem as frame_support::traits::TryState<BlockNumberFor<System>>>::try_state(
        frame_system::Pallet::<System>::block_number(),
    frame_try_runtime::TryStateSelect::All,
    )?;
}

with

if checks.try_state() {
    let _guard = frame_support::StorageNoopGuard::default();
    <(COnRuntimeUpgrade, AllPalletsWithSystem) as frame_support::traits::TryState<BlockNumberFor<System>>>::try_state(
        frame_system::Pallet::<System>::block_number(),
    frame_try_runtime::TryStateSelect::All,
    )?;
}

everywhere where relevant (this is taken from substrate v1.0, but it seems it still applies here and here as of today).

liamaharon commented 1 month ago

So we could replace

if checks.try_state() {
    let _guard = frame_support::StorageNoopGuard::default();
    <AllPalletsWithSystem as frame_support::traits::TryState<BlockNumberFor<System>>>::try_state(
        frame_system::Pallet::<System>::block_number(),
  frame_try_runtime::TryStateSelect::All,
    )?;
}

with

if checks.try_state() {
    let _guard = frame_support::StorageNoopGuard::default();
    <(COnRuntimeUpgrade, AllPalletsWithSystem) as frame_support::traits::TryState<BlockNumberFor<System>>>::try_state(
        frame_system::Pallet::<System>::block_number(),
  frame_try_runtime::TryStateSelect::All,
    )?;
}

everywhere where relevant (this is taken from substrate v1.0, but it seems it still applies here and here as of today).

This sounds reasonable to me, similar API as runtime upgrades.

ntn-x2 commented 1 month ago

This sounds reasonable to me, similar API as runtime upgrades.

It would nevertheless be a breaking change since COnRuntimeUpgrade is now required to implement the TryState trait as well. Having a second generic with a default value of () would not be a breaking change, on the other hand, but I feel this approach would be less scalable in the long-term.

liamaharon commented 1 month ago

Yeah I wouldn't add it to the existing trait. Maybe a new one like COnTryState?

kianenigma commented 1 month ago

Sensing some confusion while reading this.

So, looking at the issue again afresh, we need to do exactly what we do with 'COnRuntimeUpgrade': Pass a new 'CTryState = ()' to executive, and run all checks on '(AllPallets, CTryState)' instead. Sounds right?

have you tried this?

ntn-x2 commented 1 month ago

@kianenigma that solution would work, for sure. I am wondering whether it's the best we can do here. If there's yet a new testing function, that would most likely mean a new generic to account for those? But if you guys think that's fine, I can go ahead and work up a little PR that adds the new feature.

kianenigma commented 1 month ago

If there's yet a new testing function, that would most likely mean a new generic to account for those

The nice thing about generics is that they can have defaults, so this is not a breaking change, as long as you add it to the end of the list.

This is not pretty or elegant, but it is backwards compat, and near zero effort, I would do this.

Longer term, I have shared some ideas around improving the development surface of executive with @gupnik or in other issues, but I am not sure if they are formalized or worth doing at this point.

liamaharon commented 1 month ago

I agree with @kianenigma this approach is worth trying. It shouldn't be breaking, COnRuntimeUpgrade type is actually optional on Executive so CTryState should be able to be as well.

ntn-x2 commented 1 month ago

Yes, I agree this approach would not be breaking and relatively straightforward. I was just concerned it might not be the best, but I'm happy to hear there's discussion going on about improving DX on that side.

I'll try to work a PR shortly, and would be great to get this backported to at least as back as 1.6.

ntn-x2 commented 1 month ago

The matter is less trivial than I was expecting. The TryState trait is implemented for tuples, but only where all the elements implement the PalletInfoAccess trait as well, which is clearly not required for a runtime-wide test: https://github.com/paritytech/polkadot-sdk/blob/c7cb1f25d1f8bb1a922d466e39ee935f5f027266/substrate/frame/support/src/traits/try_runtime/mod.rs#L159.

Also, I was hoping for a solution where whatever type wants to implement TryState, would not have to deal with the targets: Select argument of the trait, but rather implement something closer to the try_state function exposed as part of the Hooks trait.

I am wondering if declaring another trait that does not have the second argument, have a blanket impl of that trait for everything that implements the TryState trait except for AllPallets* types (there's no native way to exclude types from a blanket impl), and manually implement the new trait for AllPallets*. Otherwise, I see no other way to solve this issue.

Cc @kianenigma since I see he was the one working on this feature last.

Or alternatively require the new type to also implement PalletInfoAccess, making it clear that it would not be on the same level as an actual pallet since it would only be used in the TryState filtering logic, but this feels dirty to me as well.

liamaharon commented 1 month ago

Do we know what does it use PalletInfoAccess for?

ntn-x2 commented 1 month ago

@liamaharon yes, you can filter which pallets you want to run the tests for: https://github.com/paritytech/polkadot-sdk/blob/c7cb1f25d1f8bb1a922d466e39ee935f5f027266/substrate/frame/support/src/traits/try_runtime/mod.rs#L213.

liamaharon commented 1 month ago

How about we add a new method id and remove the PalletInfoAccess requirement?

Something like

pub trait TryState<BlockNumber> {
    /// Execute the state checks.
    fn try_state(_: BlockNumber, _: Select) -> Result<(), TryRuntimeError>;
        fn id() -> String
}

When pallet implements it, it can use its name. This might be a bit tricky, let me know if u get stuck.

ntn-x2 commented 1 month ago

This would be a breaking change, not sure if that's fine? I think we were trying to solve the issue in a way that does not introduce breaking changes.

ntn-x2 commented 1 month ago

@liamaharon I am not sure that would be an ideal design. The trait already requires the implementor to perform some try_state depending on the checks argument that it is passed. Having an id() -> String method on the trait would add unnecessary duplication; also, this is a requirement that Executive seems to have, but it would not necessarily apply to other trait implementors. Rather, I would go with adding a new trait that contains the id() -> String function and require that in pallet executive rather. Then we can implement the trait for PalletInfoAccess implementing types as well. I am not sure there is a method within Substrate codebase that allows to return a specific type and that also takes &self as a parameter, something like a GetWithArg<T> which has a fn get(&self) -> T method.

What's your opinion on that?

liamaharon commented 1 month ago

Hm yes you are right. I missed that id would need to be manually specified, I initially thought it could be done in the pallet macros.

Your approach sounds good to try, let's see how it looks

ntn-x2 commented 1 month ago

Your approach sounds good to try, let's see how it looks

@liamaharon what approach are you referring to? Adding a new trait that has the id() -> String function? Or adding a GetWithArg<Arg, Output> trait?

ntn-x2 commented 1 month ago

Ok I have to pause work on this again right now, but I found that returning a String ID does not work either, as it would not be implementable by the AllPalletsWithSystem and AllPalletsWithoutSystem types (what's the return value when iterating over a tuple of elements each with its own ID?). I found an alternative way, which is for the tuple to simply implement an Eq<&[u8]> logic, and the specific try_state is executed if any of the provided names matches what was provided as argument.

kianenigma commented 1 month ago

How about we add a new method id and remove the PalletInfoAccess requirement?

Something like

pub trait TryState<BlockNumber> {
  /// Execute the state checks.
  fn try_state(_: BlockNumber, _: Select) -> Result<(), TryRuntimeError>;
        fn id() -> String
}

When pallet implements it, it can use its name. This might be a bit tricky, let me know if u get stuck.

Maybe it is not a breaking change if we provide an auto-implementation of fn id() when this is being generated from within a pallet? also, if this is used just for logging, providing something like this is reasonable to me:

pub trait TryState<BlockNumber> {
  /// Execute the state checks.
  fn try_state(_: BlockNumber, _: Select) -> Result<(), TryRuntimeError>;
        fn id() -> String { "PLEASE-PROVIDE-ID" }
}
ntn-x2 commented 1 month ago

@kianenigma @liamaharon I opened a PR that fixes this: https://github.com/paritytech/polkadot-sdk/pull/4631. I did not find a way to make it work by adding an id() -> String function to the TryState trait, which must be implemented for tuples (since AllPalletsWithSystem must implement it), but I found a nice solution that is backwards compatible, and allows developer to hook their try-state logic without having to implement any logic regarding the Select parameter of the try_state function. Eager to get your feedback on it.