Introspector fails after chain metadata update

AndreiEres commented 1 year ago

Because we use subxt, we have to rely on preloaded metadata. If the network's metadata changes, the introspector crashes.

Currently, our CI checks on each PR whether the metadata is up to date. This helps us catch the problem within a day or two, but we want a better solution.

One option is to create automated PRs that update the metadata. Ideally, though, we would prefer to have no static metadata at all. Is this possible?

Let's share and discuss ideas @sandreim, @vstakhov, @lexnv, @prybalko

lexnv commented 1 year ago

Updating the metadata with every runtime upgrade can indeed be a tedious task.

> If the network's metadata changes, the introspector crashes.

I wonder, does this happen with every runtime update?

I believe there are a few options to make this transition a bit less painful:

Instead of relying on downloaded subxt metadata, the subxt macro is able to fetch the metadata from a given URL

This still offers the same guarantees that subxt currently exposes, mainly compile-time safety that ensures talking to the node is still possible. The change would require modifying all subxt macro invocations to point to the RPC endpoint instead of loading the metadata from disk.

https://github.com/paritytech/polkadot-introspector/blob/71193993c3d5b3e985f3380e8ee582bfcc8fc052/essentials/src/metadata.rs#L24-L33

For example, replacing the runtime_metadata_path = "assets/rococo_metadata.scale" with runtime_metadata_url = "wss://rpc.polkadot.io:443".

When a runtime upgrade happens, the polkadot-introspector binary would still need to be rebuilt.

Using subxt's dynamic API instead of relying on static metadata.

We let users create dynamic queries to the chain (see dynamic storage query for more details).

This also removes the need to manage the metadata. A transition would look similar to:

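   use subxt::dynamic::Value;

   // Static version: code is generated at compile time from a metadata file.
   #[subxt::subxt(runtime_metadata_path = "../artifacts/polkadot_metadata_small.scale")]
   pub mod polkadot {}
   // (inside an async fn, where `account` is the account ID to look up)
   let storage_query = polkadot::storage().system().account(&account);

   // Dynamic version: pallet and entry names are plain strings,
   // checked against the node's metadata only at runtime.
   let storage_query =
       subxt::dynamic::storage("System", "Account", vec![Value::from_bytes(account)]);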

However, this has the downside of losing the compile-time checks you get from the subxt macro. A call you rely on will now fail only at runtime, after communicating with the chain.
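To make "fails only at runtime" concrete, here is a toy model in plain Rust (no subxt involved). The Node type, its string-keyed storage map and the dynamic_storage method are hypothetical stand-ins for a node and its metadata; real dynamic queries are encoded and resolved against the node's metadata over RPC. The point it illustrates is only that string-named lookups keep compiling after a runtime upgrade renames something, and the mismatch surfaces when the query runs.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a node: storage values keyed by
/// (pallet name, entry name), as described by its current metadata.
struct Node {
    storage: HashMap<(String, String), u64>,
}

impl Node {
    fn new(entries: &[(&str, &str, u64)]) -> Self {
        let storage = entries
            .iter()
            .map(|(pallet, entry, value)| ((pallet.to_string(), entry.to_string()), *value))
            .collect();
        Node { storage }
    }

    /// Dynamic-style query: pallet and entry are plain strings, so the compiler
    /// cannot check them; a mismatch only shows up when the query runs.
    fn dynamic_storage(&self, pallet: &str, entry: &str) -> Result<u64, String> {
        self.storage
            .get(&(pallet.to_string(), entry.to_string()))
            .copied()
            .ok_or_else(|| format!("{pallet}::{entry} not found in node metadata"))
    }
}

fn main() {
    // Before a hypothetical runtime upgrade, the query matches the node.
    let before = Node::new(&[("System", "Account", 42)]);
    assert_eq!(before.dynamic_storage("System", "Account"), Ok(42));

    // After a hypothetical upgrade renames the entry, the same code still
    // compiles, but the query now fails at runtime.
    let after = Node::new(&[("System", "Accounts", 42)]);
    assert!(after.dynamic_storage("System", "Account").is_err());
    println!("{:?}", after.dynamic_storage("System", "Account"));
}
```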

Subxt still needs the chain's runtime metadata for the dynamic API to work. The metadata is fetched when creating a client (i.e. let api = OnlineClient::<PolkadotConfig>::new().await?;) and, just like the metadata on disk, the chain's metadata might change in the meantime. To mitigate this, we provide the ability to subscribe the subxt client to runtime upgrades (see perform_runtime_updates). To use it, spawn a separate task and let the update function handle the rest. This still leaves a small gap in which the polkadot-introspector might submit a call to the chain before the background task has finished the update.
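The background-update pattern, and the gap it leaves, can be sketched in std-only Rust. This is a toy model, not subxt's implementation: Metadata, spawn_updater and cached_version are hypothetical names, a channel stands in for the node's runtime-upgrade notifications, and a thread stands in for the spawned async task.

```rust
use std::sync::mpsc;
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical, heavily simplified stand-in for the metadata a client caches.
#[derive(Clone, Debug, PartialEq)]
struct Metadata {
    spec_version: u32,
}

/// Spawns a background worker that applies announced runtime upgrades to the
/// shared metadata. Mirrors the idea behind perform_runtime_updates only.
fn spawn_updater(
    cache: Arc<RwLock<Metadata>>,
    upgrades: mpsc::Receiver<Metadata>,
) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        // The loop ends once every sender has been dropped.
        for new_metadata in upgrades {
            // Until this write happens, readers still see the old metadata:
            // this is the small gap described above.
            *cache.write().unwrap() = new_metadata;
        }
    })
}

/// What a caller would consult before building a call: which runtime
/// version does the cached metadata describe?
fn cached_version(cache: &Arc<RwLock<Metadata>>) -> u32 {
    cache.read().unwrap().spec_version
}

fn main() {
    let cache = Arc::new(RwLock::new(Metadata { spec_version: 1 }));
    let (tx, rx) = mpsc::channel();
    let updater = spawn_updater(Arc::clone(&cache), rx);

    // Two hypothetical runtime upgrades are announced by the node.
    tx.send(Metadata { spec_version: 2 }).unwrap();
    tx.send(Metadata { spec_version: 3 }).unwrap();
    drop(tx); // no more upgrades; lets the updater loop finish
    updater.join().unwrap();

    assert_eq!(cached_version(&cache), 3);
    println!("cached metadata is for spec_version {}", cached_version(&cache));
}
```

Reading cached_version right after a send, without joining, could return either the old or the new version; that nondeterminism is exactly the window a long-running tool has to tolerate.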

// cc @jsdw @niklasad1

jsdw commented 1 year ago

I'd like to know more about what exactly is failing.

Is it crashing because of this?

https://github.com/paritytech/polkadot-introspector/blob/7286f479f0c7e45a7f2d5332c33f2a179f79b98e/metadata-checker/src/main.rs#L49

That polkadot::validate_codegen method checks that everything is pretty much the same between the generated code and the current node metadata, i.e. that every call, storage entry, constant, etc. is identical in shape between runtime updates.

If you only need a select set of calls etc. to stay the same, it's overkill; instead, you have these options:

Each individual static call does already validate itself against the current node metadata and refuse to execute if the node has deviated from the codegen for that precise thing, so you have that safety net as a check against things changing.
If you only care about interacting with a specific pallet or set of pallets, you can strip the metadata to only include a certain set of pallets using the subxt CLI tool. Then, polkadot::validate_codegen will only check the pallets you're actually using against the live node, and will only return an error if those specific pallets change.
If you want to opt out of validation for an individual call etc., you can call .unvalidated() on it. Doing this makes it exactly as likely to succeed or fail as using the equivalent dynamic type.
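As a rough illustration of why narrowing the checked metadata helps, here is a std-only Rust toy model. It assumes each pallet's shape can be summarized by a single hash and uses made-up hash values; PalletHashes, validate_all and validate_pallets are hypothetical names, not subxt's API, and subxt's real checks are more detailed than this. The pallet names are only examples of what a tool like the tracer might use.

```rust
use std::collections::BTreeMap;

/// Hypothetical summary of a runtime's metadata: pallet name -> hash of that
/// pallet's shape (calls, storage, constants). The hash values are made up.
type PalletHashes = BTreeMap<&'static str, u64>;

/// Whole-metadata check, in the spirit of validate_codegen: any difference
/// fails, even in pallets the tool never touches.
fn validate_all(codegen: &PalletHashes, live: &PalletHashes) -> Result<(), String> {
    if codegen == live {
        Ok(())
    } else {
        Err("node metadata differs from codegen".to_string())
    }
}

/// Subset check, analogous to generating code from metadata stripped down to
/// the pallets in `used`: only those pallets are compared.
fn validate_pallets(
    codegen: &PalletHashes,
    live: &PalletHashes,
    used: &[&'static str],
) -> Result<(), String> {
    for pallet in used {
        match (codegen.get(pallet), live.get(pallet)) {
            (Some(expected), Some(actual)) if expected == actual => {}
            _ => return Err(format!("pallet {pallet} changed or is missing")),
        }
    }
    Ok(())
}

fn main() {
    let codegen: PalletHashes =
        [("Configuration", 1), ("ParaInclusion", 2), ("Staking", 3)].into_iter().collect();

    // A hypothetical runtime upgrade changes only the Staking pallet.
    let mut live = codegen.clone();
    live.insert("Staking", 4);

    // The tool (hypothetically) only uses these two pallets.
    let used = ["Configuration", "ParaInclusion"];

    assert!(validate_all(&codegen, &live).is_err());
    assert!(validate_pallets(&codegen, &live, &used).is_ok());
    println!("full check: {:?}", validate_all(&codegen, &live));
    println!("subset check: {:?}", validate_pallets(&codegen, &live, &used));
}
```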

If you were to move to using the dynamic interface:

You'd lose all of the static type safety, and it'd generally be more of a pain to work with.
Anything that would have failed using the static interface would still fail via the dynamic one, but you'd have no validation per call and no way to do validation of the subset of metadata you're using or whatever.

All of that to say, I'd strongly recommend that you stick with the static interface :)

> Instead of relying on downloaded subxt metadata, the subxt macro is able to fetch the metadata from a given URL

Personally I wouldn't do this, because at any point the tool could simply stop compiling owing to some change in the interface, and it requires a network connection to the node at compile time. It's really best used in CI-style setups only, not in production code.

> To mitigate this, we provide the ability to subscribe the subxt client to runtime upgrades (see perform_runtime_updates).

I'd definitely do this if your program is long-running; it'll make sure that Subxt always has the latest metadata for the node it's talking to, so it can warn more quickly if something in the interface changes from what you're expecting, and without it you won't be able to submit tx's anyway :)

AndreiEres commented 1 year ago

> That polkadot::validate_codegen method checks that everything is pretty much the same between the generated code and the current node metadata, i.e. that every call, storage entry, constant, etc. is identical in shape between runtime updates.

No, we use it only in CI to check whether our metadata is still fresh. We're talking more about the parachain-tracer package.

jsdw commented 1 year ago

> No, we use it only in CI to check whether our metadata is still fresh. We're talking more about the parachain-tracer package.

Ok, gotcha! In that case, perhaps the above responses are useful; if not, I think I'd need more information on what the error is :)

sandreim commented 1 year ago

I am wondering if using the dynamic API of subxt would work for the tracer usecase. I think we should experiment with that and see if there are any limitations compared to the static approach we have right now.

AndreiEres commented 1 year ago

I'm using a combined approach:

We're going to get the host configuration over the dynamic API. See https://github.com/paritytech/polkadot-introspector/pull/414. It was a weak spot that caused the tracer to crash immediately. I tested the tracer with mismatched Kusama metadata and it still worked.
I set up automated metadata updates, like https://github.com/paritytech/polkadot-introspector/pull/420, which will notify us about changes.
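Why fetching the host configuration dynamically can avoid a crash is sketched below with another std-only toy model. It assumes, without confirmation from the thread, that the crash came from decoding the configuration into a struct whose layout was fixed at compile time. The types, the simplified field names, the made-up values and the helper functions are all hypothetical, not polkadot-introspector's or subxt's code.

```rust
/// Hypothetical dynamic value: field names and values as resolved from the
/// node's metadata at runtime, loosely like a composite value in subxt's
/// dynamic API (real values are nested and typed, not flat u64s).
type DynamicComposite = Vec<(String, u64)>;

/// Static-style view with a layout fixed at compile time (simplified).
#[derive(Debug, PartialEq)]
struct HostConfigStatic {
    max_code_size: u64,
    max_pov_size: u64,
}

/// Static-style decode: positional fields without names, so a layout that
/// does not match the compiled-in struct is rejected.
fn decode_static(raw: &[u64]) -> Result<HostConfigStatic, String> {
    match raw {
        [max_code_size, max_pov_size] => Ok(HostConfigStatic {
            max_code_size: *max_code_size,
            max_pov_size: *max_pov_size,
        }),
        _ => Err(format!("expected 2 fields, found {}", raw.len())),
    }
}

/// Dynamic-style access: pick only the field the tool needs, by name.
fn field(value: &DynamicComposite, name: &str) -> Option<u64> {
    value.iter().find(|(n, _)| n.as_str() == name).map(|(_, v)| *v)
}

/// Strips the names to get the positional view a static decoder sees.
fn positional(value: &DynamicComposite) -> Vec<u64> {
    value.iter().map(|(_, v)| *v).collect()
}

fn main() {
    let before: DynamicComposite = vec![
        ("max_code_size".to_string(), 10),
        ("max_pov_size".to_string(), 20),
    ];
    // A hypothetical runtime upgrade adds a field to the configuration.
    let mut after = before.clone();
    after.push(("new_field".to_string(), 30));

    // The static view breaks as soon as the layout changes...
    assert!(decode_static(&positional(&before)).is_ok());
    assert!(decode_static(&positional(&after)).is_err());

    // ...while the dynamic view still finds the fields it needs.
    assert_eq!(field(&after, "max_pov_size"), Some(20));
    println!("max_pov_size = {:?}", field(&after, "max_pov_size"));
}
```

The trade-off is the one described earlier in the thread: the dynamic lookup would return None at runtime if a field the tracer needs were renamed, instead of that being caught at compile time.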

I think we're good so far and I can close the issue. Thank you guys for your ideas and help.

paritytech / polkadot-introspector

Introspector fails after chain metadata update #381