paritytech / polkadot-sdk

The Parity Polkadot Blockchain SDK
https://polkadot.com/

Versioned PVF execution APIs #645

Open rphmeier opened 1 year ago

rphmeier commented 1 year ago

We will need to support changes to the interfaces of PVFs over time, with both new inputs and outputs.

Here is a description of the status quo and a proposal for versioning the PVF Wasm API.

Status Quo

The ValidationParams struct is what gets passed into the validate_block function of any particular PVF; it currently looks like this: https://github.com/paritytech/polkadot/blob/1203b2519fed1727256556fb879c6c03c27a830d/parachain/src/primitives.rs#L354-L363

The good news is that these are generated in the candidate-validation subsystem, which has access to the CandidateReceipt, and therefore the relay_parent. This means that PersistedValidationData doesn't need to be updated, only the executor interface.

https://github.com/paritytech/polkadot/blob/1203b2519fed1727256556fb879c6c03c27a830d/node/core/candidate-validation/src/lib.rs#L588-L593

Proposed Changeset

Summary of changes:

  • Add an ExecutorParam indicating the range of allowed API versions
  • All API versions other than 1 use a new entry-point versioned_validate_block, which expects to read a (version: u32, [params bytes]) tuple

We'd also need to update Cumulus to compile Wasm blobs that include the API version and to implement the versioned_validate_block.

ValidationParams v2

struct ValidationParams {
  // all fields from v1
  relay_parent: Hash,
}
s0me0ne-unkn0wn commented 1 year ago
  • (Q: which section is appropriate?)

If I'm not missing anything, it seems like just two exported constant globals are enough.

  • Add an ExecutorParam indicating the range of allowed API versions

Why do you think it's needed? The node gets the version range from those globals; it is either aware of those versions and able to encode the parameters, or it is not aware, in which case it cannot execute that PVF. So an additional restriction in the ExecutorParams seems redundant, or maybe you had something different in mind?

  • All API versions other than 1 use a new entry-point versioned_validate_block, which expects to read a (version: u32, [params bytes]) tuple

The version could be put in the first field of ValidationParams; the PVF would decode that first field of known length and then know how to decode the rest, so we could get rid of the explicit version parameter.
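
A minimal sketch of that idea, assuming SCALE encoding; the struct and enum names are hypothetical and the fields are elided:

use parity_scale_codec::Decode;

// Hypothetical shapes for illustration; fields elided.
#[derive(Decode)]
struct ValidationParamsV1 { /* v1 fields */ }
#[derive(Decode)]
struct ValidationParamsV2 { /* v1 fields, plus relay_parent */ }

enum VersionedParams {
    V1(ValidationParamsV1),
    V2(ValidationParamsV2),
}

// `version` is a fixed-width u32 at the front of the blob, so the PVF can
// decode it without knowing the layout of the rest, then dispatch on it.
fn decode_params(mut raw: &[u8]) -> Result<VersionedParams, parity_scale_codec::Error> {
    let version = u32::decode(&mut raw)?;
    match version {
        1 => Ok(VersionedParams::V1(ValidationParamsV1::decode(&mut raw)?)),
        2 => Ok(VersionedParams::V2(ValidationParamsV2::decode(&mut raw)?)),
        _ => Err("unknown ValidationParams version".into()),
    }
}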

General thought: aren't we overcomplicating things here? Couldn't we get away with a single version supported by the PVF? Exactly as with the range situation, the node is either aware of that version and can encode the parameters, or it's not aware (too old) and cannot execute the PVF. In what situations would we benefit from having a range instead?

bkchr commented 1 year ago

I think the simplest way would be to have versioning at the validate_block level, as we do for host functions. So, just introduce validate_block_version_2, which expects the new parameters and returns the new/old return value. The PVF executor can then find out which version the PVF supports by checking which validate_block functions are exported.
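
For illustration, a rough sketch of what coexisting entry points could look like on the PVF side, assuming the pointer-plus-length convention of Cumulus's generated validate_block export (the signatures and bodies here are indicative only):

// Sketch only; bodies elided.
#[no_mangle]
pub extern "C" fn validate_block(params: *mut u8, len: usize) -> u64 {
    todo!("decode v1 ValidationParams and run the existing path")
}

#[no_mangle]
pub extern "C" fn validate_block_version_2(params: *mut u8, len: usize) -> u64 {
    todo!("decode v2 ValidationParams (including relay_parent) and validate")
}

During the transition a PVF exports both, and the node calls whichever version it settles on.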

General thought: aren't we overcomplicating things here? Couldn't we get away with a single version supported by the PVF? Exactly as with the range situation, the node is either aware of that version and can encode the parameters, or it's not aware (too old) and cannot execute the PVF. In what situations would we benefit from having a range instead?

We will need to support multiple versions. First, we release the changes to the nodes and can, in parallel, already let Cumulus export the new functions. Then, when we enable the new function, there may already be some parachains supporting it, but quite a lot will still only support the old validate_block. At some point in the future, maybe in a year, we can remove support for the old validate_block.

rphmeier commented 1 year ago

Why do you think it's needed? The node gets the version range from those globals; it is either aware of those versions and able to encode the parameters, or it is not aware

If two different entry-points / versions have two different behaviors, then it could lead to consensus splits when some validator nodes have updated to be able to validate with some version N and other nodes can only validate with some version less than N.

The entry-point/version that any particular candidate is executed with needs to be the exact same across all validators. Parachain runtimes may update their code to support the new entrypoint before all validators have upgraded, so the choice of which version to use cannot be safely determined by the validator node version and the exposed entry points.

Because of that, we clearly need a maximum version to be determined by the relay-chain runtime. Then, as Basti mentions, we may want to set minimum versions to deprecate old logic, though that is probably years down the line. Still, it seems like low-effort future-proofing that we might as well knock out at the same time.
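
Concretely, this could be a new variant on the relay chain's ExecutorParam enum; the variant name and shape below are hypothetical:

// Hypothetical addition to the ExecutorParam enum (sketch, not the actual
// definition): the relay-chain runtime pins the PVF API version range that
// all validators must agree on when executing a given candidate.
pub enum ExecutorParam {
    // ... existing variants ...
    PvfApiVersionRange { min: u32, max: u32 },
}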

I think the most simplest way would be to have versioning on the validate_block level as we do it for host functions. So, just introduce validate_block_version_2 that expects the new parameters and returns the new/old return value

I'm not opinionated on the mechanics here because I'm no Wasm expert, but I am pretty sure we need the PVF to support some contiguous range of versions, because the maximum/minimum requirements are determined by the relay-chain runtime. Doing a linear search for the existence of N entry points is cheap, because N should be small and we only do it during preparation anyway.

bkchr commented 1 year ago

Doing a linear search for the existence of N entry points is cheap, because N should be small and we only do it during preparation anyway.

Should be fairly simple. We can just get all exports with: https://docs.rs/wasmtime/latest/wasmtime/struct.Module.html#method.exports and then filter for functions and check the names.
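
A minimal sketch of that check, using wasmtime's Module::exports; detect_pvf_version, max_version, and the entry-point naming scheme are assumptions for illustration:

use wasmtime::{ExternType, Module};

// Returns the highest supported entry-point version found among the
// module's exported functions, treating a plain `validate_block` as v1.
// `max_version` caps the (small) linear search discussed above.
fn detect_pvf_version(module: &Module, max_version: u32) -> Option<u32> {
    let exported_fns: Vec<&str> = module
        .exports()
        .filter(|e| matches!(e.ty(), ExternType::Func(_)))
        .map(|e| e.name())
        .collect();

    for v in (2..=max_version).rev() {
        if exported_fns.contains(&format!("validate_block_version_{v}").as_str()) {
            return Some(v);
        }
    }
    exported_fns.contains(&"validate_block").then_some(1)
}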

slumber commented 1 year ago

Doing a linear search for the existence of N entry points is cheap, because N should be small and we only do it during preparation anyway.

Should be fairly simple. We can just get all exports with: https://docs.rs/wasmtime/latest/wasmtime/struct.Module.html#method.exports and then filter for functions and check the names.

Going a little into the details: in order to read an exported constant from the blob, we need to instantiate a module. The current code essentially calls https://docs.rs/wasmtime/latest/wasmtime/struct.Engine.html#method.precompile_module, bypassing instantiation, and writes the serialized module directly to the filesystem. Meaning the constant cannot be accessed until actual execution, and candidate validation cannot be aware of it.

So option 1 is to instantiate the module right after preparation and send the API version back to the host from the prepare worker, along with the artifact path. Option 2 is to always send v2 params to the execute worker and then provide

fn into_v1(self) -> ValidationParams { /* ... */ }

to be called from there.

And small changes would be needed in candidate validation itself.
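
For option 2, the conversion could look roughly like this; the struct shape is an assumption based on the v1 ValidationParams linked above plus the proposed relay_parent field, and the import paths are indicative of the linked polkadot revision:

use polkadot_core_primitives::Hash;
use polkadot_parachain::primitives::{
    BlockData, HeadData, RelayChainBlockNumber, ValidationParams,
};

// Hypothetical v2 params: all v1 fields plus the new `relay_parent`.
struct ValidationParamsV2 {
    parent_head: HeadData,
    block_data: BlockData,
    relay_parent_number: RelayChainBlockNumber,
    relay_parent_storage_root: Hash,
    relay_parent: Hash,
}

impl ValidationParamsV2 {
    // Drop the v2-only field so a v1-only PVF can still be driven.
    fn into_v1(self) -> ValidationParams {
        ValidationParams {
            parent_head: self.parent_head,
            block_data: self.block_data,
            relay_parent_number: self.relay_parent_number,
            relay_parent_storage_root: self.relay_parent_storage_root,
        }
    }
}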

bkchr commented 1 year ago
This method may be used to compile a module for use with a different target host. The output of this method may be used with [Module::deserialize](https://docs.rs/wasmtime/latest/wasmtime/struct.Module.html#method.deserialize) on hosts compatible with the [Config](https://docs.rs/wasmtime/latest/wasmtime/struct.Config.html) associated with this [Engine](https://docs.rs/wasmtime/latest/wasmtime/struct.Engine.html).

This doesn't require instantiation as far as I can see?

slumber commented 1 year ago
This method may be used to compile a module for use with a different target host. The output of this method may be used with [Module::deserialize](https://docs.rs/wasmtime/latest/wasmtime/struct.Module.html#method.deserialize) on hosts compatible with the [Config](https://docs.rs/wasmtime/latest/wasmtime/struct.Config.html) associated with this [Engine](https://docs.rs/wasmtime/latest/wasmtime/struct.Engine.html).

This doesn't require instantiation as far as I can see?

Yes, and therefore we can't access constants there.

bkchr commented 1 year ago

I don't get it. You said we use precompile_module to directly precompile the module and write it to disk. I say that you can use Module::deserialize on the output of precompile_module and then call module.exports(). What am I missing?

slumber commented 1 year ago

I don't get it. You said we use precompile_module to directly precompile the module and write it to disk. I say that you can use Module::deserialize on the output of precompile_module and then call module.exports(). What am I missing?

My bad, I missed that the proposed solution only checks whether a symbol is present in the module, without reading its value. Yes, we don't need instantiation.
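
To tie it together, a sketch of that flow; exported_fn_names is a hypothetical helper, and wasmtime's Module::deserialize is unsafe because it trusts the artifact bytes:

use anyhow::Result;
use wasmtime::{Engine, Module};

// Precompile to a serialized artifact (as the prepare worker does today),
// then deserialize it, without instantiation, and list exported functions.
fn exported_fn_names(engine: &Engine, wasm: &[u8]) -> Result<Vec<String>> {
    let serialized = engine.precompile_module(wasm)?;
    // SAFETY: the bytes come straight from `precompile_module` on the
    // same engine, so they are trusted.
    let module = unsafe { Module::deserialize(engine, &serialized)? };
    Ok(module
        .exports()
        .filter(|e| e.ty().func().is_some())
        .map(|e| e.name().to_string())
        .collect())
}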

I'll hide the conversation.