paritytech / polkadot-sdk

The Parity Polkadot Blockchain SDK
https://polkadot.network/

Executor parameters change may affect PVF validity and executability retrospectively #694

Open s0me0ne-unkn0wn opened 1 year ago

s0me0ne-unkn0wn commented 1 year ago

Original discussion cited here: https://github.com/paritytech/polkadot/pull/6873#pullrequestreview-1337573839

@s0me0ne-unkn0wn:

What if a PVF is pre-checked in one session and gets enacted in another session? Executor parameters may change in between. There is not much we can do about that, but it's worth considering.

@mrcnski:

I thought about that also. If executor params can affect preparation, then it would invalidate the assumptions documented here. Probably we would have to do pre-checking again when changing any parameter that can affect the runtime.

@s0me0ne-unkn0wn:

We can do that, but what's next? We cannot un-enact a PVF. Consider the situation where it's the first and only PVF for a parachain that has just onboarded.

@burdges:

I suppose the Wasm stack limit is a stack used by Wasm code, while the native stack limit is used by the Wasm engine, our host calls, etc., yes? Aside from the two existing and planned preparation limits, these all sound more determined by the block than by the runtime, so this is not a new problem, and I'm not sure why they'd require rebuilding.

As for the two existing and planned preparation limits, are we instrumented to record the current maximum during preparation? I suppose not, as code is rarely instrumented like this, but if so, then we could just deactivate the PVF if the recorded limit gets invalidated on chain.

Anyway, our problem is that a preparation limit change could make preparing an already-valid PVF impossible, which gives an adversary some control over who can check the PVF. Importantly, we should not enact a limit change immediately, just like we do not enact a PVF change immediately. Assuming all validators do rebuild, then yes, we'd prepare to unenact the PVF, but governance should have the time and ability to revert the limit change. Isn't this enough?

We could even have the vote block the limit change, so governance would then need to manually boot any fat parachains before reducing limits. Is that simpler? It's also kinda shitty that all validators must rebuild just to check a preparation limit violation.
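
To illustrate the instrumentation idea floated above, here is a minimal, purely hypothetical sketch: if preparation recorded the maximum stack depth a PVF actually needs, a later limit change could be checked against that recorded value instead of forcing every validator to rebuild. No such instrumentation exists per this discussion; all names below are made up.

```rust
/// Hypothetical metadata that preparation could record alongside the artifact.
struct PreparedPvfMetadata {
    /// Maximum Wasm value-stack depth observed/derived during preparation.
    recorded_max_stack_depth: u64,
}

/// Would the PVF still satisfy the limits if the on-chain stack limit changed?
fn still_valid_under(meta: &PreparedPvfMetadata, new_limit: u64) -> bool {
    meta.recorded_max_stack_depth <= new_limit
}

fn main() {
    let meta = PreparedPvfMetadata { recorded_max_stack_depth: 40_000 };
    assert!(still_valid_under(&meta, 65_536)); // limit grows: PVF stays active
    assert!(!still_valid_under(&meta, 32_768)); // limit shrinks below the recorded max: deactivate
}
```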

mrcnski commented 1 year ago

@s0me0ne-unkn0wn Perhaps if an executor param change comes in that could affect preparation, we put the PVF through pre-checking again, and if it fails, we reject the change in params. Not sure how feasible this is, and it sounds complex, but it may be worth doing in order to uphold our guarantees around preparation and prevent disputes there.

burdges commented 1 year ago

You could have the validators send some revert-limit-change votes themselves; maybe the code could be shared with the governance code that performs the limit change somehow? I don't think it matters whether you revert the PVF or the limit change, as either way governance should choose between the limit change and the existing PVF. In particular, if it's easier to unenact the PVF and make the parachain upload a new one, or to have governance revert their limit change, then that's fine too.

bkchr commented 1 year ago

Executor parameter changes will always need to be done after some testing and thinking. I would also expect that we only see parameters "grow" and not "shrink". Take the memory for execution: why should it shrink? If there are any new kinds of parameters, like a better stack depth algorithm or whatever, we would clearly need to test them against all known PVFs, maybe against all PVFs that have ever existed, to ensure we don't break stuff.

If we really break anything, we need to do what @burdges said and revert our changes.

burdges commented 1 year ago

A priori, any time we shrink a limit we likely know of at least one parachain that already violates the new limits, so I'd hope we've already discussed the consequences.

mrcnski commented 1 year ago

That is a good point @bkchr. Also, pre-checking is in part intended to guard against adversarial PVFs, whereas adversarial changes in executor params seem unlikely to be enacted.

mrcnski commented 1 year ago

Re-posting a comment from @eskimor (from https://github.com/paritytech/polkadot/pull/6873#issuecomment-1469052413):

We could (not saying we should) refuse to accept execution environment parameter updates if pre-checking fails for a PVF that was valid before. So changing parameters would also be pre-checked, by re-pre-checking all PVFs before enacting the new parameters. I actually think this is absolute overkill and very likely does more harm than good, but in that particular case we "could" do something. Realistically, I think we have to accept a small risk and live with it.

s0me0ne-unkn0wn commented 1 year ago

I suppose the Wasm stack limit is a stack used by Wasm code, while the native stack limit is used by the Wasm engine, our host calls, etc., yes?

Not exactly. The native stack limit is simple: it's the limit on the native stack during the execution of a compiled PVF. It's enforced during execution, and PVFs do not use host functions afaik, so everything is quite straightforward here.

The Wasm stack limit is another beast: it's the limit on the number of values on the Wasm value stack. But after compilation, there's no such thing as a "Wasm value stack" (it's purely abstract), so we enforce this limit through instrumentation of the original PVF code, and that instrumentation takes place during the preparation phase. So this limit affects preparation and also affects artifacts. We may even have several artifacts for a single PVF if we need to execute it with different executor params from different sessions.
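
A minimal sketch (not the actual node implementation) of what "several artifacts for a single PVF" means in practice: the artifact cache is keyed by both the PVF code and the executor parameters, so the same PVF prepared under two parameter sets yields two distinct entries. All names here are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical stand-ins for the real hashes used by the node.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct PvfCodeHash([u8; 32]);

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct ExecutorParamsHash([u8; 32]);

/// The compiled artifact; contents elided.
struct Artifact {
    compiled_bytes: Vec<u8>,
}

/// Cache of prepared artifacts, keyed by (PVF, executor params).
#[derive(Default)]
struct ArtifactCache {
    entries: HashMap<(PvfCodeHash, ExecutorParamsHash), Artifact>,
}

impl ArtifactCache {
    fn get(&self, pvf: PvfCodeHash, params: ExecutorParamsHash) -> Option<&Artifact> {
        self.entries.get(&(pvf, params))
    }

    fn insert(&mut self, pvf: PvfCodeHash, params: ExecutorParamsHash, artifact: Artifact) {
        self.entries.insert((pvf, params), artifact);
    }
}

fn main() {
    let mut cache = ArtifactCache::default();
    let pvf = PvfCodeHash([1; 32]);
    let old_params = ExecutorParamsHash([2; 32]);
    let new_params = ExecutorParamsHash([3; 32]);

    cache.insert(pvf, old_params, Artifact { compiled_bytes: vec![0xAA] });
    // A params change forces a separate preparation and a separate cache entry.
    cache.insert(pvf, new_params, Artifact { compiled_bytes: vec![0xBB] });

    assert!(cache.get(pvf, old_params).is_some());
    assert!(cache.get(pvf, new_params).is_some());
}
```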

Sidenote: the native stack limit is derived from the Wasm stack limit. It was designed that way under the assumption that the two are proportional (which sounded logical at the time). Later it was proven they are not. Also, the native stack limit is not deterministic in the general case. So the approach to those limits should be reworked at some point anyway.
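
A toy illustration of the "derive the native limit from the Wasm limit by a fixed proportion" approach described in the sidenote; the constant below is made up for the example, and the real node's formula is not asserted here. The point is only that a fixed ratio bakes in the (now disproven) assumption that native stack usage scales linearly with the number of Wasm value-stack items.

```rust
/// Hypothetical bytes of native stack reserved per Wasm value-stack item.
const ASSUMED_NATIVE_BYTES_PER_WASM_ITEM: u64 = 128;

/// Derive the native stack limit from the Wasm value-stack item limit.
fn native_stack_limit_from_wasm(wasm_stack_items: u64) -> u64 {
    wasm_stack_items * ASSUMED_NATIVE_BYTES_PER_WASM_ITEM
}

fn main() {
    let wasm_limit = 65_536; // hypothetical Wasm value-stack item limit
    println!(
        "wasm items: {wasm_limit}, derived native stack bytes: {}",
        native_stack_limit_from_wasm(wasm_limit)
    );
}
```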

burdges commented 1 year ago

Alright. We surely have Polkadot host functions which PVFs invoke, including signature verification. We likely also have host functions used only by Polkadot, to which PVFs lack access.

s0me0ne-unkn0wn commented 1 year ago

Just an idea: now that paritytech/polkadot#6934 is merged, new executor parameters are postponed for two sessions and kept in the PendingConfigs storage of the configuration pallet for that period. Does it make sense to always pre-check using the pending configuration if it differs from the current one? Does that make it all click or not? Not sure :thinking:
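
A minimal sketch of that selection rule, assuming a view of the host configuration with an active params set and a queue of pending ones. The types and function names are hypothetical, not the configuration pallet's actual API.

```rust
#[derive(Clone, Debug, PartialEq)]
struct ExecutorParams {
    wasm_stack_limit: u64,
    // ...other params elided
}

/// Hypothetical view of the on-chain configuration relevant to pre-checking.
struct HostConfigView {
    active: ExecutorParams,
    /// Params already enacted on chain but taking effect in a future session.
    pending: Vec<ExecutorParams>,
}

/// Pick the params a pre-check vote should be cast against: the latest
/// pending set if one is queued, otherwise the active set.
fn params_for_precheck(config: &HostConfigView) -> &ExecutorParams {
    config.pending.last().unwrap_or(&config.active)
}

fn main() {
    let config = HostConfigView {
        active: ExecutorParams { wasm_stack_limit: 65_536 },
        pending: vec![ExecutorParams { wasm_stack_limit: 32_768 }],
    };
    // The PVF would be prepared (and voted on) with the pending limit of 32_768.
    assert_eq!(params_for_precheck(&config).wasm_stack_limit, 32_768);
}
```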

mrcnski commented 1 year ago

Good idea. Though if we pre-check only with the pending configuration, we risk a finality stall if there is an issue with that PVF on the current exec params.

I think we should pre-check with both configs. And if possible, we should also run preparation again for all PVFs when changing exec params. See https://github.com/paritytech/polkadot-sdk/issues/657.
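
A minimal sketch of the "pre-check with both configs" suggestion: only vote to accept a PVF if it prepares successfully under the active params and under every pending params set. `try_prepare` and the surrounding types are hypothetical placeholders for the real preparation pipeline.

```rust
#[derive(Clone, Debug)]
struct ExecutorParams {
    wasm_stack_limit: u64,
}

struct PreparationError;

/// Placeholder for the real preparation step (compilation + instrumentation).
fn try_prepare(_pvf_code: &[u8], params: &ExecutorParams) -> Result<(), PreparationError> {
    // Pretend anything with a non-zero stack limit prepares fine.
    if params.wasm_stack_limit > 0 {
        Ok(())
    } else {
        Err(PreparationError)
    }
}

/// Accept only if preparation succeeds under the active and all pending params.
fn precheck_accept(pvf_code: &[u8], active: &ExecutorParams, pending: &[ExecutorParams]) -> bool {
    try_prepare(pvf_code, active).is_ok()
        && pending.iter().all(|p| try_prepare(pvf_code, p).is_ok())
}

fn main() {
    let code = b"\0asm"; // stand-in for real PVF code
    let active = ExecutorParams { wasm_stack_limit: 65_536 };
    let pending = vec![ExecutorParams { wasm_stack_limit: 32_768 }];
    assert!(precheck_accept(code, &active, &pending));
}
```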

mrcnski commented 1 year ago

Actually, I checked the docs, and PVFs are onboarded two sessions after pre-checking. So we should pre-check with the pending config!

The effects of the PVF being accepted depend on the operations that requested it:

  1. All onboardings subscribed to the approved PVF pre-checking process will get scheduled, and after passing 2 session boundaries they will be onboarded.
  2. All upgrades subscribed to the approved PVF pre-checking process will get scheduled very similarly to the existing process. Upgrades with pre-checking are really the same process, just delayed by the time required for the pre-checking vote. In the case of instant approval, the mechanism is exactly the same.