near / nearcore

Reference client for NEAR Protocol
https://near.org
GNU General Public License v3.0
2.32k stars 624 forks source link

`FunctionCallError`: expose information about function call execution only (that it is actionable to the end user invoking the contract) #8502

Open nagisa opened 1 year ago

nagisa commented 1 year ago

Today preparation code implements multiple different steps, all of which return a sometimes unique variant of the PrepareError enum. For example if deserialization fails, a PrepareError::Deserialize is returned, while having invalid memory or imports from a wrong module will output their own PrepareError variants.

If we wanted to switch up the ordering of these checks, or something along these lines, the exact error type may change for certain invalid contracts. It is not entirely clear if such a change would affect the protocol stability or not, though, and that’s a problem!

If it doesn’t in fact affect the protocol stability, then there's no good reason to make this so obfuscated – we should change the code to discard the exact error type earlier. That way it is (more) obvious to the developers whether the ordering is relevant or not.

If it does affect the protocol stability, it might make sense to push for undoing this relationship. Right now the information available from preparation is insufficient for high quality diagnostics, and would only manage to hint at the problem is a very rudimentary way. And we can’t really commit to higher quality errors precisely because that would mean even more bonding between precise structure of the preparation code and the protocol versions…

(This work can be obviated at least in part by limited replayability work, but it may be a good idea to push for this nevertheless)

nagisa commented 1 year ago

cc @near/contract-runtime

akhi3030 commented 1 year ago

Thanks for creating the issue @nagisa! One thing that I don't remember discussing last week is what the is the priority of addressing this problem. Is it blocking some urgent work or something that is nice to have or a good candidate for engineering excellence and maybe should be linked to https://github.com/near/nearcore/issues/8203.

nagisa commented 1 year ago

This probably wants to get closed as part of integrating finite-wasm since it will result in a protocol version break anyhow. I can imagine the solution being either as:

  1. We verify that these error variants do not make it out into the protocol and do not affect its stability – we make sure to erase the error return in near_vm_runner::prepare.
  2. We cannot verify that – add a comment for the old iterations of prepare code and still erase it for the new version of prepare that is based on finite-wasm.
akhi3030 commented 1 year ago

sounds good!

Ekleog-NEAR commented 1 year ago

FWIW, it’s basically already in my current working directory for integrating the new wasmer into nearcore (that I’m renaming near-vm as per @jakmeier ’s suggested naming, it is basically https://github.com/near/nearcore/pull/8323 with way more changes in my working directory than I’d like but tests seem still far from passing so...).

I’m working there to actually make it not call prepare at all; given that all instrumentation is now done in near-vm itself and the only missing thing would be memory standardization and a few checks. It’s already calling another function that only does that, and I hope to be able to remove that other function altogether, thus removing the PrepareError completely.

Ekleog-NEAR commented 1 year ago

Actually, one more thought. Even without PrepareError, we still would be having CompileError that can happen. Do we want to also shrink that all into a single ZST "an error happened during contract compilation"?

My personal thoughts are yes, for the same reason as there’s no reason to even make a difference between prepare error and compile error. (Though I’m not sure how easy it’ll be to implement in practice, given we’ll still need to somehow fit both old and new protocol versions in the same type at the end of the day… haven’t looked into that yet)

But then we’ll need to actually make sure we have some tooling that can reproduce the contract compilation with nice, complete error messages, for dev UX. Though I guess we can commit to making this tooling soon after completing these changes and not necessarily before.

jakmeier commented 1 year ago

Yes, conflating it all to a single error makes sense to me.

Though I’m not sure how easy it’ll be to implement in practice, given we’ll still need to somehow fit both old and new protocol versions in the same type at the end of the day… haven’t looked into that yet

We have several places where we use something like:

pub enum ExecutionMetadata {
    /// V1: Empty Metadata
    V1,
    /// V2: With ProfileData by legacy `Cost` enum
    V2(ProfileDataV2),
    // V3: With ProfileData by gas parameters
    V3(ProfileDataV3),
}

I imagine this to be at most as complicated as the example here. Potentially easier if we can find something clever, but I don't see anything right now.

But then we’ll need to actually make sure we have some tooling that can reproduce the contract compilation with nice, complete error messages, for dev UX. Though I guess we can commit to making this tooling soon after completing these changes and not necessarily before.

I guess yes, we can commit to that later. It would be great to have a rough design ready before we make the change but even that is optional.

nagisa commented 1 year ago

I’m working there to actually make it not call prepare at all; given that all instrumentation is now done in near-vm itself and the only missing thing would be memory standardization and a few checks.

I don't believe we can remove prepare entirely, unless we manage to reduce the number of VMs we support in the code base down to 1. Memory standardization and the checks can still be implemented just as well in the to-be near-vm, but for wasmtime and such it still remains a pain point.

But then we’ll need to actually make sure we have some tooling that can reproduce the contract compilation with nice, complete error messages, for dev UX.

Right, I think this imposes some constraints regarding where we could erase the error information exactly.

Do we want to also shrink that all into a single ZST "an error happened during contract compilation"?

Yeah, I think the FunctionCallError should be the only thing that retains any detail, and any detail retained should be directly relevant to the function call action and actionable to the user making the call. Issues with the contract like compilation or linking failing are not either of these and should be scrubbed. So only things like “method name does not exist” and “trap/host error” would remain. (I’m not even sure if there's value in making the distinction between trap and host error?)