Run Wasmer and Wasmtime together

MaksymZavershynskyi commented 4 years ago

To ensure there are no Wasmer- or Wasmtime-specific bugs in our codebase, we need to run both Wasmer and Wasmtime side by side and panic when they disagree.

Specifically, each block would be produced by executing Wasmer and Wasmtime in parallel. Each of them would produce its own TrieUpdate and other artifacts, such as execution outcomes and receipts. Before we call TrieUpdate::finalize we would compare the two sets of artifacts and panic if they differ. This would guarantee that if there is a transient error caused by Wasmer or Wasmtime, the block producer is not slashed. It would also guarantee that if there is a deterministic error that happens only on some nodes, the network does not diverge.
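A minimal sketch of that compare-then-panic step, using only the standard library. Everything here is a hypothetical stand-in: `ExecutionArtifacts`, `Backend` and `apply_with_both` are illustrative names, not nearcore types, and the real TrieUpdate and receipts are richer than these fields.

```rust
use std::collections::BTreeMap;
use std::thread;

/// Hypothetical stand-in for what one VM produces for a block:
/// pending state changes, an execution outcome, and receipts.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ExecutionArtifacts {
    /// Key -> new value; `None` marks a deletion.
    state_changes: BTreeMap<Vec<u8>, Option<Vec<u8>>>,
    outcome: String,
    receipts: Vec<String>,
}

/// Hypothetical signature for "execute this block with one VM".
type Backend = fn(&[u8]) -> ExecutionArtifacts;

/// Runs both backends in parallel on the same input and returns the
/// artifacts only if they are identical. Panics on any difference, so
/// nothing gets finalized when the two VMs disagree.
fn apply_with_both(block: &[u8], wasmer: Backend, wasmtime: Backend) -> ExecutionArtifacts {
    let (from_wasmer, from_wasmtime) = thread::scope(|s| {
        let a = s.spawn(|| wasmer(block));
        let b = s.spawn(|| wasmtime(block));
        (
            a.join().expect("Wasmer thread panicked"),
            b.join().expect("Wasmtime thread panicked"),
        )
    });
    assert_eq!(
        from_wasmer, from_wasmtime,
        "Wasmer and Wasmtime disagree; refusing to finalize"
    );
    from_wasmer
}
```

In this sketch the comparison happens on the complete artifacts, so the caller would only go on to finalize the state changes after `apply_with_both` has returned.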

This, however, would double the hardware requirements for our validators. CC @jimmy3dita, @bowenwang1996, @chefsale

stefanopepe commented 4 years ago

Is this a long-term requirement, or is it needed now to provide additional stability?

olonho commented 4 years ago

It is a medium-term goal.

MaksymZavershynskyi commented 4 years ago

We should try to enable it in the next 1-2 months, to add an extra layer of safety.

olonho commented 4 years ago

OK, I will take a look at the feasibility of this one.

bowenwang1996 commented 3 years ago

@olonho are we still working on this issue?

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity in the last 2 months. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

bowenwang1996 commented 2 years ago

@matklad do we still plan to do this?

matklad commented 2 years ago

Yeah, we do want to do that eventually, but it's no longer on the near-term roadmap.

matklad commented 2 years ago

Three thoughts here:

It doesn't seem likely that we'll be able to run two VMs in prod any time soon, for various technical reasons.
I think we got most of the benefit here via differential fuzzing of wasmtime vs wasmer. @Ekleog-NEAR, what's our feeling on that? Do we get enough coverage here?
What we can potentially do is set up wasmtime canary nodes, or history replay.
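For readers unfamiliar with the term, a rough sketch of the differential idea: feed the same generated inputs to two implementations and report the first input on which they differ. This is not the fuzzing setup used in nearcore. `first_disagreement` and the hand-rolled `XorShift64` generator are hypothetical, and a real fuzzer would generate Wasm modules with coverage guidance rather than random bytes.

```rust
/// Tiny deterministic xorshift64 generator so the sketch needs no crates.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Hypothetical signature for "run this input through one implementation".
type Implementation = fn(&[u8]) -> Vec<u8>;

/// Generates `iterations` pseudo-random inputs (0 to 15 bytes each) and
/// returns the first one on which the two implementations disagree,
/// or `None` if they agreed on every input tried.
fn first_disagreement(
    a: Implementation,
    b: Implementation,
    seed: u64,
    iterations: usize,
) -> Option<Vec<u8>> {
    // The xorshift state must never be zero.
    let mut rng = XorShift64(seed.max(1));
    for _ in 0..iterations {
        let len = (rng.next_u64() % 16) as usize;
        let input: Vec<u8> = (0..len).map(|_| (rng.next_u64() & 0xff) as u8).collect();
        if a(&input) != b(&input) {
            return Some(input);
        }
    }
    None
}
```

A `None` result only means no disagreement was found among the inputs tried, which is the coverage question raised above.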

Ekleog commented 2 years ago

I also think that we get more benefit from differential fuzzing of wasmer vs wasmtime than we would from panicking if wasmer and wasmtime were to disagree: I wouldn't be surprised if there were an instance where wasmer and wasmtime disagree (even though fuzzing hasn't found anything yet), and panicking all the nodes in the network at once would be very bad if an attacker discovered one such case.

That said, having a wasmtime canary node might be a good idea, if only to get real-world examples of discrepancies, if there are any to be found, so we know about them and can fix them.
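To make the contrast with panicking concrete, here is a minimal sketch of what a canary could do on a mismatch: record it and keep going. `CanaryLog`, `Discrepancy` and `observe` are hypothetical names, and the byte slices stand in for whatever serialized outcome the node would actually compare.

```rust
/// One recorded mismatch between the two VMs (hypothetical shape).
#[derive(Debug, Clone, PartialEq, Eq)]
struct Discrepancy {
    block_height: u64,
    wasmer_outcome: Vec<u8>,
    wasmtime_outcome: Vec<u8>,
}

/// Collects mismatches instead of panicking, so a single disagreement
/// cannot take the node down.
#[derive(Debug, Default)]
struct CanaryLog {
    discrepancies: Vec<Discrepancy>,
}

impl CanaryLog {
    /// Compares the two outcomes for a block. Returns `true` when they
    /// agree; otherwise stores both outcomes for later inspection and
    /// returns `false`.
    fn observe(&mut self, block_height: u64, wasmer: &[u8], wasmtime: &[u8]) -> bool {
        if wasmer == wasmtime {
            return true;
        }
        self.discrepancies.push(Discrepancy {
            block_height,
            wasmer_outcome: wasmer.to_vec(),
            wasmtime_outcome: wasmtime.to_vec(),
        });
        false
    }
}
```

Because only the canary runs both VMs, a discovered discrepancy yields a report rather than a network-wide halt.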

near / nearcore

Run Wasmer and Wasmtime together #3187