Open MaksymZavershynskyi opened 4 years ago
Is this a long-term requirement, or is it needed now to provide additional stability?
It is a medium-term goal.
We should try to enable it in the next 1-2 months, to ensure an extra layer of safety.
OK, I will take a look at the feasibility of this one.
@olonho are we still working on this issue?
This issue has been automatically marked as stale because it has not had recent activity in the last 2 months. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
@matklad do we still plan to do this?
Yeah, we do want to do that eventually, but it's no longer on the near-term roadmap.
Three thoughts here:
I also think that we get more benefit from differential fuzzing of wasmer vs wasmtime than we would from panicking if wasmer and wasmtime were to disagree: I wouldn't be surprised if there were an instance where wasmer and wasmtime disagree (even though fuzzing didn't find anything yet), and panicking all the nodes in the network at once would be very bad if an attacker discovered one such case.
That said, having a wasmtime canary node might be a good idea, if only to get real-world examples of discrepancies (if there are any to be found), so that we know about them and can fix them.
To ensure there are no Wasmer- or Wasmtime-specific bugs in our codebase, we need to run both Wasmer and Wasmtime side by side and panic when they disagree.
Specifically, each block is going to be produced using Wasmer and Wasmtime executed in parallel. Each of them would produce its own `TrieUpdate` and other artifacts, like execution outcomes and receipts. Before we call `TrieUpdate::finalize`, we would compare them, and if they are different we would panic. This will guarantee that if there is a transient error caused by Wasmer/Wasmtime, the block producer is not going to be slashed. This would also guarantee that if there is a deterministic error that happens only on some nodes, the network does not diverge.
This, however, would require increasing hardware requirements 2x for our validators. CC @jimmy3dita, @bowenwang1996, @chefsale
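A minimal sketch of the compare-before-finalize step described above, using only the Rust standard library. Everything here is hypothetical and for illustration only: `BlockArtifacts`, `Divergence`, `run_both`, and `run_both_or_panic` are not nearcore types or functions, and the two closures merely stand in for executing the block with Wasmer and with Wasmtime.

```rust
use std::thread;

/// Hypothetical stand-in for what one runtime produces for a block.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BlockArtifacts {
    /// Stand-in for the state changes carried by a `TrieUpdate`.
    pub state_changes: Vec<(Vec<u8>, Option<Vec<u8>>)>,
    /// Stand-in for execution outcomes.
    pub outcomes: Vec<String>,
    /// Stand-in for outgoing receipts.
    pub receipts: Vec<String>,
}

/// Both results, kept so that a disagreement can be reported in full.
#[derive(Debug)]
pub struct Divergence {
    pub wasmer: BlockArtifacts,
    pub wasmtime: BlockArtifacts,
}

/// Runs the two executions in parallel and returns the artifacts only if
/// both runtimes produced exactly the same result.
pub fn run_both<A, B>(wasmer: A, wasmtime: B) -> Result<BlockArtifacts, Divergence>
where
    A: FnOnce() -> BlockArtifacts + Send,
    B: FnOnce() -> BlockArtifacts,
{
    let (a, b) = thread::scope(|s| {
        // One runtime runs on a scoped worker thread...
        let wasmer_handle = s.spawn(wasmer);
        // ...while the other runs on the current thread.
        let b = wasmtime();
        let a = wasmer_handle
            .join()
            .expect("Wasmer execution thread panicked");
        (a, b)
    });

    if a == b {
        Ok(a)
    } else {
        Err(Divergence { wasmer: a, wasmtime: b })
    }
}

/// Panicking variant: this is the point where the node would refuse to
/// go on to finalize the update.
pub fn run_both_or_panic<A, B>(wasmer: A, wasmtime: B) -> BlockArtifacts
where
    A: FnOnce() -> BlockArtifacts + Send,
    B: FnOnce() -> BlockArtifacts,
{
    match run_both(wasmer, wasmtime) {
        Ok(artifacts) => artifacts,
        Err(d) => panic!("Wasmer and Wasmtime disagree, refusing to finalize: {:?}", d),
    }
}
```

Splitting the comparison (`run_both`) from the panic (`run_both_or_panic`) is a design choice of this sketch: the same comparison could also back the canary-node idea from the comments, where a discrepancy is recorded rather than bringing the node down.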