paritytech / substrate

Substrate: The platform for blockchain innovators
Apache License 2.0
8.39k stars 2.65k forks source link

Leaked non-determinism in batch signature verification #6653

Open NikVolf opened 4 years ago

NikVolf commented 4 years ago

If runtime is tricked to do the following

  1. start batch verification.
  2. batch-verify invalid signature.
  3. batch-verify any signature and put the result of verification in some state value (A).
  4. call finish batch verification and drop the result.

Then the value of (A) is undefined due to non-deterministic state of multi-threaded batch verifier.

See some discussion here: https://github.com/paritytech/substrate/pull/6643#discussion_r453662819

burdges commented 4 years ago

I'm confused: If you start a verification batch then you should receive zero information while processing the batch, and only recieve valid or invalid when you finish the batch. There is only a negligible chance of valid or invalid being returned incorrectly. We cannot process the batch in memory accessible to WASM obviously.

If there are complications with an incremental interface, then you might choose some interface in which you pass the entire batch from the runtime to the host all at once. Ideally we might do this early durng processing the block, run this batch verification inside another thread, and let the runtime sort out the meaning of all those signatures in parallel, but that sounds like a much larger change.

bkchr commented 4 years ago

The execution you described is a clear misusage of the api. We can not prevent this entirely. To improve the situation we could panic if there is no batch verification context registered.

If the batch verification fails, all the changes it between are invalid. As we use batch verification for verifying transaction signatures in a block, the block is discarded anyway. We need to document this very clearly. And in the current implementation we support only one batch verification context anyway, so any other usage will lead to the other issue we fixed last week.

bkchr commented 4 years ago

@burdges we return whether the signature could be added to the verification context or if already any signature failed as a fast return of a block import.

NikVolf commented 4 years ago

If you start a verification batch then you should receive zero information while processing the batch

ideally, yes

but we have an "early exit" hatch so that runtime considers any signatures invalid if it had encountered any invalid signatures before

otherwise node can be forced to do extra work without anybody paying, which is another exploit

burdges commented 4 years ago

I'd think an early exit hatch cannot trigger non-determinism because each batch must come from the same deterministic source, no?

We consider a block invalid if it contains any invalid signature, right? I'm kinda surprised even non-batch signature verification routines ever return anything, like maybe failures could simply kill the block's runtime or something, but runtimes have some cleanup I guess.

We obviously cannot consider signatures from multiple blocks or other sources inside the same batch because our batch optimizations assumed someone else previously examined exactly that batch.

bkchr commented 4 years ago

I'd think an early exit hatch cannot trigger non-determinism because each batch must come from the same deterministic source, no?

Exactly what I said.

We consider a block invalid if it contains any invalid signature, right? I'm kinda surprised even non-batch signature verification routines ever return anything, like maybe failures could simply kill the block's runtime or something, but runtimes have some cleanup I guess.

Because you can use signature verification also in other contexts, not only transaction signature verification.

We obviously cannot consider signatures from multiple blocks or other sources inside the same batch because our batch optimizations assumed someone else previously examined exactly that batch.

Yes

I think we can close this, because there is no leaking of non-determinism.

rphmeier commented 4 years ago

It seems like the API leaks non-determinism but the expected usage does not, based on my reading.

We could appropriate unsafe for this, although technically you're not supposed to use that for anything but memory-safety. However, unsafe trait might be fitting, since it allows the user to impose general requirements on implementers of the trait. We could force the handler of the batch verification result to be an unsafe trait implementation, where the expected behavior of the handler is to abort on any failures.

And then provide a default implementation of the trait for convenience and maybe wrap in a safer API.

@NikVolf What do you think about that proposal? Users would have to manually implement an unsafe Trait wrongly to leak the non-determinism