paritytech / polkadot-sdk

The Parity Polkadot Blockchain SDK
https://polkadot.com/
1.88k stars 688 forks source link

Sync: security review #569

Open arkpar opened 5 years ago

arkpar commented 5 years ago

Syncing code should be reviewed for any potential

gterzian commented 5 years ago

We might want to use bounded channels where we currently use unbounded. There are currently two sets of channels used:

  1. A channel from network-libp2p to sync. The protocol_sender used here https://github.com/paritytech/substrate/blob/a6ca090c3e2e44b24f34319648f66b64575369e0/core/network/src/service.rs#L540

  2. A channel from sync/protocol to the import-queue, used when import-blocks or import-justification is called on the ImportQueue trait.

https://github.com/paritytech/substrate/blob/a6ca090c3e2e44b24f34319648f66b64575369e0/core/consensus/common/src/import_queue.rs#L125

Then the import-queue actually uses the same protocol channel as libp2p to communicate import results back to sync via the NetworkLink. https://github.com/paritytech/substrate/blob/a6ca090c3e2e44b24f34319648f66b64575369e0/core/network/src/service.rs#L76

@tomaka would it be acceptable to block the networking runtime when protocol is slow to handle a message sent to it? This would happen if we were to use a bounded channel, which would block on send until the message is received by sync/protocol.

If acceptable, we could bound the channel to sync, and the channel from sync to the import-queue, making network-libp2p block until message are handled by sync, and making sync block when sending an import message to the import-queue until that message has been handled(not until the import has finished, just until the message is received). Currently import-requests would basically queue up in the buffer of the channel.

Then I would suggest we introduce a dedicated import-queue -> sync channel to communicate import results back, and make that unbounded. That way the "input" in the system is bounded, but not the "output" from the importing. Otherwise the import-queue and sync could end-up deadlocking when both are waiting for the other to receive a message...

gterzian commented 5 years ago

I would also like to point to potential timing based attacks Loophole: Timing Attacks on Shared Event Loops in Chrome.

For example currently, when a block import fails, we immediatly go into "error" mode at https://github.com/paritytech/substrate/blob/a6ca090c3e2e44b24f34319648f66b64575369e0/core/consensus/common/src/import_queue.rs#L501

This is an example of a potential timing loophole, since a batch of blocks containing an error will "finish" more quickly than if we were to actually import all of them. A potential solution is to make import_a_batch_of_blocks always take a constant amount of time, regardless of whether there was an "early exit".

bkchr commented 5 years ago

@gterzian The block is public anyway, why would you want to attack to reveal the content of a block? These constant timing stuff is mostly used to prevent leakage of information. If we always take constant time to check a block, we are probably open an attack vector. Regarding networking stuff, networking should support back-pressure and thus using a bounded channel should be no problem.

tomaka commented 5 years ago

Since we ask only one single node at a time, if that node takes a long time to respond on purpose, we might not get the block data in time.

We should modify the code to start asking other nodes over time if the response doesn't come.