microsoft / CCF

Confidential Consortium Framework
https://microsoft.github.io/CCF/
Apache License 2.0

A malicious host could cause a denial of service by manipulating tick() #99

Open achamayou opened 5 years ago

achamayou commented 5 years ago

Reported by @dantengsky in #86

Suppose a service is composed of 3 nodes {n0 .. n2}, and all nodes are synced (same term and index) at the beginning.

An adversary controls a minority, {n0}. (Enclaves are not compromised.)

n0 is in the Follower state. The adversary can modify the code in the untrusted zone so that AdminMessage::tick messages are sent to the enclave much more frequently, each with an elapsed_ms value large enough to trigger election timeouts.
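To make the mechanism concrete, here is a minimal sketch of a tick-driven election timer. It is a hypothetical, simplified model (the names RaftNode, State, start_election and outbox are illustrative, not CCF's API): the enclave accumulates whatever elapsed_ms the host reports, so a single inflated tick is enough to start an election, and repeated inflated ticks keep bumping the term.

```python
from enum import Enum


class State(Enum):
    FOLLOWER = "follower"
    CANDIDATE = "candidate"
    LEADER = "leader"


class RaftNode:
    """Hypothetical, simplified Raft node: only the timer logic is modelled."""

    def __init__(self, node_id: str, election_timeout_ms: int):
        self.node_id = node_id
        self.state = State.FOLLOWER
        self.term = 0
        self.election_timeout_ms = election_timeout_ms
        self.time_since_heartbeat_ms = 0
        self.outbox = []  # messages the enclave would hand back to the host

    def tick(self, elapsed_ms: int) -> None:
        # elapsed_ms comes from the untrusted host and is taken at face value.
        self.time_since_heartbeat_ms += elapsed_ms
        if (self.state != State.LEADER
                and self.time_since_heartbeat_ms >= self.election_timeout_ms):
            self.start_election()

    def start_election(self) -> None:
        # Become a candidate for a new term and ask peers for votes.
        self.state = State.CANDIDATE
        self.term += 1
        self.time_since_heartbeat_ms = 0
        self.outbox.append(("RequestVote", self.term, self.node_id))


n0 = RaftNode("n0", election_timeout_ms=500)
n0.tick(10)                      # honest host: 10 ms really elapsed
print(n0.state.value, n0.term)   # follower 0
n0.tick(1_000_000)               # malicious host: one inflated tick
print(n0.state.value, n0.term)   # candidate 1
```

Because the enclave cannot tell a genuine 1,000,000 ms gap from a forged one, the timer fires immediately.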

If I understand correctly, the victim enclave will keep sending RequestVote messages to its peers. Because these messages are constructed by the enclave, the other peers will treat them as legitimate, and since each one carries a higher term, the honest leader will also transition to the Follower state.
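The step-down follows from the standard Raft rule that any message carrying a higher term forces the receiver to adopt that term and revert to follower. A minimal sketch, assuming a hypothetical Node type that models only term and role (vote granting is omitted):

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical minimal node: only term and role are modelled."""
    node_id: str
    term: int
    role: str  # "follower", "candidate" or "leader"


def on_request_vote(receiver: Node, candidate_term: int) -> None:
    # Raft rule: a higher term in any message makes the receiver adopt it
    # and become a follower, even if it is currently the leader.
    if candidate_term > receiver.term:
        receiver.term = candidate_term
        receiver.role = "follower"


leader = Node("n1", term=3, role="leader")
# n0's enclave produced this RequestVote after an inflated tick.
on_request_vote(leader, candidate_term=4)
print(leader.role, leader.term)  # follower 4
```

The leader has no way to distinguish this from a genuine election, because the message really was produced by a trusted enclave.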

The adversary also drops inbound messages to the victim enclave, so the victim enclave cannot transition to the Leader state; hence no AppendEntries messages are sent.

Because the malicious node is always the first to send a RequestVote message for each new term, the cluster is effectively shut down.
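The whole scenario can be simulated in a few lines. This is a sketch under stated assumptions, not CCF code: the timings are hypothetical (honest election timeout of 500 ms, one inflated tick at n0 for every 100 ms of real time), all logs are identical so the up-to-date check always passes, and only terms, roles, votes and timers are modelled. Honest nodes grant their vote and reset their timers on every new term, the votes never reach n0, and nobody ever becomes leader.

```python
from dataclasses import dataclass
from typing import Optional

ELECTION_TIMEOUT_MS = 500  # hypothetical timeout of the honest nodes
ROUND_MS = 100             # hypothetical real time per simulated round


@dataclass
class Node:
    node_id: str
    term: int = 0
    role: str = "follower"
    voted_for: Optional[str] = None
    timer_ms: int = 0  # real time since last heartbeat or granted vote


def handle_request_vote(node: Node, cand_id: str, cand_term: int) -> bool:
    if cand_term > node.term:
        # Higher term: adopt it, step down, and forget any earlier vote.
        node.term, node.role, node.voted_for = cand_term, "follower", None
    if cand_term == node.term and node.voted_for in (None, cand_id):
        # Logs are identical in this scenario, so the log check passes.
        node.voted_for = cand_id
        node.timer_ms = 0  # granting a vote resets the election timer
        return True
    return False


def simulate(rounds: int):
    n0 = Node("n0", term=1)
    honest = [Node("n1", term=1, role="leader"), Node("n2", term=1)]
    leader_rounds = 0
    honest_elections = 0
    for _ in range(rounds):
        # Malicious host: an inflated tick makes n0's enclave start an election.
        n0.term += 1
        n0.role = "candidate"
        for peer in honest:
            handle_request_vote(peer, n0.node_id, n0.term)
        # The granted votes are dropped by n0's host, so n0 never wins.
        for peer in honest:
            peer.timer_ms += ROUND_MS
            if peer.timer_ms >= ELECTION_TIMEOUT_MS:
                honest_elections += 1  # an honest node would run for election
        if any(p.role == "leader" for p in honest + [n0]):
            leader_rounds += 1
    return leader_rounds, honest_elections, n0.term, [p.role for p in honest]


print(simulate(20))  # (0, 0, 21, ['follower', 'follower'])
```

Over 20 rounds the term climbs from 1 to 21 while no round ever has a leader, and the honest timers never fire, which matches the loss of liveness described here.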

The network is still partially synchronous and a majority of nodes is still alive, yet liveness no longer holds.

The most straightforward fix is to compute the random election timeout inside the enclave, so that it can never be shorter than a lower bound.
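A minimal sketch of that proposal, assuming a hypothetical EnclaveElectionTimer class and a hypothetical lower bound MIN_ELECTION_TIMEOUT_MS fixed inside the enclave: the enclave, not the host, draws the timeout from [min, 2 * min). Note that elapsed_ms still comes from the host, so this only stops the host from choosing the timeout itself; a single over-reported tick still fires the timer, which is the limitation raised in the comments below.

```python
import random
from typing import Optional

MIN_ELECTION_TIMEOUT_MS = 1000  # hypothetical lower bound fixed in the enclave


class EnclaveElectionTimer:
    """Hypothetical sketch: the enclave picks the random election timeout."""

    def __init__(self, min_timeout_ms: int = MIN_ELECTION_TIMEOUT_MS,
                 seed: Optional[int] = None):
        self.min_timeout_ms = min_timeout_ms
        self.rng = random.Random(seed)
        self.reset()

    def reset(self) -> None:
        # Random timeout drawn inside the enclave from [min, 2 * min).
        self.timeout_ms = self.min_timeout_ms + self.rng.randrange(self.min_timeout_ms)
        self.elapsed_ms = 0

    def tick(self, elapsed_ms: int) -> bool:
        """Return True when an election should start."""
        if elapsed_ms < 0:
            raise ValueError("elapsed_ms must be non-negative")
        # The host can still over-report elapsed_ms; nothing here detects it.
        self.elapsed_ms += elapsed_ms
        if self.elapsed_ms >= self.timeout_ms:
            self.reset()
            return True
        return False


timer = EnclaveElectionTimer(seed=42)
print(timer.tick(1))      # False: well below the lower bound
print(timer.tick(10**9))  # True: an inflated tick still fires the timer
```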

eddyashton commented 4 years ago

We've noticed a similar issue in one of our tests. We suspend the leader node for some time to force an election. The other nodes choose a new leader and happily make progress, but when the original leader is unsuspended it receives an unusually large tick (covering the entire span of its suspension), and this triggers an election. We explored some mitigations for this, but fundamentally it falls into the same category: Raft requires regular, accurate time updates from the host, and without these it is possible to trigger spurious elections.

The only fix is some form of trusted time within the enclave, perhaps derived from node-gossip channels or from spinning (busy-waiting) to spend time within the enclave, but we have no firm plan for this yet.

achamayou commented 3 years ago

We think implementing the PreVote extension to Raft (https://web.stanford.edu/~ouster/cgi-bin/papers/OngaroPhD.pdf, section 9.6; ticketed in #2577) will mitigate this problem without requiring an expensive busy-wait.
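A minimal sketch of why PreVote helps, assuming a hypothetical Node type and a hypothetical BASELINE_ELECTION_TIMEOUT_MS; it is not the design ticketed in #2577. Following the conditions described in section 9.6 of the thesis linked above, a node grants a pre-vote only if the candidate's log is at least as up to date as its own and it has not heard from a valid leader within a baseline election timeout. A candidate increments its term only after a majority of pre-votes, so an inflated tick at n0 no longer produces a higher term that would unseat the leader.

```python
from dataclasses import dataclass
from typing import List

BASELINE_ELECTION_TIMEOUT_MS = 500  # hypothetical value


@dataclass
class Node:
    node_id: str
    term: int
    last_log_term: int
    last_log_index: int
    ms_since_leader_heartbeat: int  # 0 for the current leader itself


def grant_pre_vote(voter: Node, cand: Node) -> bool:
    # Answering a pre-vote never changes the voter's term or role.
    log_ok = (cand.last_log_term, cand.last_log_index) >= (
        voter.last_log_term, voter.last_log_index)
    leader_alive = voter.ms_since_leader_heartbeat < BASELINE_ELECTION_TIMEOUT_MS
    return log_ok and not leader_alive


def on_election_timeout(cand: Node, reachable: List[Node], cluster_size: int) -> bool:
    """Return True if cand may bump its term and start a real election."""
    votes = 1 + sum(grant_pre_vote(p, cand) for p in reachable)  # +1: itself
    if votes > cluster_size // 2:
        cand.term += 1  # the term is only incremented after a pre-vote majority
        return True
    return False


n0 = Node("n0", 3, 3, 10, ms_since_leader_heartbeat=10**6)  # host drops its inbound
n1 = Node("n1", 3, 3, 10, ms_since_leader_heartbeat=0)      # current leader
n2 = Node("n2", 3, 3, 10, ms_since_leader_heartbeat=50)     # healthy follower
print(on_election_timeout(n0, [n1, n2], cluster_size=3), n0.term)  # False 3
```

If the leader really has failed (for example n2 has not heard from it for 800 ms and n1 is unreachable), n2's pre-vote gives n0 a majority of two out of three and a normal election proceeds.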