learning-at-home / hivemind

Decentralized deep learning in PyTorch. Built to train models on thousands of volunteers across the world.
MIT License

Discussion: p2p security #93

Open justheuristic opened 4 years ago

justheuristic commented 4 years ago

Right now, hivemind works over the default TCP protocol with no security. If we are to run on a global scale, we need to:

1. Make sure peers do not risk their personal data by running hivemind nodes. This is the boring but necessary one: we need to make sure we use up-to-date security protocols in both hivemind.dht and the hivemind.client/server interaction.

2. Figure out how to make the hivemind model resistant to malicious peers. This part is trickier; there are several attack vectors:
   2a. Send NaN or wrong gradients to the expert in order to jeopardize its parameters.
   2b. Overwrite existing keys in the DHT with wrong information to prevent other peers from finding experts.

There are intuitive ways to resist both of these vectors (e.g. dropping outlier gradients for 2a), but security usually works better with expertise, not intuition.
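To make the "dropping outlier gradients" idea concrete, here is a minimal sketch of the intuitive defense for 2a. It is not hivemind code: `filter_gradients` and the `max_norm` threshold are hypothetical, and plain Python lists stand in for gradient tensors. It rejects any gradient containing NaN/inf and any gradient whose L2 norm exceeds a fixed bound.

```python
import math


def filter_gradients(grads, max_norm):
    """Keep only gradients that are finite and whose L2 norm is <= max_norm.

    `grads` is a list of flat gradient vectors (lists of floats).
    `max_norm` is a hypothetical, experiment-specific threshold.
    """
    kept = []
    for grad in grads:
        # Reject anything containing NaN or +/-inf.
        if not all(math.isfinite(x) for x in grad):
            continue
        # Reject gradients with a suspiciously large norm.
        norm = math.sqrt(sum(x * x for x in grad))
        if norm > max_norm:
            continue
        kept.append(grad)
    return kept


incoming = [
    [0.1, -0.2],           # plausible update
    [float("nan"), 1.0],   # NaN poisoning
    [1e6, 0.0],            # finite, but an extreme outlier
    [float("inf"), 0.0],   # inf poisoning
]
accepted = filter_gradients(incoming, max_norm=10.0)
```

A fixed threshold like this only stops crude attacks; a gradient that is wrong but has a normal magnitude passes the check, which is why the thread asks for expert input.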

We would appreciate any advice on potential attack vectors and ways to mitigate them.

borzunov commented 3 years ago

We have outlined a draft with a number of attack vectors for hivemind and possible defense strategies. This list is not guaranteed to be complete.

1. Remote Code Execution vulnerabilities

These are arguably the most critical vulnerabilities, since an attacker can not only spoil an ongoing deep learning experiment but also take control of participants' computers (e.g. turn them into botnet nodes). This makes participation in an open experiment unsafe.

a. Using unsafe serializers (#155)

Deserializers like pickle and everything built on top of it (like torch.load) can load and execute arbitrary attacker-supplied code (example).
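As an illustration of why this is dangerous, the sketch below builds a pickle whose deserialization calls `eval` on an attacker-chosen string (a harmless arithmetic expression here). It then shows one generic mitigation from the Python standard library: an `Unpickler` subclass that refuses to resolve any global. The names `Payload`, `NoGlobalsUnpickler` and `restricted_loads` are made up for this example, and this is not necessarily the fix adopted in #155.

```python
import io
import pickle


class Payload:
    """Object whose pickle tells the loader to call eval("6 * 7")."""

    def __reduce__(self):
        # A real attacker would put an arbitrary callable and arguments here.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Plain pickle.loads runs the embedded call during deserialization.
unsafe_result = pickle.loads(blob)


class NoGlobalsUnpickler(pickle.Unpickler):
    """Unpickler that refuses to import any class or function."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    """Deserialize only primitive containers and values; reject everything else."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()


try:
    restricted_loads(blob)
    blocked = False
except pickle.UnpicklingError:
    blocked = True

# Primitive data still round-trips through the restricted loader.
plain = restricted_loads(pickle.dumps({"a": [1, 2]}))
```

Blocking all globals is too strict for tensors, so a real deployment would more likely switch to a serialization format that cannot express code at all.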

Possible solution:

2. DHT vulnerabilities

a. Attacks on DHT operation

An attacker's nodes may return fake contents, drop writes, etc.

Possible solution:

b. Saving records with fake user ID

An attacker may overwrite DHT records that belong to other nodes' user IDs (e.g. to ruin their statistics or results).

Possible solution:

c. Saving records with invalid expiration time

For example, an attacker may write records that never expire and keep disrupting the experiment.
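One way to picture a check against such records is to bound how far in the future an expiration time may lie. The sketch below assumes expiration times are Unix timestamps in seconds; `is_expiration_valid` and `MAX_TTL_SECONDS` are hypothetical names and values, not part of hivemind.

```python
import math
import time

# Hypothetical upper bound on a record's lifetime: one day.
MAX_TTL_SECONDS = 24 * 60 * 60


def is_expiration_valid(expiration_time, now=None, max_ttl=MAX_TTL_SECONDS):
    """Accept a record only if it expires in the future, but not too far ahead."""
    if now is None:
        now = time.time()
    # inf and NaN would otherwise produce records that never expire.
    if not math.isfinite(expiration_time):
        return False
    return now < expiration_time <= now + max_ttl


now = 1_000.0
checks = {
    "one minute ahead": is_expiration_valid(now + 60, now=now),
    "already expired": is_expiration_valid(now - 1, now=now),
    "immortal": is_expiration_valid(float("inf"), now=now),
    "too far ahead": is_expiration_valid(now + MAX_TTL_SECONDS + 1, now=now),
}
```

Each storing node would have to apply such a check against its own clock, so the bound needs some slack for clock skew between peers.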

Possible solution:

d. Registering many fake user IDs

An attacker may interfere with the DHT operation by pretending that it has many nodes forming a majority.

Possible solution:

3. Denial-of-Service attacks

DoS attacks become possible if a small request to a service causes expensive computations or generation of a big response. In hivemind, this may be the case for the server and the trainer.
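A common generic countermeasure for this kind of asymmetry is to charge each peer for the work or response size it triggers. The following is a minimal token-bucket sketch under that assumption; `ResponseBudget`, its parameters and the byte-based cost are hypothetical and do not correspond to any hivemind API.

```python
class ResponseBudget:
    """Per-peer token bucket; tokens are spent in units of response bytes."""

    def __init__(self, capacity, refill_rate):
        self.capacity = capacity        # maximum tokens a peer can hold
        self.refill_rate = refill_rate  # tokens regained per second
        self._state = {}                # peer_id -> (tokens, last_seen_time)

    def allow(self, peer_id, cost, now):
        """Return True and charge the peer if it can afford `cost` tokens."""
        tokens, last = self._state.get(peer_id, (self.capacity, now))
        # Refill in proportion to elapsed time, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_rate)
        if cost > tokens:
            self._state[peer_id] = (tokens, now)
            return False
        self._state[peer_id] = (tokens - cost, now)
        return True


budget = ResponseBudget(capacity=1000, refill_rate=100)
first = budget.allow("peer-a", cost=600, now=0.0)   # 1000 -> 400 tokens left
second = budget.allow("peer-a", cost=600, now=0.0)  # only 400 left: refused
third = budget.allow("peer-a", cost=600, now=2.0)   # refilled to 600: allowed
other = budget.allow("peer-b", cost=600, now=0.0)   # separate bucket per peer
```

A per-peer budget does not help against an attacker who registers many identities (see 2d), so it would have to be combined with a defense against fake user IDs.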

a. DoS of hivemind server

The forward pass and backward pass requests may cause expensive computations.

Possible solution:

b. DoS of hivemind trainer

TODO

4. Spoiling the network weights

a. Sending NaN/inf/etc.

Possible solution:

b. Spoiling gradient checkpoints

In hivemind server,

TODO

justheuristic commented 3 years ago

Note on DoS: in hivemind.dht.protocol.DHTProtocol, there's an rpc_find where the output size is roughly 10x larger than input size. Is this alright or can it also cause DoS?

borzunov commented 3 years ago

This seems like useful reading: DHT Security Survey (Urdaneta et al.)