Discussion: p2p security

justheuristic commented 4 years ago

Right now, hivemind communicates over plain TCP with no security. If we are to run at a global scale, we need to:

1. Make sure peers do not risk their personal data by running hivemind nodes. This is the boring but necessary one: we need to make sure we use up-to-date security protocols in both hivemind.dht and hivemind.client/server interaction.

2. Figure out how to make the hivemind model resistant to malicious peers. This part is trickier, and there are several attack vectors:
2a. send NaN or wrong gradients to the expert in order to jeopardize its parameters
2b. overwrite existing keys in DHT with wrong information to prevent other peers from finding experts

There are intuitive ways to resist both of these vectors by e.g. dropping outlier gradients in 2a, but security usually works better with expertise, not intuition.

We would appreciate any advice on potential attack vectors and ways to mitigate them.

borzunov commented 3 years ago

We have outlined a draft with a number of attack vectors for hivemind and possible defense strategies. This list is not guaranteed to be complete.

1. Remote Code Execution vulnerabilities

These are arguably the most critical vulnerabilities, since an attacker can not only spoil an ongoing deep learning experiment but also take control of participants' computers (e.g. turn them into botnet nodes). This makes participation in an open experiment unsafe.

a. Using unsafe serializers (#155)

Deserializers like pickle and everything built on top of it (like torch.load) can execute arbitrary attacker-supplied code while loading data (example).

Possible solution:

Use restricted and safe serialization protocols like MessagePack.
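
To make the difference concrete, here is a minimal sketch. It uses the standard-library json module as a stand-in for MessagePack (an assumption made only so the example needs no extra dependency), and the message schema with an expert_uid field is hypothetical. The first half shows that pickle calls whatever callable the sender names; the second half decodes a data-only format and validates its shape.

```python
import json
import pickle


class Payload:
    """Stand-in for an attacker-crafted object."""

    def __reduce__(self):
        # pickle will call int("42") while loading. A real attacker would
        # name a dangerous callable (e.g. os.system) instead of int.
        return (int, ("42",))


untrusted = pickle.dumps(Payload())
loaded = pickle.loads(untrusted)  # a callable chosen by the sender ran here


def decode_message(raw: bytes) -> dict:
    """Decode a data-only message and validate its (hypothetical) schema."""
    message = json.loads(raw.decode("utf-8"))
    if not isinstance(message, dict):
        raise ValueError("message must be a mapping")
    if not isinstance(message.get("expert_uid"), str):
        raise ValueError("expert_uid must be a string")
    return message


safe = decode_message(b'{"expert_uid": "expert.0", "values": [0.5, 1.0]}')
```

A data-only decoder can still be fed garbage, so the explicit type checks matter as much as the choice of format.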

2. DHT vulnerabilities

a. Attacks on DHT operation

An attacker's nodes may return fake contents, silently drop writes, etc.

Possible solution:

Since DHT is a well-known concept used in various open p2p protocols (like BitTorrent), it is likely that there are standard techniques to make DHT more robust to such attacks (e.g. detect and ban malicious nodes), that we need to study and implement. For example, see a DHT Security Survey (Urdaneta et al.).

b. Saving records with fake user ID

An attacker may overwrite DHT records that belong to other nodes' user IDs (e.g. to ruin their statistics or results).

Possible solution:

We can use RSA public keys as user IDs, then require users to sign any DHT request that includes a user ID with the matching private key, so an attacker won't be able to fake the ID.
To protect against an attacker replaying previous requests of another node, we should ensure that all writes with an outdated or equal expiration time are ignored.
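
The replay rule could look roughly like the sketch below. RecordStore is a hypothetical toy class, not hivemind's actual DHT storage, and it assumes the signature on the request has already been verified elsewhere.

```python
from typing import Dict, Optional, Tuple


class RecordStore:
    """Toy key-value store that only accepts strictly newer records."""

    def __init__(self) -> None:
        # key -> (value, expiration_time)
        self._records: Dict[str, Tuple[bytes, float]] = {}

    def store(self, key: str, value: bytes, expiration_time: float) -> bool:
        """Return True if the write was accepted, False if it was ignored."""
        current = self._records.get(key)
        if current is not None and expiration_time <= current[1]:
            # Outdated or equal expiration time: possibly a replayed request.
            return False
        self._records[key] = (value, expiration_time)
        return True

    def get(self, key: str) -> Optional[bytes]:
        current = self._records.get(key)
        return None if current is None else current[0]


store = RecordStore()
accepted_first = store.store("user_a/stats", b"v1", expiration_time=100.0)
accepted_replay = store.store("user_a/stats", b"v1", expiration_time=100.0)
accepted_newer = store.store("user_a/stats", b"v2", expiration_time=200.0)
```

With this rule, a captured signed request is useless once the owner has written anything with a later expiration time.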

c. Saving records with invalid expiration time

For example, an attacker may write immortal records that will constantly ruin the experiment operation.

Possible solution:

Require nodes to have correct UTC time.
Decline writes where the expiration time is too large.
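
A minimal sketch of such a check, assuming expiration times are UNIX timestamps in seconds. The one-hour limit MAX_RECORD_LIFETIME and the function name are hypothetical values chosen for illustration.

```python
import time
from typing import Optional

MAX_RECORD_LIFETIME = 3600.0  # seconds; hypothetical limit


def is_acceptable_expiration(
    expiration_time: float,
    now: Optional[float] = None,
    max_lifetime: float = MAX_RECORD_LIFETIME,
) -> bool:
    """Accept only records that expire in the future, but not too far ahead."""
    now = time.time() if now is None else now
    # NaN fails both comparisons, so it is rejected as well.
    return now < expiration_time <= now + max_lifetime


ok = is_acceptable_expiration(1100.0, now=1000.0)
immortal = is_acceptable_expiration(float("inf"), now=1000.0)
already_expired = is_acceptable_expiration(900.0, now=1000.0)
```

The check only makes sense if the receiving node's own clock is roughly correct, which is why the two requirements go together.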

d. Registering many fake user IDs

An attacker may interfere with the DHT operation by pretending that it has many nodes forming a majority.

Possible solution:

As a first step, we can create a centralized interface to register new nodes.
To make the system decentralized, we may limit the number of IDs coming from one IP address, require some proof of a useful contribution to the calculations to create new IDs, and ban nodes known to be malicious.
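
As a rough illustration of the per-IP limit and the ban list, here is a sketch of a registry. IdRegistry and its default limit of three IDs per IP are hypothetical, and the proof-of-contribution part is not modeled at all.

```python
from collections import defaultdict
from typing import DefaultDict, Set


class IdRegistry:
    """Toy registry that caps node IDs per IP address and honors a ban list."""

    def __init__(self, max_ids_per_ip: int = 3) -> None:
        self.max_ids_per_ip = max_ids_per_ip
        self._ids_by_ip: DefaultDict[str, Set[str]] = defaultdict(set)
        self._banned: Set[str] = set()

    def register(self, ip: str, node_id: str) -> bool:
        """Return True if node_id is (or already was) registered for this IP."""
        if node_id in self._banned:
            return False
        ids = self._ids_by_ip[ip]
        if node_id in ids:
            return True
        if len(ids) >= self.max_ids_per_ip:
            return False
        ids.add(node_id)
        return True

    def ban(self, node_id: str) -> None:
        """Ban an ID and remove it from every IP it was registered under."""
        self._banned.add(node_id)
        for ids in self._ids_by_ip.values():
            ids.discard(node_id)


registry = IdRegistry(max_ids_per_ip=2)
results = [registry.register("10.0.0.1", f"node{i}") for i in range(3)]
registry.ban("node0")
rejoin_banned = registry.register("10.0.0.2", "node0")
```

An IP-based cap is only a speed bump, since an attacker may control many addresses, so it would need to be combined with the other measures.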

3. Denial-of-Service attacks

DoS attacks become possible if a small request to a service causes expensive computations or generation of a big response. In hivemind, this may be the case for the server and the trainer.

a. DoS of hivemind server

The forward pass and backward pass requests may cause expensive computations.

Possible solution:

Each server may count the number of requests received from each IP/user ID over some period of time.
When requests form a queue, the server may first select requests from the IPs/IDs that have called this server least often, so frequent users are penalized.
The callers may be programmed to use the most diverse set of servers for each computation.
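
The server-side part of this idea can be sketched as a queue that always serves the caller with the fewest served requests so far. FairRequestQueue is a hypothetical class, and for brevity it keeps counts forever instead of over a sliding time window.

```python
from collections import Counter
from typing import Any, List, Tuple


class FairRequestQueue:
    """Toy queue that serves the least frequently served caller first."""

    def __init__(self) -> None:
        self._served: Counter = Counter()  # caller_id -> requests served
        self._pending: List[Tuple[str, Any]] = []  # kept in arrival order

    def put(self, caller_id: str, request: Any) -> None:
        self._pending.append((caller_id, request))

    def get(self) -> Tuple[str, Any]:
        if not self._pending:
            raise IndexError("no pending requests")
        # Prefer callers served least often; break ties by arrival order.
        index = min(
            range(len(self._pending)),
            key=lambda i: (self._served[self._pending[i][0]], i),
        )
        caller_id, request = self._pending.pop(index)
        self._served[caller_id] += 1
        return caller_id, request


queue = FairRequestQueue()
for name in ("s1", "s2", "s3"):
    queue.put("spammer", name)
queue.put("rare_user", "r1")
order = [queue.get()[1] for _ in range(4)]
```

Here the single request from rare_user is served right after the first spammer request instead of waiting behind all of them. The linear scan in get is fine for a sketch; a real server would likely want a heap.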

b. DoS of hivemind trainer

TODO

4. Spoiling the network weights

a. Sending nan/inf/etc.

Possible solution:

Nodes should check all tensors for invalid or out-of-range values.
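
A minimal sketch of such a check on a flat sequence of floats. The function name and the max_abs bound of 1e4 are hypothetical; on real tensors the same test would more likely be written with something like torch.isfinite(tensor).all() plus a magnitude check.

```python
import math
from typing import Iterable


def is_valid_tensor(values: Iterable[float], max_abs: float = 1e4) -> bool:
    """Return True if every value is finite and within [-max_abs, max_abs]."""
    return all(math.isfinite(v) and abs(v) <= max_abs for v in values)


good = is_valid_tensor([0.1, -2.0, 3.5])
has_nan = is_valid_tensor([0.1, float("nan")])
has_inf = is_valid_tensor([float("-inf"), 0.0])
too_large = is_valid_tensor([1e9])
```

This catches only blatantly broken inputs. Gradients that are finite and in range but deliberately wrong need a different defense, such as the outlier filtering mentioned earlier in the thread.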

b. Spoiling gradient checkpoints

In hivemind server,

TODO

justheuristic commented 3 years ago

Note on DoS: in hivemind.dht.protocol.DHTProtocol, there's an rpc_find where the output size is roughly 10x larger than input size. Is this alright or can it also cause DoS?

borzunov commented 3 years ago

This seems like useful reading: DHT Security Survey (Urdaneta et al.)
