learning-at-home / hivemind

Decentralized deep learning in PyTorch. Built to train models on thousands of volunteers across the world.
MIT License

Authorization protocol for a moderated Hivemind network #253

Open borzunov opened 3 years ago

borzunov commented 3 years ago

We consider a moderated decentralized deep learning experiment. To join, a new participant is required to register on a website providing a login, a password, and information about themself. Moderators can review the information and authorize the participant to join the Hivemind network. Later, if they observe malicious actions from the participant, they can withdraw their authorization and roll back the model and the DHT state (if needed).

This issue proposes an authorization protocol for Hivemind peers to interact with each other without the constant need to confirm another peer's authorization with a central server. The protocol is intended to be secure against eavesdropping and man-in-the-middle attacks.

1. Introduction

We assume there is a central authorization server. It owns a key pair (auth_server_public_key, auth_server_private_key). Peers have a secure communication channel with this server (e.g. the server uses HTTPS or TLS).

We also assume that each peer has its own key pair (peer_public_key, peer_private_key). We can't assume that peers have secure communication channels with each other (like TLS), since it is impractical for each peer to get the approval of a certificate authority.

Every time a peer provides its endpoint on the network (e.g. when adding itself to the DHT routing table), it should also provide its peer_public_key. (Implementation detail: with libp2p, the endpoint and the public key may be combined into one entity.)

2. Joining the network

  1. A peer provides its login, password, and peer_public_key to the authorization server.
  2. The server checks that:
    • An account with this login and password has been registered, approved by the moderators, and not banned since.
    • peer_public_key was not used by somebody else.
  3. If everything is okay, the server decides to grant the peer access to the network until expiration_time (e.g. for 6 hours) and responds to the peer with:
    • hivemind_access_token - the tuple (peer_public_key, expiration_time) signed with auth_server_private_key. For other peers, this token will mean that the peer with peer_public_key has been granted access to the Hivemind network until expiration_time.
    • auth_server_public_key - the server's public key. This is to allow the peer to check the access tokens of other peers.
    • bootstrap_nodes - a list of tuples (endpoint, public_key) for peers who recently joined the network. This is to tell the peer where it can start joining the network (e.g. see how Bitcoin nodes initially find peers).
  4. After the expiration time, the peer may repeat this procedure to get a new token.

During the joining procedure, HTTPS/TLS protects us from eavesdropping and man-in-the-middle attacks.
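The joining steps above can be sketched as follows. This is a minimal illustration, not Hivemind's implementation: the names `issue_token` and `token_is_valid`, the JSON encoding of the signed tuple, and the use of strings for keys are all assumptions. The issue does not fix a signature scheme, so signing and verification are passed in as callables. The HMAC helpers at the bottom are only a stand-in to make the sketch runnable; a real deployment needs an asymmetric scheme so that peers can check tokens with auth_server_public_key alone.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass
from typing import Callable

Sign = Callable[[bytes], bytes]          # signs with auth_server_private_key
Verify = Callable[[bytes, bytes], bool]  # checks against auth_server_public_key


@dataclass(frozen=True)
class AccessToken:
    peer_public_key: str
    expiration_time: float
    signature: bytes


def _signed_payload(peer_public_key: str, expiration_time: float) -> bytes:
    # Assumed encoding of the (peer_public_key, expiration_time) tuple.
    return json.dumps([peer_public_key, expiration_time]).encode()


def issue_token(peer_public_key: str, sign: Sign, now: float,
                lifetime: float = 6 * 3600) -> AccessToken:
    """Server side: grant access until now + lifetime (6 hours by default)."""
    expiration_time = now + lifetime
    signature = sign(_signed_payload(peer_public_key, expiration_time))
    return AccessToken(peer_public_key, expiration_time, signature)


def token_is_valid(token: AccessToken, verify: Verify, now: float) -> bool:
    """Peer side: reject expired tokens and tokens not signed by the server."""
    if now >= token.expiration_time:
        return False
    payload = _signed_payload(token.peer_public_key, token.expiration_time)
    return verify(payload, token.signature)


# Stand-in signature scheme for this demo ONLY (hypothetical helpers).
# HMAC is symmetric, so it does not give the public/private split
# that the protocol relies on.
_DEMO_SECRET = b"demo-only-secret"


def demo_sign(data: bytes) -> bytes:
    return hmac.new(_DEMO_SECRET, data, hashlib.sha256).digest()


def demo_verify(data: bytes, signature: bytes) -> bool:
    return hmac.compare_digest(demo_sign(data), signature)
```

Because the signature covers both the key and the expiration time, copying a valid signature onto a different peer_public_key or a later expiration_time makes verification fail.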

3. Making requests

If Alice wants to make a request to Bob, the request content is extended with a structure containing the fields used in the checks below: alice_access_token, bob_public_key, current_time, nonce, and signature.

Bob should store the nonces of all requests accepted in the last N seconds. Also, Bob should reject all requests where:

  1. alice_access_token is invalid or expired
  2. signature is invalid w.r.t. alice_public_key
  3. nonce is the same as one of the nonces he stores
  4. current_time differs from his time by more than N seconds
  5. bob_public_key differs from his actual public key

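This way, Bob ensures that the request was made by an authorized peer (checks 1 and 2), is not a replay of an earlier request (checks 3 and 4), and is addressed to him (check 5).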

Note that an eavesdropped request can't be replayed to anybody, and a man-in-the-middle attacker can't replace the request content.
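Checks 3 to 5 on Bob's side can be sketched as a small replay guard. This is only an illustration under assumptions: the class name `ReplayGuard`, its method, and the use of strings for keys are hypothetical, and the token and signature checks (1 and 2) are left out.

```python
from typing import Dict


class ReplayGuard:
    """Bob's side of checks 3-5: nonce reuse, clock skew, and recipient."""

    def __init__(self, my_public_key: str, window_seconds: float):
        self.my_public_key = my_public_key
        self.window = window_seconds  # the "N seconds" from the protocol
        self._seen: Dict[bytes, float] = {}  # nonce -> time it can be forgotten

    def accept(self, nonce: bytes, current_time: float,
               bob_public_key: str, now: float) -> bool:
        # Forget nonces whose request timestamp would now fail check 4 anyway.
        self._seen = {n: t for n, t in self._seen.items() if t >= now}
        if bob_public_key != self.my_public_key:   # check 5: wrong recipient
            return False
        if abs(now - current_time) > self.window:  # check 4: stale or future
            return False
        if nonce in self._seen:                    # check 3: replayed nonce
            return False
        self._seen[nonce] = current_time + self.window
        return True
```

One design choice in this sketch: each nonce is kept until the request's own timestamp can no longer pass the freshness check, which may be somewhat longer than N seconds after acceptance when Alice's clock runs ahead of Bob's. This is a conservative reading that keeps a replay from slipping through between the two checks.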

4. Responding to requests

When Bob responds to Alice, the response content is extended with a structure containing the fields used in the checks below: bob_access_token, nonce, and signature.

Alice should reject the response if:

  1. bob_access_token is invalid or expired
  2. bob_public_key inside bob_access_token doesn't match the expected bob_public_key
  3. nonce doesn't match the request nonce
  4. signature is invalid w.r.t. bob_public_key

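This way, Alice ensures that the response was made by an authorized peer holding the expected bob_public_key (checks 1, 2, and 4) and answers her own request (check 3).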

Note that an eavesdropped response can't be replayed for another request, and a man-in-the-middle attacker can't replace the response content.

5. Banning peers faster (optional)

The authorization server may also issue messages ('ban', peer_public_key) signed with the auth_server_private_key. Such messages may be propagated by the peers, and the peers can stop interacting with the banned peers.
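A peer-side handler for such ban messages might look like the sketch below. The class `BanList`, its method names, and the JSON encoding of the ('ban', peer_public_key) tuple are assumptions made for illustration; signature verification against auth_server_public_key is passed in as a callable because the issue does not fix a signature scheme.

```python
import json
from typing import Callable, Set


class BanList:
    """Tracks peers banned by messages that the authorization server signed."""

    def __init__(self, verify_server_signature: Callable[[bytes, bytes], bool]):
        self._verify = verify_server_signature
        self._banned: Set[str] = set()

    def handle_ban_message(self, peer_public_key: str, signature: bytes) -> bool:
        """Record the ban if the server signed it; return whether to relay it."""
        # Assumed encoding of the signed ('ban', peer_public_key) tuple.
        payload = json.dumps(["ban", peer_public_key]).encode()
        if not self._verify(payload, signature):
            return False  # forged or corrupted: ignore and do not propagate
        self._banned.add(peer_public_key)
        return True

    def is_banned(self, peer_public_key: str) -> bool:
        return peer_public_key in self._banned
```

Verifying before relaying means that peers cannot ban each other by forging messages; only the holder of auth_server_private_key can.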

borzunov commented 3 years ago

The protocol is partially implemented in #255 to protect the DHT service. This partially protects other services (e.g. averager) since peer endpoints for these services are fetched from the DHT.

The next task is to implement the protocol for the averager and other services. This would protect from replay and MITM attacks on these services performed with fake stateful requests.

justheuristic commented 2 years ago

@borzunov can we consider this done?

borzunov commented 2 years ago

Not completely, see the comment above (some technically harder attacks involving fake stateful requests to averager are still possible). Previously, the task was blocked by implementing the averager over libp2p. Now, I see two options:

  1. Sign all averager messages as described in the protocol above (this involves signing only a hash of the whole message, so the performance overhead may turn out negligible).
  2. If option 1 still turns out to be too slow, come up with a solution involving group secret tokens stored in the DHT.
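To illustrate why option 1 may be cheap, here is a rough sketch of signing only a hash of a large message. The function names and the choice of SHA-256 are assumptions, and `sign` stands for whatever signing routine the peer uses with its peer_private_key.

```python
import hashlib
from typing import Callable, Iterable


def message_digest(chunks: Iterable[bytes]) -> bytes:
    """Hash a large message incrementally, without concatenating it first."""
    hasher = hashlib.sha256()
    for chunk in chunks:
        hasher.update(chunk)
    return hasher.digest()


def sign_message(chunks: Iterable[bytes],
                 sign: Callable[[bytes], bytes]) -> bytes:
    """Sign the fixed-size digest instead of the full message body."""
    return sign(message_digest(chunks))
```

The signing routine always sees 32 bytes regardless of the message size, so the per-message cost is dominated by the hash, which is linear in the message length.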