gpestana / notes

notes, ideas and whatnot
https://gpestana.com
41 stars 6 forks source link

RFC [privacy preserving DHT] Sinkhole protocol #27

Open gpestana opened 4 years ago

gpestana commented 4 years ago

Abstract

This RFC defines a protocol for improving the privacy of distributed hash tables (DHT) lookups. It uses a Private Information Retrieval (PIR) database to hide the DHT key being looked up. The sinkhole protocol splits the traditional DHT lookup into two phases: First, the lookup initiator searches for PIR providers for a given key in the DHT; Then, the lookup initiator completes the key lookup by querying the PIR provider for the full ID. Only the first phase of the protocol discloses to the DHT nodes information about the ID of the lookup. However, the first phase of the lookup only discloses part of the ID key.

Status of the RFC

Internet-Draft (as per IETF RFC)

Introduction

A Distributed Hash Table (DHT) is a protocol that defines a routing protocol to store and retrieve key-value pairs from an overlay network. The protocol defines an API for peers in the network to store a value data under a deterministic key (the data's id). It also defines how to retrieve a key get(key) from the network. A value stored in the network has a unique key based on the hash of the value's data. For example, in the case of Mainline DHT [1], a value v has the key k = sha1(v), were sha1() is the hash function that takes the input and produces a 160-bit (20-byte) hash value. Using this mechanism, each peer in the network agree on every value's key without interaction.

A DHT also defines which network peers are responsible to store key-value pairs. In addition, it defines the routing protocol used to 1) distribute new key-value pairs to network peers that should store it and 2) route a DHT lookup so that key-pairs are resolved.

We briefly explain how network peers define where to route store() and get() requests in the network, i.e. which peers in the network are likely to be responsible to store a particular key-pair.

While each resource added to the network has a key based on the hash of its content, each peer in the network is also assigned with a unique identifier ID in the same domain space. The peer's ID can, for example, be the hash of its public-key ID = sha1(peer_pk). Since the resources' key and peers' ID share the same domain, a DHT protocol defines that a resource with key should be stored by the peers in the network with an ID "sufficiently close" to key. The last piece that brings it all together defines how does a peer learns which other peers in the network are "sufficiently close" to a given key - the routing protocol. The routing protocol is used to store a new key-pair in the network and to attempt to retrieve a key-pair from the network. In short, a lookup initiator - i.e. a peer looking for the owner or a particular resource key in the network - maintains a routing table locally with dialling information about a subset of the peers in the network (the logic for constructing and maintaining the routing table is out of the scope of this document and it changes between protocol flavors). An example of a routing protocol is to pick the N peers in from the routing table which ID is closest to the key resource key and then perform a network request to each in parallel for the key. If the peers receiving the request store a key-pair with the same key requested, then the pair is returned. Otherwise, the peers will check their routing tables and send back to the lookup initiator a subset of the closest peers they know to the requested key. The lookup initiator can then pick from the new peers dialing information which ones are closest to key and perform the same operation until the resource with key is found or no closest peers are returned back.

The collaborative nature of the DHT routing protocol allows peers to orchestrate key lookups in an arbitrarily large network, with practical scalability, and without central coordination. However, from a privacy perspective, the protocol leaks information about which DHT keys (and, thus, resources) a lookup initiator is interested on. It is easy to think how information leaks on the routing protocol of DHTs bubble up to the application layer: Imagine a decentralised newspaper relies on a DHT to keep a directory of the peers in the network that store a each news article (a common decentralised storage pattern). Every time a user tries to read a particular news article, she will disclose information about her interest to many peers in the network during the DHT lookup phase.

Computational Private Information Systems

explain what cPIR are, reference papers (namely SealPIR) and how practical they are nowadays

The Sinkhole Protocol

The Sinkhole protocol defines how to use a cPIR system to enhance privacy of DHT lookups. The birds' eye view of the protocol is the following: instead of DHT lookup initiator requesting the network for a key - which would disclose her interest on key to network peers -, she uses M out of N bits of the key to lookup in the DHT for PIR providers that may store key. If a PIR provider is found using the DHT lookup, the lookup initiator performs the privacy preserving query to the cPIR provider with the full key. Following this protocol, and assuming a cPIR provider is storing key, a user can resolve it by disclosing only M bits to the network.

Description

We assume the existence of a set of providers {P0, P1, ..., Pn}, each running a computational PIR database. Each Pn provides an interface for queries to key-value pairs with the same properties as DHT key-value pairs. We assume that the API endpoint for querying Pn database is available at /ip4/<Pn_IP>/tcp/8080

A provider P0 decides to provide private discovery to a key-value pair that is stored in the DHT. Let's assume that the key, k, is f32b67c7e26342af42efabc674d441dca0a281c5 (generated using sha1) and the value is /ip4/224.123.664.11/udp/2200.

1) cPIR provider resolves and stores key-value pairs

First, P0 makes a DHT request to resolve k's value.

DHT.get(`f32b67c7e26342af42efabc674d441dca0a281c5`)

and stores the result (/ip4/224.123.664.11/udp/220) in the cPIR database. The database is a hashmap and stores the result under the index k.

At this point, a user may query P0 for resolving k privately. However, the user does not know about the fact that P0 is providing k. Thus, we need to design a protocol for users to discover potential cPIR providers for the keys they are interested about.

2) cPIR provider registers as storing a specific key

P0 re-hashes the key to generate a interval_key using the following algorithm:

mask = 1111111111111111111100000000000000000000
masked_key = f32b67c7e26342af42efabc674d441dca0a281c5 AND mask

interval_key = sha1(masked_key)

P0 stores its API dialing information for querying the cPIR endpoint (/ip4/<P0_IP>/tcp/8080) under reg_key in the DHT by making the following DHT request:


DHT.store(interval_key, /ip4/<P0_IP>/tcp/8080)

In practice, P0 created a rendezvous point for all keys between f32b67c7e26342af42ef00000000000000000000 and f32b67c7e26342af42efffffffffffffffffffff and registered itself as provider of at least 1 key between that interval.

3) User looks for cPIR provider in DHT

When user U wants to query key, she first generates the interval_key using the same protocol as the cPIR provider:

mask = 1111111111111111111100000000000000000000
masked_key = f32b67c7e26342af42efabc674d441dca0a281c5 AND mask

interval_key = sha1(masked_key)

Then U makes a DHT request find if there is any potential cPIR provider for the key.

DHT.get(interval_key)

The DHT lookup will return /ip4/<P0_IP>/tcp/8080, which is the cPIR provider dialing endpoint.

Notice that U did not use the full key f32b67c7e26342af42efabc674d441dca0a281c5 in the DHT lookup in order to conceal her interest in that particular key. Thus, network peers will only be able to deduce that user U is intersting in 1 or more keys in the range between f32b67c7e26342af42ef00000000000000000000 and f32b67c7e26342af42efffffffffffffffffffff.

4) User queries cPIR privately to resolve key

Now that U knows of a cPIR provider that may have the key she's looking for, she builds an encrypted query Enc(key)using key f32b67c7e26342af42efabc674d441dca0a281c5 and performs the following RPC call to /ip4/<P0_IP>/tcp/80801:

pir.query(Enc(key))

which returns Enc(result). The result of the decryption using U's private key of the returned RPC value is either 0 or the value stored under key in the cPIR database. In this case, the result will be /ip4/224.123.664.11/udp/2200

Performance and Practical Considerations

References

[1] Mainline DHT