jbenet / random-ideas

random ideas
juan.benet.ai
324 stars 12 forks source link

Compact proofs of retrievability in DSHTs #10

Open jbenet opened 10 years ago

jbenet commented 10 years ago

Use compact proofs of retrievability to validate value advertisements in DSHTs

Motivation

DSHTs (and other content delivery networks) are used to index mappings from a key to a set of nodes who can serve the value. Meaning, instead of storing { key : value } pairs -- which can get very costly on the DSHT if value is large -- they store { key : [ nodes_who_have_value ] }. The DSHT's client queries for a key, receives a set of nodes (id and network addresses) and can request the value from those nodes directly, usually over some other protocol (e.g. CFS, BitTorrent, HTTP).

Such setups have a significant vulnerability: malicious nodes may advertise possession of a particular (key, value) pair, be listed in the DSHT, and deny all requests from users. This is problematic because malicious nodes can cause honest nodes to direct other honest nodes to malicious nodes.

This attack is somewhat mitigated by: (a) long-running presence of honest nodes (kademlia), (b) costly sybill generation (s/kademlia), and (c) share ratio trackers (bittorrent).

Setup

I propose including a compact proof of retrievability along with value possession advertisements. This would add slight overhead to every key stored in the DSTH, but would allow both DSHT relays and clients to validate a (key, node, proof) tuple before serving or using it.

The proof should be computed based on the individual node (e.g. taking the node_id as a seed into the challenge). And, ideally, new per re-advertisement (kademlia expires mappings, having nodes re-advertise to persist them).

Additionally, every would-be hoster should be able to respond to online PoR request. Meaning, add the following RPC:

valueProof(key) -> compact proof of retrievabiliy

Overhead

// from storing
{ key: [ (node_id, node_addr) ] }

// to storing
{ key: [ (node_id, node_addr, proof) ] }`
{ key + '_proof_pk': proof_pk }

A compact PoR proof is usually around 10-20 bytes. This cost is per-advertisement. For comparison, node ids are usually 20-32 bytes, addresses are 6-18 bytes ({udp,tcp}/IPv{4,6}). Also requires storing a per-key PoR public key (proof_pk), witch which challenges are computed/proofs verified.