ipfs / notes

IPFS Collaborative Notebook for Research
MIT License
401 stars 30 forks source link

DHT 2.0 #291

Open Stebalien opened 6 years ago

Stebalien commented 6 years ago

So, we've accumulated quite a few "if only"s in the DHT. We should start thinking about what we'd like in a DHT 2.0.

fahrradflucht commented 6 years ago

Related to but not a duplicate of https://github.com/ipfs/notes/issues/162

daviddias commented 6 years ago

While at it, let's make sure to consider how to make it private.

Stebalien commented 6 years ago

While at it, let's make sure to consider how to make it private.

@diasdavid, @jbenet and I discussed this yesterday and he had several really nice ideas.

The core idea is that, instead of calling Put(key, value), we can call Put(Hash(key), Encrypt(DeriveSymmetricKey(key), value)). This obviously doesn't work for records with one value like IPNS records but works really well for both provider records and PeerInfo records. This obviously assumes that key has sufficient randomness.

He also proposed an extension where the record author can put an additional "challenge" key that the record author can prove knowledge of to fetch the record. However, that may not be that useful given that DHTs are supposed to allow neighbors to fetch keys from each other on join.

I also proposed an additional privacy enhancing extension where one can use Hash(key || date) so that provider records get moved around. Juan extended this with Hash(key || date || counter) allowing users to post multiple records in different areas of the DHT (helps both with privacy and, possibly, reliability).

This should make the network significantly harder to enumerate.


We can also use these features to get basic capabilities.

  1. It allows "hidden" peers where you need to know the peer's ID to find them. See: https://github.com/libp2p/libp2p/issues/37
  2. With some modifications to bitswap, it could avoid leaking information about what a peer is looking for to the peers from which it is requesting blocks.
daviddias commented 6 years ago

//cc @mafintosh here who has been thinking a lot on how to make DHTs private and is currently building one (or something to add this functionality).

Stebalien commented 6 years ago

Additional notes:

DoS

Because these records are encrypted, it will be impossible for DHT nodes to filter out "bad" peer routing records (or content routing records).

Solution: Allow peers to explicitly expose the unhashed key so that DHTs can perform additional validation.

DHT Delete

More generally, we've found that we'd like a DELETE function for the DHT.

  1. We could use it for the DoS issue above to clear out bad records without exposing the key. This would, unfortunately, require some additional crypto magic (unless we're fine exposing the key).
  2. We'd also like this to implement an "inbox" protocol. When peer A is offline, peer B could post an encrypted message for peer A to the DHT under /inbox/peerA. When peer A comes back online, it would download and then clear all messages on the DHT (not currently possible).
Stebalien commented 6 years ago

One issue that has been plaguing us in the DHT is adding new datatypes. Our plan is to fix this with IPRS and arbitrary WASM functions however, at the moment, this has some pretty significant drawbacks:

  1. Security. The VM will likely be really complex.
  2. Simplicity. Any DHT will now need access to a WASM VM. This will become less of an issue over time as WASM reaches greater adoption but is a major problem in the short term.
  3. Performance. At the moment, doing this would require spinning up a new WASM VM each time we want to validate something. Worse, we'd need a VM that can quickly verify signatures.

When thinking about https://github.com/libp2p/go-libp2p-kad-dht/issues/189#issuecomment-414490287, I realized that there's a way to significantly improve the current situation without necessarily adding full IPRS: We can use one protocol per record type.

That is, we can have a "separate" DHT for each key type. At the end of the day, this won't really cost us anything, we're just creating multiple overlay networks. To add a new key type, we'd:

  1. Register a new "sub" DHT with a description of acceptable values/keys (validators/selectors).
  2. This new "sub" DHT would speak a new protocol (e.g., /p2p/dht/MyTypeName/1.0.0).

This would allow users to:

  1. Choose which DHTs they want to participate in.
  2. Add new "types" without convincing the entire network to support their type.

This should add almost no overhead as the nodes will keep the same IDs in all the DHTs (so the routing tables will be almost identical).

Note: This'll also allow applications to register external DHTs using the HTTP API (kind of like the ipfs p2p feature).


The difficult issue here will be finding these "sub" DHTs. That is, finding peers participating in them. This is basically the rendezvous/discovery issue but the tricky part is that these "sub" DHTs may be massive (the size of the DHT) to tiny (a few nodes).

florianlenz commented 6 years ago

@Stebalien for the messaging system quasar might is of interest for the DHT. It's very effective in the terms of routing, give it a quick read if you have the time. The only problem I see right now with quasar is the filter propagation which take a few minutes with the current mechanism (but I guess we can come up with sometime more effective).

sanderpick commented 6 years ago

Sounds pretty slick @Stebalien

Stebalien commented 6 years ago

@florianlenz that looks like a pubsub system. We actually have a separate pubsub implementation and even have a "value store" adapter for pubsub (which we use for IPNS over pubsub).

The difference here is that a DHT stores values (temporarily) so members can join and leave and still get published values.

jhiesey commented 5 years ago

So I finally got around to writing up a proposal for a refactor followed by a new protocol: https://github.com/libp2p/research-dht/pull/8

Thoughts @Stebalien @daviddias @anacrolix

daviddias commented 5 years ago

Ton of DHT notes at https://github.com/gpestana/notes/issues/8 by @gpestana

Stebalien commented 5 years ago

Looks like the TOR project had put a lot of thought into this: https://github.com/torproject/torspec/blob/master/rend-spec-v3.txt