Optimistic Provide to replace legacy provide operation

This issue describes an alternative provide process making use of Optimistic Provide and mitigating CID eclipsing attacks.

Current Provide Operation

The current IPFS/libp2p provide operation (as described in the spec) consists in finding the K (K has multiple meanings, and K=20 in most IPFS implementations) closest peer in XOR distance to the provided content, or in other words the K peers whose Kademlia's identifier is the closest to the CID's Kademlia identifier. Once these K peers are identified, the content provider sends them a ADD_PROVIDER RPC, asking them to record that the provider is providing CID.

Some problems of the current provide operation

It is harder than it seems to implement correctly. One of the reasons, is that among the 20 closest peers, some of them will be unreachable/offline. The GET_PROVIDERS RPC, used to find the 20 closest peers to a Kademlia key preimage (e.g a peer.ID or cid.Cid) can only return the 20 closest peers to a key. There are no guarantee that it will discover the 25 closest peers. And if the 20 closest peers contain unreachable nodes, then the provider record can not be allocated to 20 peers, or at least not to the 20 closest peers that are reachable.

Moreover, it will be inefficient to wait for the lookup query to be over before sending the ADD_PROVIDER RPC. The lookup process may take a very long time if we need to wait for the timeout of unreachable peers. Hence, a user needs to wait for seconds before the provider record is actually sent to DHT servers.

Improvement suggestion

Given a network size estimator (like the one included in Optimistic Provide), we can compute the average distance from any Kademlia key, in which there will be on average K peers, namely there are K peers located in the range from key to key+dist in the XOR keyspace. If the network size estimator can exclude the unreachable peers, or the percentage of unreachable peers in the network is known, we can determine dist such that for all Kademlia keys there are on average K reachable peers in a range of length dist. dist is computed as the maximal XOR distance ($2^{256}$ in IPFS/libp2p) multiplied by K divided by the number of (reachable) peers in the network.

The query process to find the closest peers to the provided Kademlia key (defined as hash(cid) in IPFS), should discover all peers between key and key+dist, even if it means looking up multiple Kademlia keys (each lookup can return AT MOST 20 reachable peers). A method to discover all peers within a given Kademlia range is described here.

As soon as a peer within dist of key answers to a lookup, we can immediately send it a ADD_PROVIDER RPC.

This technique is also an elementary mitigation against CID eclipse attacks. If an attacker creates 20 nodes located around a victim key in the keyspace, the provider of key will allocate on average 20+20=40 provider records, because it doesn't care about the absolute 20 closest peers, but rather about the distance to the key. So there should always be honest peers storing provider records, it may be harder for the requester to find them during an attack though.

However, the distribution of the peers in the keyspace isn't exactly uniform, and there may be more or less than K peers within the dist from a key. It is thus important to have new peer ids as uniformly distributed as possible, as described in the Power of two choices. In most cases it is not a problem if there are 18 or 22 provider record replicas instead of 20.

This allows provider records to be available earlier to the DHT, and improves security against CID eclipsing attacks.

Lookup termination

Lookup termination when looking for the K closest peers to a key, or when looking for a key that isn't found immediately is also challenging. We do not want to query all peers that we discover during the lookup, because some of them will be very far away from the key and will not provide useful answers. However, there are some peers that we should query, that are not discovered by the lookup, that would provide useful results.

A good candidate for lookup termination would be to fully explore the keyspace between key and key+dist. This might require looking up for multiple keys, and can be achieves as described here. Once this part of the keyspace has been fully explored, and if the provider records hasn't been found, the lookup can be cancelled, there is only a negligible probability that the provider record is available outside of this zone.

This lookup termination mechanism is simple and well defined.

Provide Interface

It would be great in the context of a multi thread application (e.g kubo/boxo) to have a Provide(cid.Cid) chan peer.ID function. As provider records are allocated to remote peers, the Provide function write the peer.ID of the DHT servers storing the provider record on the channel. The channel is closed when the provide operation is over. This allows the caller to know how many (and which) peers are storing the provider record so far, and doesn't need to wait for the provide operation to be finished before returning.

How about `IPNS`?

We could use the same technique for the PUT RPC used by IPNS.

References

Optimistic Provide: https://github.com/libp2p/go-libp2p-kad-dht/pull/783
Finding all peers within a XOR distance range: https://pl-strflt.notion.site/N-closest-peers-83038476d00f4af1bfb40aeeefec39f5
Power of two choices: https://www.youtube.com/watch?v=0dKz_K-7eoI

probe-lab / go-kademlia