protocol / beyond-bitswap


RFC|BB|L2: Speed up Bitswap with GraphSync CID only query #25

Open hannahhoward opened 3 years ago

hannahhoward commented 3 years ago

Abstract

This RFC proposes to add a protocol extension to GraphSync that would cause a GraphSync responder to send only metadata in a GraphSync response -- but no blocks. We would use the list of CIDs contained in the metadata to fetch all the blocks in a DAG with Bitswap, nearly in parallel.

Shortcomings

When fetching a DAG with Bitswap, we must first request and receive blocks at level N of the DAG before we can request blocks at level N+1. This slows DAG traversal significantly because we can't parallelize requests for the entire DAG.

GraphSync attempts to solve this problem by requesting a whole DAG from another peer, expressed through a selector. However, requesting at the DAG level makes spreading requests across peers more difficult -- splitting a DAG at the selector level is significantly more complex than simply splitting up block requests.

Description

The idea here is essentially to use GraphSync to assemble a lightweight manifest for arbitrary DAG data. We use GraphSync to get a list of the CIDs in the DAG we want, and then we use Bitswap to fetch those CIDs. In theory, this offers a "best of both worlds" solution for fetching large DAGs that multiple peers have possession of. It also requires minimal protocol changes -- implementing the protocol extension on the responder side in go-graphsync is trivial. Moreover, a list of CIDs for a DAG is cheap enough to store that GraphSync peers might begin caching frequently requested CID lists. The system might also pair nicely with RFCBBL1201 -- we could make a "WANT" request for a DAG to determine who has it, then issue a CID-only query with GraphSync, then fetch the CIDs with Bitswap.

Implementation

Similarities and Differences With the Default GraphSync Requestor Implementation

The approach here shares several similarities with the normal operation of a GraphSync requestor:

The difference is:

Challenges

Evaluation Plan

Impact

aschmahmann commented 3 years ago

@hannahhoward I think this is a great direction to explore! A couple of thoughts/extensions here:

One factor we might use to "estimate" how far out we can safely go is agreement among multiple remote peers about what is contained in the CID list for a selector query

An alternative approach is to build up trust in a manifest over time

This proposal also has a bit of an issue whereby an unverified manifest can force us to download nodes from an unrelated graph in a delegated DoS attack (i.e. malicious peers convince honest nodes to download bits from a target, thereby wasting both parties' bandwidth at minimal expense to the attacker). This can be mitigated by asking a peer for the manifest before asking that same peer for the associated unverified nodes.

MarcoPolo commented 2 years ago

Another extension that would solve the trust issue and let us parallelize safely:

What if we could prove that the manifest returned for the cid+selector query with graphsync/do-no-send-blocks was correct? I think this is possible (but a bit tricky) with a SNARK. Here's how I imagine it could work:

  1. A client asks a provider for all the cids/links traversed for a cid+selector.
  2. The provider returns a list of cids along with a small proof that they contain some block with the given cid and when traversed with the selector returns the given list of cids.
  3. The client can then ask the network in parallel for all those cids since it knows for sure that they are related to the cid+selector. No trust required.
  4. These cidlist+proofs can be cached for a given cid+selector, and we can even store the caches on a separate set of machines since we know they are correct.

In practice right now it's a little trickier, I think (we may not be able to implement IPLD traversal in a SNARK circuit, for example). But I bet something simpler along the same lines could work today.