protocol / beyond-bitswap


RFC|BB|L2: Speed up Bitswap with GraphSync CID only query #25

Open hannahhoward opened 3 years ago

hannahhoward commented 3 years ago

Abstract

This RFC proposes to add a protocol extension to GraphSync that would cause a GraphSync responder to send only metadata in a GraphSync response -- but no blocks. We would use the list of CIDs contained in the metadata to fetch all the blocks in a DAG with Bitswap, nearly in parallel.

Shortcomings

When fetching a DAG with Bitswap, we must first request and receive blocks at level N of the DAG before we can request blocks at level N+1. This slows DAG traversal significantly because we can't parallelize requests for the entire DAG.

GraphSync attempts to solve this problem by requesting a whole DAG from another peer, expressed through a selector. However, requesting at the DAG level makes spreading requests across peers more difficult -- splitting a DAG at the selector level is significantly more complex than simply splitting up block requests.

Description

The idea here is essentially to use GraphSync to assemble a lightweight manifest for arbitrary DAG data. We use GraphSync to get a list of the CIDs in the DAG we want, and then we use Bitswap to fetch those CIDs. In theory, this offers a "best of both worlds" solution for fetching large DAGs that multiple peers have possession of. It also requires minimal protocol changes -- implementing the protocol extension on the responder side in go-graphsync is trivial. Moreover, a list of CIDs for a DAG is cheap enough to store that GraphSync peers might begin caching frequently requested CID lists. The system might also pair nicely with RFCBBL1201 -- we could make a "WANT" request for a DAG to determine who has it, then issue a CID-only query with GraphSync, then fetch the CIDs with Bitswap.

Implementation

Similarities and Differences With the Default GraphSync Requestor Implementation

The approach here shares several similarities with the normal operation of a GraphSync requestor:

The difference is:

Challenges

Evaluation Plan

Impact

aschmahmann commented 3 years ago

@hannahhoward I think this is a great direction to explore! A couple of thoughts/extensions here:

One factor we might use to "estimate" how far out we can safely go is agreement among multiple remote peers about what is contained in the CID list for a selector query

An alternative approach is to build up trust in a manifest over time

This proposal also has a bit of an issue whereby an unverified manifest can force us to download nodes from an unrelated graph in a delegated DoS attack (i.e. malicious peers convince honest nodes to download bits from a target, thereby wasting both parties' bandwidth at minimal expense to the attacker). This can be mitigated by asking a peer for the manifest before asking that same peer for the associated unverified nodes.

MarcoPolo commented 2 years ago

Another extension that would solve the trust issue and let us parallelize safely:

What if we could prove that the manifest returned for the cid+selector query with graphsync/do-no-send-blocks was correct? I think this is possible (but a bit tricky) with a SNARK. Here's how I imagine it could work:

  1. A client asks a provider for all the cids/links traversed for a cid+selector.
  2. The provider returns a list of cids along with a small proof that they contain some block with the given cid and when traversed with the selector returns the given list of cids.
  3. The client can then ask the network in parallel for all those cids since it knows for sure that they are related to the cid+selector. No trust required.
  4. These cidlist+proofs can be cached for a given cid+selector, and we can even store the caches on a separate set of machines since we know they are correct.

In practice right now it's a little trickier, I think (we may not be able to implement IPLD traversal in a SNARK circuit, for example). But I bet something simpler along the same lines could work today.