Open hannahhoward opened 3 years ago
@hannahhoward I think this is a great direction to explore! A couple of thoughts/extensions here:
One factor we might use to "estimate" how far out we can safely go is agreement among multiple remote peers about what is contained in the CID list for a selector query
An alternative approach is to build up trust in a manifest over time
This proposal also has a bit of an issue whereby an unverified manifest can force us to download nodes from an unrelated graph in a delegated DoS attack (i.e. malicious peers convince honest nodes to download bits from a target, thereby wasting both parties' bandwidth at minimal expense to the attackers). This can be mitigated by asking that peer for a manifest before asking it for the associated unverified nodes.
Another extension that would solve the trust issue and let us parallelize safely:
What if we could prove that the manifest returned by the cid+selector for a graphsync/do-not-send-blocks request was correct? I think this is possible (but a bit tricky) with a SNARK. Here's how I imagine it could work:
In practice right now it's a little trickier, I think (for example, we may not be able to implement IPLD traversal in a SNARK circuit). But I bet something simpler along the same lines could work today.
Abstract
This RFC proposes adding a protocol extension to GraphSync that would cause a GraphSync responder to send only metadata in a GraphSync response -- but no blocks. We would use the list of CIDs contained in the metadata to fetch all the blocks in a DAG with Bitswap, nearly in parallel.
Shortcomings
When fetching a DAG with Bitswap, we must first request and receive blocks at level N of the DAG in order to request blocks at level N+1. This slows a DAG traversal significantly because we can't parallelize requests for the entire DAG.
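To make the round-trip cost concrete, here is a minimal sketch that walks a toy DAG the way a Bitswap-only client has to, one level per round. The DAG, the block ids, and the function name `fetch_level_by_level` are made up for illustration and are not part of any Bitswap implementation.

```python
# Toy DAG: each key is a block id, each value lists the ids of its children.
# The ids and the shape are hypothetical.
TOY_DAG = {
    "root": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1"],
    "a1": [],
    "a2": [],
    "b1": [],
}


def fetch_level_by_level(dag, root):
    """Walk the DAG one level per round, as a Bitswap-only client must.

    Returns (block ids in fetch order, number of round trips).
    """
    fetched = []
    seen = {root}
    frontier = [root]
    rounds = 0
    while frontier:
        rounds += 1  # every block in `frontier` can be requested in parallel
        next_frontier = []
        for block_id in frontier:
            fetched.append(block_id)
            # Children only become known once the parent block has arrived.
            for child in dag[block_id]:
                if child not in seen:
                    seen.add(child)
                    next_frontier.append(child)
        frontier = next_frontier
    return fetched, rounds


blocks, rounds = fetch_level_by_level(TOY_DAG, "root")
print(rounds)  # 3: one round trip per DAG level
```

With a full CID list known up front, the same six blocks could in principle be requested in a single round.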
GraphSync attempts to solve this problem by requesting a whole DAG from another peer, expressed through a selector. However, requesting at the DAG level makes spreading requests across peers more difficult -- splitting a DAG at a selector level is significantly more complex than simply splitting up block requests.
Description
The idea here is essentially to use GraphSync to assemble a lightweight manifest for arbitrary DAG data. We use GraphSync to get a list of CIDs that are in the DAG we want, and then we use Bitswap to get the CIDs. In theory, this might offer a "best of both worlds" type solution to fetching large DAGs that multiple peers have possession of. It requires minimal protocol changes, and implementing the protocol extension in go-graphsync on the responder side is trivial. Moreover, storing a list of CIDs in a DAG is sufficiently cheap that GraphSync peers might begin caching frequently requested CID lists. The system might also pair nicely with RFCBBL1201: we could make a "WANT" request for a DAG to determine who's got it, then a CID-only query with GraphSync, then fetch CIDs with Bitswap.
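The second half of that flow (fetching a known CID list in parallel from several peers) can be sketched roughly as follows. Everything here is an assumption for illustration: `toy_cid` uses a bare SHA-256 hex digest in place of a real CID, `ToyPeer.get_block` stands in for a Bitswap block request, and `fetch_manifest_in_parallel` is a hypothetical helper, not a go-graphsync or Bitswap API.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def toy_cid(data: bytes) -> str:
    """Stand-in for a real CID: the hex SHA-256 digest of the block bytes."""
    return hashlib.sha256(data).hexdigest()


class ToyPeer:
    """Hypothetical peer holding blocks keyed by toy CID."""

    def __init__(self, blocks):
        self.store = {toy_cid(b): b for b in blocks}

    def get_block(self, cid):
        # Stands in for a Bitswap block request; returns None if missing.
        return self.store.get(cid)


def fetch_manifest_in_parallel(manifest, peers, max_workers=8):
    """Fetch every CID in the manifest concurrently, spread across peers."""

    def fetch_one(indexed_cid):
        index, cid = indexed_cid
        # Start at a rotating offset so requests are spread over the peers.
        for offset in range(len(peers)):
            peer = peers[(index + offset) % len(peers)]
            data = peer.get_block(cid)
            # Accept a block only if its bytes hash to the requested CID.
            if data is not None and toy_cid(data) == cid:
                return cid, data
        return cid, None  # no peer returned a valid block

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(fetch_one, enumerate(manifest)))


# Usage: two peers, each holding only part of the data.
blocks = [b"root", b"left", b"right"]
manifest = [toy_cid(b) for b in blocks]
peers = [ToyPeer(blocks[:2]), ToyPeer(blocks[1:])]
result = fetch_manifest_in_parallel(manifest, peers)
```

Hashing each block confirms that it matches its CID, but it does not confirm that the CID actually belongs to the requested DAG; that still depends on how far the manifest itself can be trusted.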
Implementation
Assume a request includes the graphsync/do-not-send-blocks extension.
The graphsync/do-not-send-blocks extension will perform a selector query and include metadata in responses, but omit all blocks.
Similarities and Differences With Default GraphSync Requestor Implementation
The approach here shares several similarities with the normal operation of a GraphSync Requestor:
The difference is:
Challenges
We often want to traverse a whole DAG all the way to its leaves, but remote peers will limit the depth of recursion for GraphSync queries, meaning we may need to make multiple GraphSync queries to get the whole DAG. One challenge, however, is that GraphSync currently gives us no way to tell whether CIDs are truly leaves or the point at which selector traversal hit the maximum recursion depth. In this scenario, we won't know until we fetch them. This may be an additional extension we need to look at adding to the GraphSync protocol.
We don't want to fetch large sets of blocks that turn out to be wrong. At the same time, there is an inherent tension between wanting to fetch ahead enough not to be limited by Bitswap's round trip bottleneck, and not wanting to fetch too many blocks that turn out to be wrong. One factor we might use to "estimate" how far out we can safely go is agreement among multiple remote peers about what is contained in the CID list for a selector query.
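One way the peer-agreement idea might be turned into a fetch-ahead limit is sketched below. It assumes each peer returns its CID list in the same traversal order, and that a simple quorum count is an acceptable notion of agreement; neither is specified by this RFC. `agreed_prefix` is a hypothetical helper.

```python
from collections import Counter


def agreed_prefix(manifests, quorum):
    """Return the longest CID prefix that at least `quorum` manifests share.

    `manifests` is a list of CID lists, one per remote peer, each assumed to
    be in traversal order.
    """
    if quorum < 1:
        raise ValueError("quorum must be at least 1")
    prefix = []
    candidates = list(manifests)
    position = 0
    while True:
        # Keep only manifests that matched so far and still have entries.
        live = [m for m in candidates if len(m) > position]
        if len(live) < quorum:
            break
        cid, votes = Counter(m[position] for m in live).most_common(1)[0]
        if votes < quorum:
            break
        prefix.append(cid)
        # Peers that disagreed at this position stop counting toward later ones.
        candidates = [m for m in live if m[position] == cid]
        position += 1
    return prefix


# Usage: three peers answer the same cid+selector query.
peer_manifests = [
    ["a", "b", "c", "d"],
    ["a", "b", "c", "x"],
    ["a", "b", "y"],
]
safe_to_prefetch = agreed_prefix(peer_manifests, quorum=2)
print(safe_to_prefetch)  # ['a', 'b', 'c']
```

A requestor could fetch the agreed prefix eagerly and fall back to verified, level-by-level fetching past the point where peers diverge. Colluding peers could still agree on a wrong list, so this only narrows the risk.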
Evaluation Plan
Impact