ipfs / go-graphsync

Initial Implementation Of GraphSync Wire Protocol

Add `graphsync/do-not-send-first-blocks` extension #225

Closed hannahhoward closed 2 years ago

hannahhoward commented 2 years ago

What

Since selector traversals are deterministic, the simplest way to resume a previous request is to tell the responder not to send the first N blocks, corresponding to the blocks you had already received when the request was interrupted. Add an extension to go-graphsync that allows the requestor to instruct the responder, when responding to a graphsync request, not to send full block data for the first N blocks.

Proposed Implementation

When sending a request, the requestor sends the extension graphsync/do-not-send-first-blocks and encodes the number of blocks to skip as a CBOR-encoded int in the data field.

The IPLD schema is as follows:

type DoNotSendFirstBlocks Int

This issue covers implementing built-in support for this extension in the go-graphsync Responder.

Acceptance Criteria

As a client I can call:

var gs graphsync.Exchange
var ctx context.Context
var p peer.ID
var root ipld.Link
var selector ipld.Node
var blocksToSkip int64

gs.Request(ctx, p, root, selector, graphsync.DoNotSendFirstBlocks(blocksToSkip))

The requestor will properly encode the graphsync/do-not-send-first-blocks extension and the responder will not send the specified number of blocks at the beginning of the selector traversal. Assuming the client has the first N blocks stored locally, the request will finish as normal.
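The responder-side behavior described above can be sketched roughly as follows. This is an illustration only: the block and response types and the respond function are hypothetical stand-ins, not go-graphsync types, and the sketch assumes the responder still reports each skipped block but omits its data.

```go
package main

import "fmt"

// block is a hypothetical stand-in for a block visited by the traversal.
type block struct {
	cid  string
	data []byte
}

// response is a hypothetical per-block reply; data is nil for skipped blocks.
type response struct {
	cid  string
	data []byte
}

// respond walks the blocks in their deterministic traversal order and
// omits the data for the first skip blocks.
func respond(traversal []block, skip int64) []response {
	out := make([]response, 0, len(traversal))
	for i, b := range traversal {
		r := response{cid: b.cid}
		if int64(i) >= skip {
			// Past the skipped prefix: include the full block data.
			r.data = b.data
		}
		out = append(out, r)
	}
	return out
}

func main() {
	traversal := []block{
		{"cid-0", []byte("a")},
		{"cid-1", []byte("b")},
		{"cid-2", []byte("c")},
	}
	for _, r := range respond(traversal, 2) {
		fmt.Printf("%s hasData=%v\n", r.cid, r.data != nil)
	}
}
```

Because the traversal order is the same on every run, the index of a block in the traversal is enough to decide whether it falls inside the skipped prefix.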

Proposed Improvement

A possible improvement would be to have the go-graphsync requestor implementation itself attempt to load blocks locally until it no longer can, and then it would encode this extension automatically to avoid receiving those blocks.

This would obviate the need for higher-level libraries (e.g. go-data-transfer) to track how many blocks are sent.

whyrusleeping commented 2 years ago

This sounds good to me, though I'm not sure the proposed improvement is actually an improvement. Disk IOPS on giant files get expensive.

dirkmc commented 2 years ago

Sounds good to me 👍

dirkmc commented 2 years ago

Should we just call it "block-offset"?

dirkmc commented 2 years ago

tell the responder to not send the first N blocks

In this case is N the number of unique blocks? It's easier for higher layers to consistently keep track of the number of unique blocks.

hannahhoward commented 2 years ago

In this case is N the number of unique blocks? It's easier for higher layers to consistently keep track of the number of unique blocks.

Is it though? We track giant lists of CIDs in go-data-transfer, but what if we just incremented a counter in DataReceived?

hannahhoward commented 2 years ago

@whyrusleeping

The proposed improvement just makes things easier for the higher-level library. I agree, it's not super efficient.