ipfs / go-graphsync

Initial Implementation Of GraphSync Wire Protocol

Graphsync Simplification Idea: one libp2p stream per request, no message queues (except for retries) #259

Open hannahhoward opened 2 years ago

hannahhoward commented 2 years ago

What

The message protocol format of go-graphsync combines multiple requests and responses into a single message on a single stream.

Perhaps it is time to reexamine the protocol and the rationale for combining requests and responses, and for having blocks live as a separate element from responses.

Why was it this way

  1. Graphsync was modelled after Bitswap, which combines wantlists, responses, and blocks in the same message
  2. We assumed block deduplication was a chief concern. The thought was, if two requests are executing, and they both encounter the same block, they shouldn't send it twice. They shouldn't send it twice in the same message, and, more nebulously, they shouldn't send it twice if they are executing "at the same time"

Design rationale in practice

In Filecoin, data is almost always streamed to different blockstores (CAR files). This means that ultimately, we almost never need to deduplicate across requests. In fact, we have a whole extension to graphsync just to prevent this.

Consequences

Packaging multiple requests and responses into a single message stream is the source of much goroutine complexity: as you execute a response in one thread, you have to send the data over to a per-peer message queue thread that packages up responses to go over the network. Meanwhile, on the other side, matching up responses and blocks with their associated requests is more complex than it needs to be.
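To make the receiving side's matching problem concrete, here is a minimal sketch, not the actual go-graphsync message format. The `response`, `message`, and `demux` names are hypothetical, and it assumes each response lists the CIDs it refers to while blocks are carried once per message.

```go
package main

import "fmt"

// response is a hypothetical per-request entry in a combined message.
// It only names the blocks it needs; the bytes live elsewhere.
type response struct {
	requestID int
	cids      []string
}

// message is a hypothetical combined message: many responses, with
// blocks stored once and shared between them (the deduplication).
type message struct {
	responses []response
	blocks    map[string][]byte
}

// demux is the work the receiver must do: walk every response and
// look up each referenced block to rebuild per-request block lists.
func demux(m message) map[int][][]byte {
	out := make(map[int][][]byte)
	for _, r := range m.responses {
		for _, c := range r.cids {
			if b, ok := m.blocks[c]; ok {
				out[r.requestID] = append(out[r.requestID], b)
			}
		}
	}
	return out
}

func main() {
	m := message{
		responses: []response{
			{requestID: 1, cids: []string{"cidA", "cidB"}},
			{requestID: 2, cids: []string{"cidB"}},
		},
		// cidB is sent once even though both requests need it.
		blocks: map[string][]byte{"cidA": []byte("a"), "cidB": []byte("b")},
	}
	for id, blks := range demux(m) {
		fmt.Printf("request %d: %d blocks\n", id, len(blks))
	}
}
```

With one stream per request, this lookup step would not be needed, at the cost of sending the shared block twice.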

Moreover, because message sending uses one queue per peer, our allocator backpressure system is per peer as well, when it might make more sense for it to be per request.
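A rough sketch of the difference, assuming a simplified allocator that rejects a reservation instead of blocking. The `allocator` type, its methods, and the key strings are hypothetical and are not the go-graphsync allocator API. The only thing that changes between the two cases is the key used for accounting.

```go
package main

import (
	"errors"
	"fmt"
)

var errOverBudget = errors.New("over budget")

// allocator tracks bytes in flight per key against a fixed limit.
// Hypothetical: a real allocator would block rather than return an error.
type allocator struct {
	limit uint64
	used  map[string]uint64
}

func newAllocator(limit uint64) *allocator {
	return &allocator{limit: limit, used: make(map[string]uint64)}
}

// reserve claims n bytes for key, or fails if key would exceed the limit.
func (a *allocator) reserve(key string, n uint64) error {
	if a.used[key]+n > a.limit {
		return errOverBudget
	}
	a.used[key] += n
	return nil
}

// release returns n bytes to key's budget once they have been sent.
func (a *allocator) release(key string, n uint64) {
	if n > a.used[key] {
		n = a.used[key]
	}
	a.used[key] -= n
}

func main() {
	// Keyed per peer: a large request starves a small one to the same peer.
	perPeer := newAllocator(1000)
	_ = perPeer.reserve("peerA", 900)
	fmt.Println("per peer, second request blocked:", perPeer.reserve("peerA", 200) != nil)

	// Keyed per request: each request has its own budget.
	perRequest := newAllocator(1000)
	_ = perRequest.reserve("peerA/request1", 900)
	fmt.Println("per request, second request blocked:", perRequest.reserve("peerA/request2", 200) != nil)
}
```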

The need to handle multiple requests and responses and match them up to blocks has been an ongoing source of difficulty for, and complaints from, implementors of graphsync in other languages.

Why leave it as it is

It remains to be seen whether multiple requests/responses per message and the associated deduplication are more important for the IPFS use case, where data tends to land in a single blockstore.

Stebalien commented 2 years ago

Honestly, I just assumed this was how it worked. That is, I assumed you'd:

  1. Open a new stream.
  2. Send a (one) graphsync request.
  3. Get a stream of blocks back on that same stream.
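The three steps above can be sketched as follows. This is only an illustration under stated assumptions: `net.Pipe` from the standard library stands in for a libp2p stream, and the newline-terminated root plus length-prefixed block framing is made up for the example rather than taken from the graphsync wire format. The `serve`, `fetch`, and `request` names are hypothetical.

```go
package main

import (
	"bufio"
	"encoding/binary"
	"fmt"
	"io"
	"net"
)

// serve handles exactly one request on one stream: read the root,
// write each block as a length-prefixed frame, then close the stream
// to signal that the response is complete.
func serve(stream net.Conn, store map[string][][]byte) {
	defer stream.Close()
	root, err := bufio.NewReader(stream).ReadString('\n')
	if err != nil {
		return
	}
	root = root[:len(root)-1] // drop the trailing newline
	for _, blk := range store[root] {
		var hdr [4]byte
		binary.BigEndian.PutUint32(hdr[:], uint32(len(blk)))
		if _, err := stream.Write(hdr[:]); err != nil {
			return
		}
		if _, err := stream.Write(blk); err != nil {
			return
		}
	}
}

// fetch sends one request and reads blocks until the stream ends.
func fetch(stream net.Conn, root string) ([][]byte, error) {
	defer stream.Close()
	if _, err := io.WriteString(stream, root+"\n"); err != nil {
		return nil, err
	}
	var blocks [][]byte
	for {
		var hdr [4]byte
		if _, err := io.ReadFull(stream, hdr[:]); err != nil {
			if err == io.EOF {
				return blocks, nil // clean end of response
			}
			return nil, err
		}
		blk := make([]byte, binary.BigEndian.Uint32(hdr[:]))
		if _, err := io.ReadFull(stream, blk); err != nil {
			return nil, err
		}
		blocks = append(blocks, blk)
	}
}

// request opens a fresh in-memory stream for a single request.
func request(store map[string][][]byte, root string) ([][]byte, error) {
	client, server := net.Pipe()
	go serve(server, store)
	return fetch(client, root)
}

func main() {
	store := map[string][][]byte{
		"rootA": {[]byte("a1"), []byte("a2")},
		"rootB": {[]byte("b1")},
	}
	for _, root := range []string{"rootA", "rootB"} {
		blocks, err := request(store, root)
		fmt.Println(root, len(blocks), err)
	}
}
```

Because every block on a stream belongs to the one request that opened it, the receiver needs no request IDs and no per-peer queue in this sketch.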

Stebalien commented 2 years ago

Why leave it as it is

Backpressure?