ipfs / boxo

A set of reference libraries for building IPFS applications and implementations in Go.
https://github.com/ipfs/boxo#readme

[ipfs/go-bitswap] Integration between Graphsync and IPFS #85

Open petar opened 3 years ago

petar commented 3 years ago

The goal is to enable bitswap to support different methods of fetching a block, so that it can access non-bitswap sources such as Filecoin nodes, which may use graphsync (via https://github.com/filecoin-project/go-data-transfer), and eventually other payment-based methods.

Fundamentally, Bitswap brokers information about which peers have a cid. This is captured in the form (cid, peer_id). It is implied that the method of fetching is the bitswap transfer protocol.

To generalize Bitswap, we need to change the information that is associated with a cid. For each cid, we would like to keep track of multiple "routing expressions" each of which describes a different method to fetch the block.

A routing expression is written in the routing language syntax and is a valid description of a method for fetching a block, as defined by the existing Routing Language Spec.

For instance,

     fetch(
          cid=link("Qm15"),
          proto=bitswap,
          providers=[multiaddr("/ip4/8.1.1.9:44")],
     )

or

     fetch(
          cid=link("Qm15"),
          proto=graphsync,
          graphsync_voucher=0x12ef78cd,
          providers=[multiaddr("/ip4/8.1.1.9:44")],
     )

In essence, the routing information brokered should be of the form (cid, list of routing expressions).
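A minimal Go sketch of what "(cid, list of routing expressions)" could look like in memory. Every name here (`RoutingExpr`, `RoutingTable`, `Proto`, the field names) is hypothetical and not taken from go-bitswap or the Routing Language Spec. CIDs and multiaddrs are kept as plain strings to keep the sketch self-contained.

```go
package main

// Proto names the transfer protocol a routing expression uses.
// The string values are illustrative only.
type Proto string

const (
	ProtoBitswap   Proto = "bitswap"
	ProtoGraphsync Proto = "graphsync"
)

// RoutingExpr is a hypothetical in-memory form of one fetch(...) expression.
type RoutingExpr struct {
	Cid       string   // content identifier, kept as a plain string
	Proto     Proto    // how to fetch the block
	Providers []string // provider addresses, kept as unparsed strings
	Voucher   []byte   // only meaningful for graphsync in this sketch
}

// RoutingTable maps a cid to every known way of fetching it,
// replacing the single implied (cid, peer_id) pair.
type RoutingTable struct {
	exprs map[string][]RoutingExpr
}

// NewRoutingTable returns an empty table.
func NewRoutingTable() *RoutingTable {
	return &RoutingTable{exprs: make(map[string][]RoutingExpr)}
}

// Add records one more routing expression for e.Cid.
func (t *RoutingTable) Add(e RoutingExpr) {
	t.exprs[e.Cid] = append(t.exprs[e.Cid], e)
}

// Lookup returns a copy of the expressions known for cid (nil if none).
func (t *RoutingTable) Lookup(cid string) []RoutingExpr {
	return append([]RoutingExpr(nil), t.exprs[cid]...)
}
```

Adding both example expressions above under the same cid would leave `Lookup("Qm15")` with two entries, one per fetch method.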

This entails changes to every part of bitswap that touches routing information (for cids):

Remarks

This is the absolute minimum plan to enable the integration. Going forward, many additions could improve the scale and speed of the routing process in bitswap. For example, the "have" messages can be generalized to communicate multiple sources for a block, so that peers can share knowledge about where else the block can be downloaded: "I have the block, but I also know that this Filecoin miner has the block you want, and they also have the entire directory where the block lives."

Related

IPFS / Filecoin interop plan: https://hackmd.io/JoZiAAtnTpqAKuQaEUra4g

PRs comprising the resolution of this issue

Step 1: https://github.com/ipfs/go-bitswap/pull/512

Follow-up tasks

After this issue is resolved, the following (smaller) issues must be addressed before IPFS is fully ready to talk to the Golden Path product: https://github.com/ipfs/go-bitswap/issues/509, https://github.com/ipfs/go-bitswap/issues/510

Stebalien commented 3 years ago

This statement doesn't make a lot of sense. I assume you're referring to some form of meta exchange that can use both the bitswap protocol and graphsync?

> To generalize Bitswap, we need to change the information that is associated with a cid. For each cid, we would like to keep track of multiple "routing expressions" each of which describes a different method to fetch the block.

This could use a lot of motivation. I'd expect the flow to be:

  1. I find out who has what. This is a mapping of CID -> PID.
  2. I connect to peers, then request content via whatever protocol they support.

Of course, I might want additional information before I bother to make a connection. For example:

But then I'd expect the record to look more like:

{
    provider: PeerID,
    content: cid,
    protocols: {
        "/ipfs/bitswap/1.1.0": {...},
        "/ipfs/graphsync/1.0.0": {"token": ..., "price": ....}, // needs to specify across payment systems.
    }
}

Eventually, "queries" could be extended to select things like "supports graphsync but charges less than X".
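As a rough illustration, the record above and a query like "supports graphsync but charges less than X" might be sketched in Go as follows. The type and function names are hypothetical, and the types of `Token` and `Price` (and the price unit) are assumptions, since the record leaves them elided.

```go
package main

// ProtocolInfo holds the per-protocol details of a provider record.
// Token and Price mirror the "token" and "price" keys in the record above;
// their Go types and the price unit are assumptions.
type ProtocolInfo struct {
	Token string
	Price uint64
}

// ProviderRecord is a hypothetical Go form of the record above.
type ProviderRecord struct {
	Provider  string                  // peer ID, kept as a plain string
	Content   string                  // cid, kept as a plain string
	Protocols map[string]ProtocolInfo // keyed by protocol ID
}

// SupportsBelow reports whether the record offers the given protocol
// at a price strictly below maxPrice.
func (r ProviderRecord) SupportsBelow(protocol string, maxPrice uint64) bool {
	info, ok := r.Protocols[protocol]
	return ok && info.Price < maxPrice
}

// Select returns the records that offer protocol below maxPrice,
// preserving input order.
func Select(records []ProviderRecord, protocol string, maxPrice uint64) []ProviderRecord {
	var out []ProviderRecord
	for _, r := range records {
		if r.SupportsBelow(protocol, maxPrice) {
			out = append(out, r)
		}
	}
	return out
}
```

Comparing prices across payment systems would need a common unit, which this sketch does not attempt.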

petar commented 3 years ago

@Stebalien:

> I find out who has what. This is a mapping of CID -> PID.

This is how things work today. We'd like to generalize this significantly. A source for a CID's content need not be a peer at all. For instance, it could be a legacy FTP service at a given IP, or a BitTorrent link (which doesn't even refer to a specific host). A routing expression can describe any such method.

> {
>     provider: PeerID,
>     content: cid,
>     protocols: {
>         "/ipfs/bitswap/1.1.0": {...},
>         "/ipfs/graphsync/1.0.0": {"token": ..., "price": ....}, // needs to specify across payment systems.
>     }
> }

Since discoverable sources for a CID may be heterogeneous (e.g. a peer using bitswap, a Filecoin miner using graphsync, a GitHub repo at a given commit, etc.), each CID is associated with a list of routing expressions, each of which describes one individual source. This is in contrast to having a single CID record (like the one above) that tries to describe all sources.

Stebalien commented 3 years ago

Ah, I see. Yeah, that makes a lot of sense. So we'd have an engine on top of bitswap handling the generalized content routing records and passing information into each protocol.
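One way to picture such an engine, as a sketch only: it holds one fetcher per protocol and tries a CID's sources in order until one succeeds. `Engine`, `Source`, `FetchFunc`, and `ErrNoSource` are hypothetical names, not part of go-bitswap, and real fetchers would wrap the actual bitswap or graphsync clients.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNoSource is returned when no source could be tried at all.
var ErrNoSource = errors.New("no usable source")

// Source is a hypothetical, already-parsed routing expression.
type Source struct {
	Proto string // e.g. "bitswap" or "graphsync"
	Addr  string // where to fetch from, kept as a plain string
}

// FetchFunc fetches the block for cid from one source using one protocol.
type FetchFunc func(cid string, src Source) ([]byte, error)

// Engine dispatches each source to the fetcher registered for its protocol.
type Engine struct {
	fetchers map[string]FetchFunc
}

// NewEngine returns an engine with no protocols registered.
func NewEngine() *Engine {
	return &Engine{fetchers: make(map[string]FetchFunc)}
}

// Register associates a protocol name with its fetcher.
func (e *Engine) Register(proto string, f FetchFunc) {
	e.fetchers[proto] = f
}

// Fetch tries the sources in order and returns the first block obtained.
// Sources with an unregistered protocol or a failing fetcher are skipped.
func (e *Engine) Fetch(cid string, sources []Source) ([]byte, error) {
	lastErr := ErrNoSource
	for _, src := range sources {
		f, ok := e.fetchers[src.Proto]
		if !ok {
			lastErr = fmt.Errorf("no fetcher registered for protocol %q", src.Proto)
			continue
		}
		data, err := f(cid, src)
		if err != nil {
			lastErr = err
			continue
		}
		return data, nil
	}
	return nil, fmt.Errorf("fetch %s: %w", cid, lastErr)
}
```

Trying sources strictly in order is the simplest policy; a real engine would likely rank or parallelize them.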

> Since discoverable sources for a CID may be heterogeneous (e.g. a peer using bitswap, a Filecoin miner using graphsync, a GitHub repo at a given commit, etc.), each CID is associated with a list of routing expressions, each of which describes one individual source. This is in contrast to having a single CID record (like the one above) that tries to describe all sources.

Makes sense.

hannahhoward commented 3 years ago

My recommendation is to do https://github.com/ipfs/go-bitswap/pull/512 to abstract the content routing source, add the ability to talk to indexers once they exist, and stop till we understand the direction we're heading.

As I see it, there are two paths to Golden Path in IPFS:

So my take is: do the part that is needed for either approach and stop. There's no progress to be made for real until miner indexes actually exist anyway. Especially if we do the great Web3 future data transfer stack refactor, we need a wide set of folks working on it. If we want to do something further, I would allocate a team of folks with deep experience in our data transfer protocols and content routing to do planning for how to actually refactor our libraries top to bottom to deliver on the needs for mixing filecoin and IPFS. This would at least help us determine how much work we're actually talking about, and when we could realistically deliver it.
