libp2p / notes

libp2p Collaborative Notebook for Research
MIT License

Multiaddr Based Content Routing #11

Open aschmahmann opened 5 years ago

aschmahmann commented 5 years ago

Context

A standard IPFS data request causes the Exchange (i.e. Bitswap) to search the Content Routing system for a set of PeerInfo objects (which are just PeerIDs + their multiaddrs). The Exchange then takes these PeerInfo objects and requests data from the peers.

This means libp2p peers have to proxy all data that is advertised through the Content Routing system. But if the data is available elsewhere, shouldn't we be able to access it directly?

Proposal

I would like to be able to request data from multiaddrs that do not correspond to libp2p peers. For example, if we want to store data with a cloud storage provider such as AWS S3, we could put a provider record in the DHT stating that Hash(Data) lives at /http/mybucket.s3.amazonaws.com/Data.
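To make the idea concrete, here is a minimal sketch of what a retriever might do with such a record: fetch the bytes from the advertised location and accept them only if they hash to the advertised value. Everything named here is hypothetical (`HttpProviderRecord`, `fetch_verified`, the plain hex SHA-256 digest, and the use of an ordinary URL instead of a multiaddr); a real implementation would use CIDs/multihashes and whatever record format the DHT ends up supporting.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class HttpProviderRecord:
    """Hypothetical record: 'data with this SHA-256 digest lives at this URL'."""
    sha256_hex: str  # assumption: bare hex digest instead of a real CID
    url: str         # assumption: plain URL instead of a multiaddr


def fetch_verified(record: HttpProviderRecord,
                   fetch: Callable[[str], bytes]) -> bytes:
    """Fetch the record's URL and return the bytes only if the hash matches.

    `fetch` is injected so the transport (HTTP client, test stub, ...) is
    left to the caller.
    """
    data = fetch(record.url)
    actual = hashlib.sha256(data).hexdigest()
    if actual != record.sha256_hex:
        # The HTTP host is untrusted; a mismatch means the data is rejected.
        raise ValueError(f"hash mismatch for {record.url}: got {actual}")
    return data
```

For a real HTTP fetch, `fetch` could be something like `lambda url: urllib.request.urlopen(url).read()`. The point of the sketch is only that the host does not need to be trusted, because the retriever already knows the hash it asked for.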

Motivation

While we could also run a compute node, like EC2, with a set of IPFS cluster daemons on it backed by an S3 datastore, that's certainly more costly. This gets even more interesting if we can "draft" data that's publicly available over HTTP into IPFS.

Implications for future work

While the first iteration of this idea is conceptually fairly simple, it has implications for some of our ongoing endeavors. For instance, if we have 1000 small blocks hosted on /http/mysite.com/Data1-1000 that are all part of a single IPLD object, we couldn't just provide the root IPLD node, since no peer would be able to tell us where the other 999 blocks are. There are various ways we could extend the protocol to tell retrievers where those blocks are, but it's not as simple as with the existing peer-based retrieval.

Additionally, we would likely face increasing demand to support large files that are available over HTTP. Since we don't want users to download a lot of data before it's verified, we'd probably want to extend the protocol so that peers advertising the content in the DHT can add (references to) hashes of chunks of the large file, each of which could be verified on its own. Similarly, we'd want to add the ability to download byte ranges when presented with a multiaddr that supports that functionality.
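A rough sketch of the chunk-verification idea, assuming fixed-size chunks and SHA-256 (neither is specified above, and the function names are made up for illustration): the advertiser publishes one digest per chunk, and the retriever requests each chunk with a standard HTTP `Range` header and checks it before fetching more.

```python
import hashlib
from typing import List


def chunk_digests(data: bytes, chunk_size: int) -> List[str]:
    """Advertiser side: one SHA-256 hex digest per fixed-size chunk."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def range_header(index: int, chunk_size: int, total_size: int) -> str:
    """Retriever side: HTTP Range header value for chunk `index`.

    Byte ranges are inclusive on both ends, so the last byte is end - 1.
    """
    start = index * chunk_size
    end = min(start + chunk_size, total_size) - 1
    return f"bytes={start}-{end}"


def verify_chunk(chunk: bytes, expected_hex: str) -> bool:
    """Retriever side: accept a chunk only if it matches its advertised digest."""
    return hashlib.sha256(chunk).hexdigest() == expected_hex
```

With this shape, a bad host can waste at most one chunk's worth of bandwidth before the retriever notices. How the list of chunk digests is itself distributed and verified (for example as an IPLD object referenced from the record) is left open here.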


I think implementing this functionality could make running "pinning" services much easier and less expensive, and could greatly increase the amount of content accessible via IPFS. What do you think, @stebalien @raulk @bigs?

mikeal commented 5 years ago

This aligns well with a project I'm working on in IPLD for centralized Block storage over HTTP.

Also, this is similar to a prior discussion I started about adding an equivalent to BitTorrent's webseed feature: https://github.com/ipld/ipld/issues/57

You'll need to settle on a base encoding for the CIDs. I'm planning on base32, to align with the rest of our shift to base32.
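For reference, a small sketch of what a base32 CID looks like when built by hand, assuming a CIDv1 with the `raw` codec and a SHA-256 multihash (the function name is made up, and a real project would use a CID library rather than this). The multibase prefix `b` marks lowercase, unpadded RFC 4648 base32.

```python
import base64
import hashlib


def cid_v1_raw_base32(data: bytes) -> str:
    """Build a base32 CIDv1 (raw codec, sha2-256) for `data`.

    All codes used here are below 0x80, so each varint is a single byte.
    """
    digest = hashlib.sha256(data).digest()
    multihash = bytes([0x12, 0x20]) + digest   # sha2-256 code, 32-byte length
    cid_bytes = bytes([0x01, 0x55]) + multihash  # CID version 1, raw codec
    encoded = base64.b32encode(cid_bytes).decode("ascii")
    # Multibase base32: lowercase, no '=' padding, prefixed with 'b'.
    return "b" + encoded.lower().rstrip("=")
```

One practical upside of base32 for HTTP-hosted blocks is that the result is case-insensitive and safe to use in URL paths and hostnames.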