ipfs / rainbow

A specialized IPFS HTTP gateway
https://docs.ipfs.tech/reference/http/gateway/

Support direct HTTP retrieval from /https providers #125

Open lidel opened 6 months ago

lidel commented 6 months ago

This is the Go version of https://github.com/ipfs-shipyard/service-worker-gateway/issues/72.

We want rainbow to benefit from /https providers (example) and use them in addition to bitswap.

Ideally, we would prioritize HTTP retrieval over bitswap where possible, as it lowers costs for content providers and incentivizes them to configure, expose, and announce HTTPS endpoints.

MVP scope

Focus should be on block (application/vnd.ipld.raw, ?format=raw) requests, as these will always work across all implementations and provide the best cacheability for the HTTP infrastructure we have.
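A minimal Go sketch of such a raw block request, assuming a gateway base URL such as https://dag.w3s.link. The helper names `rawBlockURL` and `fetchRawBlock`, the 2 MiB read cap, and the placeholder CID are assumptions made for illustration, not existing rainbow or boxo APIs. Verifying that the returned bytes hash to the requested CID is left out here, but a real client must do it.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/url"
)

// maxBlockSize caps how much of a response body is read. The 2 MiB value is
// an assumption of this sketch, not something specified in this thread.
const maxBlockSize = 2 << 20

// rawBlockURL (hypothetical helper) builds the URL of a single raw block.
func rawBlockURL(gateway, cid string) (string, error) {
	u, err := url.Parse(gateway)
	if err != nil {
		return "", err
	}
	u.Path = "/ipfs/" + cid
	u.RawQuery = "format=raw"
	return u.String(), nil
}

// fetchRawBlock (hypothetical helper) requests one block as
// application/vnd.ipld.raw. It does NOT verify the bytes against the CID.
func fetchRawBlock(ctx context.Context, client *http.Client, gateway, cid string) ([]byte, error) {
	target, err := rawBlockURL(gateway, cid)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, target, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Accept", "application/vnd.ipld.raw")

	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("gateway returned %s for %s", resp.Status, cid)
	}
	// Bodies longer than maxBlockSize are truncated, not rejected.
	return io.ReadAll(io.LimitReader(resp.Body, maxBlockSize))
}

func main() {
	// "bafkqaaa" is only a placeholder CID for building the URL.
	u, err := rawBlockURL("https://dag.w3s.link", "bafkqaaa")
	if err != nil {
		panic(err)
	}
	fmt.Println(u)
}
```

Because every block has its own stable URL, responses like this can be cached by ordinary HTTP infrastructure without any knowledge of the DAG.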

CAR with IPIP-402 may be more involved and may lead to duplicated block retrievals because of how loading a page with a dozen subresources works: all subresources share the same parent and are fetched in parallel, which can lead to a race where parent blocks are fetched multiple times, slowing down page loads.

hacdias commented 6 months ago

Before continuing, I want to lay down some notes to make sure we're all on the same page about what needs to be done and about the current challenges with accepting the /https providers.

Most providers with HTTPS multiaddresses are unusable

Most, if not all, providers advertising /https multiaddresses are unusable by the standard: they do not follow the proper peer schema. We could certainly hammer the code into accepting them, but I would rather have the original provider of the records implement the correct schema. So, instead of:

{
  "Addrs": ["/dns4/dag.w3s.link/tcp/443/https"],
  "ID": "QmUA9D3H7HeCYsirB3KmPSvZh3dNXMZas6Lwgr4fv1HTTp",
  "Metadata": "oBIA",
  "Protocol": "transport-ipfs-gateway-http",
  "Schema": "unknown"
},

We should be getting this:

{
  "Schema": "peer",
  "ID": "QmUA9D3H7HeCYsirB3KmPSvZh3dNXMZas6Lwgr4fv1HTTp",
  "Addrs": ["/dns4/dag.w3s.link/tcp/443/https"],
  "Protocols": ["transport-ipfs-gateway-http"]
}
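A rough Go sketch of how a client could keep only standards-compliant records and turn them into gateway URLs. It assumes the records arrive wrapped in a `Providers` array, as in a /routing/v1 response. `PeerRecord`, `gatewayURL`, and `gatewayURLs` are hypothetical names, and the multiaddr is parsed by plain string splitting to keep the example self-contained; real code would more likely use a multiaddr library. Only the /dns4/.../tcp/<port>/https shape shown above (plus dns, dns6, and ip4 hosts) is handled.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

const httpGatewayProtocol = "transport-ipfs-gateway-http"

// PeerRecord mirrors the "peer" schema record shown above.
type PeerRecord struct {
	Schema    string   `json:"Schema"`
	ID        string   `json:"ID"`
	Addrs     []string `json:"Addrs"`
	Protocols []string `json:"Protocols"`
}

// gatewayURL converts e.g. /dns4/dag.w3s.link/tcp/443/https into
// https://dag.w3s.link. Any other multiaddr shape is rejected.
func gatewayURL(addr string) (string, bool) {
	parts := strings.Split(strings.TrimPrefix(addr, "/"), "/")
	if len(parts) != 5 || parts[2] != "tcp" || parts[4] != "https" {
		return "", false
	}
	switch parts[0] {
	case "dns", "dns4", "dns6", "ip4":
	default:
		return "", false
	}
	host, port := parts[1], parts[3]
	if port == "443" {
		return "https://" + host, true
	}
	return "https://" + host + ":" + port, true
}

// gatewayURLs returns the HTTPS endpoints of all "peer" records that
// announce the HTTP gateway protocol. Records with other schemas are skipped.
func gatewayURLs(raw []byte) ([]string, error) {
	var resp struct {
		Providers []PeerRecord `json:"Providers"`
	}
	if err := json.Unmarshal(raw, &resp); err != nil {
		return nil, err
	}
	var urls []string
	for _, rec := range resp.Providers {
		if rec.Schema != "peer" || !contains(rec.Protocols, httpGatewayProtocol) {
			continue
		}
		for _, addr := range rec.Addrs {
			if u, ok := gatewayURL(addr); ok {
				urls = append(urls, u)
			}
		}
	}
	return urls, nil
}

func contains(list []string, want string) bool {
	for _, s := range list {
		if s == want {
			return true
		}
	}
	return false
}

// sample holds the two records from this thread: the non-standard one
// first, then the "peer" schema one.
const sample = `{"Providers": [
  {"Addrs": ["/dns4/dag.w3s.link/tcp/443/https"],
   "ID": "QmUA9D3H7HeCYsirB3KmPSvZh3dNXMZas6Lwgr4fv1HTTp",
   "Metadata": "oBIA", "Protocol": "transport-ipfs-gateway-http",
   "Schema": "unknown"},
  {"Schema": "peer",
   "ID": "QmUA9D3H7HeCYsirB3KmPSvZh3dNXMZas6Lwgr4fv1HTTp",
   "Addrs": ["/dns4/dag.w3s.link/tcp/443/https"],
   "Protocols": ["transport-ipfs-gateway-http"]}
]}`

func main() {
	urls, err := gatewayURLs([]byte(sample))
	if err != nil {
		panic(err)
	}
	fmt.Println(urls)
}
```

With this filter the first record is dropped because of its "unknown" schema, which is the behaviour argued for here: the fix belongs with whoever publishes the records.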

As I said, the code can be hammered into accepting the non-standard records (albeit with a bit more effort in Go), but I would rather not go down that route. We already plan to completely remove support for "Schema": "bitswap" (e.g. from Pinata) from Boxo. Supporting one more non-standardized schema would just make things more complicated than they need to be.

Fetching the block via HTTPS

The current flow to fetch a block, from the Blockservice perspective, is as follows:

  1. Blockservice gets asked for a block
  2. Blockservice checks the Blockstore; if the block is there, it returns it. Otherwise,
  3. Blockservice asks the Exchange, which currently is just Bitswap
  4. Bitswap looks for providers using a routing.ContentRouter. This routing.ContentRouter only has Bitswap-related peers. All other peers are ignored, even if they come from a /routing/v1 endpoint.
  5. Bitswap tries fetching it, returns, etc, etc.
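The steps above can be sketched in Go with heavily simplified, hypothetical interfaces. The real boxo Blockstore and exchange interfaces work with CIDs and block values rather than strings and byte slices, and writing the fetched block back to the store is an assumption of this sketch.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Blockstore and Exchange are simplified stand-ins for the boxo interfaces.
type Blockstore interface {
	Get(cid string) ([]byte, bool)
	Put(cid string, data []byte)
}

type Exchange interface {
	GetBlock(ctx context.Context, cid string) ([]byte, error)
}

type BlockService struct {
	store    Blockstore
	exchange Exchange
}

// GetBlock follows the flow above: local store first, then the exchange.
func (s *BlockService) GetBlock(ctx context.Context, cid string) ([]byte, error) {
	if data, ok := s.store.Get(cid); ok {
		return data, nil // step 2: served locally
	}
	data, err := s.exchange.GetBlock(ctx, cid) // steps 3-5
	if err != nil {
		return nil, err
	}
	s.store.Put(cid, data) // assumption: cache the block for later requests
	return data, nil
}

// memStore is an in-memory Blockstore used only by the demo.
type memStore map[string][]byte

func (m memStore) Get(cid string) ([]byte, bool) { d, ok := m[cid]; return d, ok }
func (m memStore) Put(cid string, data []byte)   { m[cid] = data }

// countingExchange stands in for Bitswap and counts how often it is asked.
type countingExchange struct {
	blocks map[string][]byte
	calls  int
}

func (e *countingExchange) GetBlock(_ context.Context, cid string) ([]byte, error) {
	e.calls++
	if d, ok := e.blocks[cid]; ok {
		return d, nil
	}
	return nil, errors.New("block not found: " + cid)
}

// demoFlow requests the same block twice and reports the exchange call count.
func demoFlow() (int, error) {
	ex := &countingExchange{blocks: map[string][]byte{"cid-1": []byte("hello")}}
	svc := &BlockService{store: memStore{}, exchange: ex}
	for i := 0; i < 2; i++ {
		if _, err := svc.GetBlock(context.Background(), "cid-1"); err != nil {
			return ex.calls, err
		}
	}
	return ex.calls, nil
}

func main() {
	calls, err := demoFlow()
	fmt.Println("exchange calls:", calls, "error:", err)
}
```

The point of the sketch is where the seam sits: the Blockservice only knows about one Exchange, so any HTTP retrieval has to live either behind that interface or inside Bitswap.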

I see a few ways of potentially solving this.

(a) Parallel Exchanges

Create a parallel exchange that calls both Bitswap and a new exchange that can take advantage of the Delegated Routing endpoint results that have non-Bitswap peers.

Challenges I see:

  1. Duplicate HTTP requests to delegated routing endpoints, done by both exchanges.
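A minimal sketch of option (a) in Go, assuming a simplified `Exchange` interface. `ParallelExchange` and `ExchangeFunc` are hypothetical names, not boxo types. The sketch only shows the fan-out: each wrapped exchange would still do its own routing lookup, which is exactly the duplication listed as a challenge.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Exchange is a simplified stand-in for the boxo exchange interface.
type Exchange interface {
	GetBlock(ctx context.Context, cid string) ([]byte, error)
}

// ExchangeFunc adapts a plain function to the Exchange interface.
type ExchangeFunc func(ctx context.Context, cid string) ([]byte, error)

func (f ExchangeFunc) GetBlock(ctx context.Context, cid string) ([]byte, error) {
	return f(ctx, cid)
}

// ParallelExchange asks every exchange at once and returns the first success.
type ParallelExchange struct {
	exchanges []Exchange
}

func (p *ParallelExchange) GetBlock(ctx context.Context, cid string) ([]byte, error) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel() // cancels the slower exchanges once we return

	type result struct {
		data []byte
		err  error
	}
	// Buffered so late finishers never block after we have returned.
	results := make(chan result, len(p.exchanges))
	for _, ex := range p.exchanges {
		go func(ex Exchange) {
			data, err := ex.GetBlock(ctx, cid)
			results <- result{data, err}
		}(ex)
	}

	lastErr := errors.New("no exchanges configured")
	for range p.exchanges {
		r := <-results
		if r.err == nil {
			return r.data, nil
		}
		lastErr = r.err
	}
	return nil, lastErr
}

// demoParallel races a stalled "Bitswap" against a fast "HTTP" exchange.
func demoParallel() (string, error) {
	slowBitswap := ExchangeFunc(func(ctx context.Context, _ string) ([]byte, error) {
		<-ctx.Done() // never finds the block; waits for cancellation
		return nil, ctx.Err()
	})
	fastHTTP := ExchangeFunc(func(_ context.Context, _ string) ([]byte, error) {
		return []byte("block bytes"), nil
	})
	p := &ParallelExchange{exchanges: []Exchange{slowBitswap, fastHTTP}}
	data, err := p.GetBlock(context.Background(), "cid-1")
	return string(data), err
}

func main() {
	data, err := demoParallel()
	fmt.Println(data, err)
}
```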

(b) Smarter Exchange

An exchange where you can register sub-exchanges (or fetchers) for certain protocol types. This exchange would call FindProviders itself and, depending on the results, would parallelize calls to different fetchers (Bitswap, Gateways, etc).

Challenges I see:

  1. We need a way to tell the Bitswap client that we already know peer X has block Y, so that it does not do the FindPeers request again. Maybe it's already possible, but I'm not familiar enough with the code. Needs investigation.
  2. Reconcile Delegated Routing lookups with DHT lookups. Boxo only provides code for the opposite case: delegated routing to libp2p routers, ignoring everything that is not Bitswap. The reconciliation is already done in someguy, which parallelizes DHT and Delegated Routing endpoints into a Delegated Routing-like interface. We'll likely want to reuse the code.

(b) seems technically more complicated (at least without looking at what is currently possible), but is likely better at avoiding duplicated HTTP requests and wasted resources. We can also probably reuse the new RemoteBlockstore from boxo/gateway to fetch remote blocks from the /https peers.
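A rough Go sketch of option (b), under the assumption that fetchers can be keyed by the protocol names found in routing results. `SmartExchange`, `Router`, `Fetcher`, and the test doubles are hypothetical, not boxo APIs. It shows one FindProviders call feeding several fetchers, and does not address the two challenges listed above.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Provider is a simplified routing result (hypothetical type).
type Provider struct {
	ID        string
	Protocols []string
}

// Router performs the single provider lookup (hypothetical interface).
type Router interface {
	FindProviders(ctx context.Context, cid string) ([]Provider, error)
}

// Fetcher retrieves a block from one provider over one protocol.
type Fetcher interface {
	Fetch(ctx context.Context, cid string, p Provider) ([]byte, error)
}

// SmartExchange routes once, then dispatches to fetchers keyed by protocol.
type SmartExchange struct {
	router   Router
	fetchers map[string]Fetcher
}

type result struct {
	data []byte
	err  error
}

func (e *SmartExchange) GetBlock(ctx context.Context, cid string) ([]byte, error) {
	providers, err := e.router.FindProviders(ctx, cid) // the only routing lookup
	if err != nil {
		return nil, err
	}
	ctx, cancel := context.WithCancel(ctx)
	defer cancel() // stop remaining fetches once one succeeds
	var jobs []func() ([]byte, error)
	for _, p := range providers {
		for _, proto := range p.Protocols {
			if f, ok := e.fetchers[proto]; ok {
				p := p // per-iteration copy for the closure
				jobs = append(jobs, func() ([]byte, error) { return f.Fetch(ctx, cid, p) })
			}
		}
	}
	results := make(chan result, len(jobs)) // buffered: no goroutine blocks
	for _, job := range jobs {
		go func(job func() ([]byte, error)) {
			data, err := job()
			results <- result{data, err}
		}(job)
	}
	lastErr := errors.New("no usable provider for " + cid)
	for range jobs {
		r := <-results
		if r.err == nil {
			return r.data, nil
		}
		lastErr = r.err
	}
	return nil, lastErr
}

// staticRouter and fixedFetcher are test doubles for the demo.
type staticRouter struct {
	providers []Provider
	calls     int
}

func (r *staticRouter) FindProviders(context.Context, string) ([]Provider, error) {
	r.calls++
	return r.providers, nil
}

type fixedFetcher result

func (f fixedFetcher) Fetch(context.Context, string, Provider) ([]byte, error) {
	return f.data, f.err
}

// demoSmart returns the fetched data and the number of routing lookups made.
func demoSmart() (string, int, error) {
	router := &staticRouter{providers: []Provider{
		{ID: "peer-a", Protocols: []string{"transport-bitswap"}},
		{ID: "peer-b", Protocols: []string{"transport-ipfs-gateway-http"}},
	}}
	ex := &SmartExchange{router: router, fetchers: map[string]Fetcher{
		"transport-bitswap":           fixedFetcher{err: errors.New("peer unreachable")},
		"transport-ipfs-gateway-http": fixedFetcher{data: []byte("block bytes")},
	}}
	data, err := ex.GetBlock(context.Background(), "cid-1")
	return string(data), router.calls, err
}

func main() { fmt.Println(demoSmart()) }
```

Compared with racing whole exchanges, the routing lookup happens once here and its results are shared, which is where the saving in delegated routing requests would come from.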

lidel commented 6 months ago

Triage:

hacdias commented 6 months ago

Update:

lidel commented 6 months ago

Something we could try without changing too much, and without touching higher-level abstractions like exchanges, is doing an opportunistic HTTP fetch in boxo/bitswap itself.

Wrote initial thoughts in https://github.com/ipfs/boxo/issues/608 – pinged some folks, looking for feasibility feedback.