ipfs / helia

An implementation of IPFS in JavaScript
https://helia.io

feat: add explicit support for subdomain gateways #439

Open 2color opened 7 months ago

2color commented 7 months ago

Title

The main goal of this PR is to explicitly support subdomain gateways, so that we avoid redirects like the ones we currently get from https://4everland.io, which no longer supports path gateways.

Since the TrustlessGatewayBlockBrokerInit type has changed, this is a breaking change.
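
For readers less familiar with the distinction, here is a minimal sketch of how the two URL styles differ. The `gatewayUrl` helper and its `subdomain` flag are hypothetical names used only for illustration; they are not the `TrustlessGatewayBlockBrokerInit` shape from this PR, and the gateway host in the usage note is a placeholder.

```typescript
// Hypothetical helper, not part of Helia: builds the URL for a CID on either
// gateway style. When `subdomain` is true the caller is assumed to pass a CID
// string that is already usable as a DNS label.
function gatewayUrl (gateway: string, cid: string, subdomain: boolean): string {
  const url = new URL(gateway)

  if (subdomain) {
    // Subdomain gateway: <scheme>://<cid>.ipfs.<host>/
    return `${url.protocol}//${cid}.ipfs.${url.host}/`
  }

  // Path gateway: <scheme>://<host>/ipfs/<cid>
  return `${url.protocol}//${url.host}/ipfs/${cid}`
}
```

With a placeholder gateway, `gatewayUrl('https://gateway.example', cid, false)` gives `https://gateway.example/ipfs/<cid>`, while passing `true` gives `https://<cid>.ipfs.gateway.example/`.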


2color commented 7 months ago

@lidel I'm not thrilled about making HTTP calls to the gateways in the constructor just to test for redirection, because it introduces a side effect. But I can see the value in it, especially for user-supplied gateways.

The other problem is that you have to pay a runtime cost of making an HTTP call every time it's instantiated, rather than configuring it correctly once.

When you say "avoid hardcoding subdomain status", do you mean that we keep the logic differentiating between subdomain and path gateways, but autodetect the type in code by making an HTTP request instead of having it passed in?

achingbrain commented 7 months ago

Unless I'm missing something, looking at the responses from http delegated routers for get provs, this check might be something we need to do?

The peer schema for providers includes protocols like "transport-ipfs-gateway-http" but it doesn't tell you if it's a subdomain or a path gateway.

2color commented 7 months ago

> Unless I'm missing something, looking at the responses from http delegated routers for get provs, this check might be something we need to do?

I think it's even more complex than that either way, and I'm not sure it's in the scope of this PR. For example, https://delegated-ipfs.dev/routing/v1/providers/bafybeicklkqcnlvtiscr2hzkubjwnwjinvskffn4xorqeduft3wq7vm5u4 returns these two providers:

[
  {
    "Addrs": [
      "/ip4/212.6.53.91/tcp/80/http"
    ],
    "ID": "12D3KooWHEzPJNmo4shWendFFrxDNttYf8DW4eLC7M2JzuXHC1hE",
    "Metadata": "oBIA",
    "Protocol": "transport-ipfs-gateway-http",
    "Schema": "unknown"
  },
  {
    "Addrs": [
      "/dns4/dag.w3s.link/tcp/443/https"
    ],
    "ID": "QmUA9D3H7HeCYsirB3KmPSvZh3dNXMZas6Lwgr4fv1HTTp",
    "Metadata": "oBIA",
    "Protocol": "transport-ipfs-gateway-http",
    "Schema": "unknown"
  }
]

The first one isn't helpful because there's no TLS cert, but the second one isn't really helpful either because it only supports CARs:

curl -i -H "Accept: application/vnd.ipld.raw" "https://dag.w3s.link/ipfs/bafybeicklkqcnlvtiscr2hzkubjwnwjinvskffn4xorqeduft3wq7vm5u4"
HTTP/2 406
date: Wed, 21 Feb 2024 17:34:46 GMT
content-type: text/plain;charset=UTF-8
content-length: 14
server: cloudflare
cf-ray: 8590bdf19a9344fe-TXL

not acceptable

curl -H "Accept: application/vnd.ipld.car" -i  "https://dag.w3s.link/ipfs/bafybeicklkqcnlvtiscr2hzkubjwnwjinvskffn4xorqeduft3wq7vm5u4"
HTTP/2 200
date: Wed, 21 Feb 2024 17:35:40 GMT
content-type: application/vnd.ipld.car; version=1; order=undefined; dups=y
cf-ray: 8590bf3fbef84528-TXL
accept-ranges: none
access-control-allow-origin: *
cache-control: public, max-age=29030400, immutable
content-disposition: attachment; filename="bafybeicklkqcnlvtiscr2hzkubjwnwjinvskffn4xorqeduft3wq7vm5u4.car"; filename*=UTF-8''bafybeicklkqcnlvtiscr2hzkubjwnwjinvskffn4xorqeduft3wq7vm5u4.car
etag: W/"bafybeicklkqcnlvtiscr2hzkubjwnwjinvskffn4xorqeduft3wq7vm5u4.car"
vary: Accept, Accept-Encoding
access-control-allow-methods: GET
access-control-expose-headers: Content-Length
x-content-type-options: nosniff
x-freeway-version: 2.15.0
server: cloudflare
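
The two curl calls above amount to a content-negotiation probe, which could be sketched roughly as follows. `probeFormats` and `FetchLike` are hypothetical names, not Helia APIs; the fetch function is injected so the sketch makes no network call on its own, and it only treats a 200 as "supported".

```typescript
// Minimal shape of the fetch function this sketch needs (an assumption made
// for illustration; the global `fetch` is compatible with it).
type FetchLike = (
  url: string,
  init?: { headers?: Record<string, string> }
) => Promise<{ status: number }>

// Hypothetical probe: requests the same CID from a path gateway once per
// trustless response format and records which Accept values get a 200.
async function probeFormats (
  gateway: string,
  cid: string,
  fetchFn: FetchLike
): Promise<{ raw: boolean, car: boolean }> {
  // Strip a trailing slash so we don't produce "//ipfs/<cid>"
  const url = `${gateway.replace(/\/$/, '')}/ipfs/${cid}`

  const accepts = async (mime: string): Promise<boolean> => {
    const res = await fetchFn(url, { headers: { Accept: mime } })
    return res.status === 200
  }

  return {
    raw: await accepts('application/vnd.ipld.raw'),
    car: await accepts('application/vnd.ipld.car')
  }
}
```

Against a gateway that behaves like the responses above (406 for raw, 200 for CAR), this would resolve to `{ raw: false, car: true }`.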

alanshaw commented 7 months ago

Should be working now for raw blocks, sorry about that:

curl -H "Accept: application/vnd.ipld.raw" https://dag.w3s.link/ipfs/bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi --output block.bin
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  116k  100  116k    0     0   153k      0 --:--:-- --:--:-- --:--:--  154k

Feel free to ping me about dag.w3s.link problems. We set that gateway up very quickly for Saturn, and they've never requested raw blocks.

2color commented 7 months ago

Following the discussion in Helia-WG, the question came up of whether we want to detect at runtime whether a gateway is a subdomain or path gateway, either by leveraging the preflight request and this spec (https://github.com/ipfs/specs/pull/425) or by using heuristics. I don't believe we made a decision, right @lidel?

achingbrain commented 7 months ago

We did not make a decision as such, but we did talk a bit about the constraints.

  1. Turns out the absence of transport-ipfs-gateway-http is not enough reason to assume that a given peer is not running a path/subdomain gateway
  2. Some CIDs cannot be used with a subdomain gateway (e.g. very long ones, or ones in a case-sensitive encoding)
  3. Gateways may redirect you to a subdomain if it's supported and the CID can be used this way

Given the above, I think there's still value in allowing the user to specify that a gateway is 100% absolutely for sure a subdomain gateway. If they don't, we should start by treating it as a path gateway and examine the CID we are requesting: if it can be used in a subdomain and we receive a redirect to a subdomain URL, we can flip that gateway into subdomain mode for future requests.

If the CID cannot be used in a subdomain we should treat it as a path gateway for this request*.

The preflight request should help here but AFAIK it's not available to browser app code. Can we query the cache for it and detect it that way?


* = What if we try to convert it to a subdomain-compatible CID? E.g. v0 base58btc to v1 base36? Test for length, etc.
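
The "test for length, etc." part could look something like the sketch below. `isSubdomainSafe` is a hypothetical helper, not an existing Helia function. It only checks the string form (DNS labels are at most 63 characters and are not case-sensitive); it does not validate that the string is a real CID, and the conversion to another base is not shown.

```typescript
// DNS labels are limited to 63 characters.
const MAX_DNS_LABEL_LENGTH = 63

// Hypothetical check: can this CID string be used as-is as a DNS label?
// Lowercase alphanumerics cover the base32 and base36 encodings; anything
// with uppercase characters (e.g. base58btc) is rejected because hostnames
// are case-insensitive.
function isSubdomainSafe (cid: string): boolean {
  return cid.length > 0 &&
    cid.length <= MAX_DNS_LABEL_LENGTH &&
    /^[a-z0-9]+$/.test(cid)
}
```

A 59-character base32 CIDv1 such as the `bafybei...` one used earlier in this thread passes, while a base58btc `Qm...` string fails on case alone and would need converting first.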

2color commented 7 months ago

> Given the above, I think there's still value in allowing the user to specify that a gateway is 100% absolutely for sure a subdomain gateway. If they don't, we should start by treating it as a path gateway and examine the CID we are requesting: if it can be used in a subdomain and we receive a redirect to a subdomain URL, we can flip that gateway into subdomain mode for future requests.

I suppose we'd do this check when a given GatewayBroker requests a block for the first time, to avoid an unnecessary request and side effects at instantiation. Do you agree with the broad strokes of this approach?
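
To make the proposal concrete, here is a rough sketch of the lazy detection. Everything in it is hypothetical (`GatewayModeTracker`, `NoFollowFetch`, the `<cid>.ipfs.<host>` layout check); it is not the code in this PR. It also assumes the injected fetch does not follow redirects and exposes the 3xx status and Location header, which browser `fetch` does not do for manual redirects.

```typescript
type Mode = 'path' | 'subdomain'

interface ProbeResponse {
  status: number
  headers: { get: (name: string) => string | null }
}

// Assumed NOT to follow redirects, so a 3xx and its Location are visible.
type NoFollowFetch = (url: string) => Promise<ProbeResponse>

class GatewayModeTracker {
  private mode: Mode
  private readonly protocol: string
  private readonly host: string
  private readonly fetchFn: NoFollowFetch

  // No HTTP call here: the user-supplied mode (default 'path') is trusted
  // until a block request shows otherwise.
  constructor (gateway: string, fetchFn: NoFollowFetch, mode: Mode = 'path') {
    const url = new URL(gateway)
    this.protocol = url.protocol
    this.host = url.host
    this.fetchFn = fetchFn
    this.mode = mode
  }

  getMode (): Mode {
    return this.mode
  }

  async request (cid: string): Promise<ProbeResponse> {
    // CIDs that cannot be a DNS label always use the path form
    const safe = cid.length <= 63 && /^[a-z0-9]+$/.test(cid)

    if (this.mode === 'subdomain' && safe) {
      return this.fetchFn(`${this.protocol}//${cid}.ipfs.${this.host}/`)
    }

    const res = await this.fetchFn(`${this.protocol}//${this.host}/ipfs/${cid}`)
    const location = res.headers.get('location')

    if (safe && location != null && res.status >= 300 && res.status < 400) {
      const target = new URL(location, `${this.protocol}//${this.host}`)

      if (target.host.endsWith(`.ipfs.${this.host}`)) {
        // Redirected to a subdomain URL: flip mode for future requests
        this.mode = 'subdomain'
        return this.fetchFn(target.toString())
      }
    }

    return res
  }
}
```

The first request for a subdomain-safe CID pays for one extra round trip if the gateway redirects; later requests go straight to the subdomain URL, and CIDs that cannot be used in a subdomain keep using the path form for that request.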


Also dropping this link where we recently implemented the conversion to subdomain resolution.

achingbrain commented 7 months ago

> Do you agree with the broad strokes of this approach?

Yes, sounds good.