ipfs / ipfs-companion

Browser extension that simplifies access to IPFS resources on the web
https://docs.ipfs.tech/install/ipfs-companion/
Creative Commons Zero v1.0 Universal

Feature request: "web seeding" equivalent functionality? #1049

Open James-E-A opened 2 years ago

James-E-A commented 2 years ago

Is your feature request related to a problem? Please describe.

The "problem" this solves is that of wanting to hold one or most web hosts as a back up in the case of no-one seeding the file in IPFS (or all currently-seeding peers being inaccessible).

This is a use case that most BitTorrent clients, for instance, have supported since forever (via the as parameter or the httpseeds field); the functionality is colloquially known as "web seeds".

Describe the solution you'd like

Two possible solutions; either/or would be great, but the first one seems way cleaner:

  1. The IPFS Companion add-on should treat the named HTTP servers as "web seed"-like peers when attempting to fetch the following URL: https://ipfs.io/ipfs/QmVSSCvbYX8XHVcf2kqrpGGmH5PdCbgAP11CCJXrrJQ2yJ?filename=ssl-mitm.pdf&x-ipfs-webseed=https%3A//s3.amazonaws.com/files.cloudprivacy.net/ssl-mitm.pdf&x-ipfs-webseed=https%3A//crysp.uwaterloo.ca/courses/cs458/W11-lectures/local/files.cloudprivacy.net/ssl-mitm.pdf&x-ipfs-webseed=https%3A//static.banky.club/shitposter.club/af4ff1a87cec1d73451b030c9c6efbf3f04bc44f0e5e14ebc9b227eb8e03c97e.pdf (cf. the as parameter in Magnet URIs; a parsing sketch follows this list)
  2. The IPFS Companion add-on should attempt to fetch the following file from IPFS. (If doing so unconditionally would be inappropriate, then it should at least step in when there's a problem with the download, such as a network error, an HTTP error, or a domain flagged as hijacked by Safe Browsing): https://s3.amazonaws.com/files.cloudprivacy.net/ssl-mitm.pdf#:~:x-ipfs-path=/ipfs/QmVSSCvbYX8XHVcf2kqrpGGmH5PdCbgAP11CCJXrrJQ2yJ (cf. https://lists.w3.org/Archives/Public/www-talk/2001NovDec/0090.html)
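
To make proposal 1 concrete, here is a minimal sketch, assuming the hypothetical x-ipfs-webseed query parameter proposed above (not an existing ipfs.io or Companion feature), of how a client could pull the CID path and the web-seed mirrors out of such a URL; parseWebSeeds is just an illustrative name.

```typescript
// Minimal sketch: extract the /ipfs/ path and the proposed (hypothetical)
// x-ipfs-webseed mirrors from a gateway URL like the one in proposal 1.
function parseWebSeeds(gatewayUrl: string): { cidPath: string; webSeeds: string[] } {
  const url = new URL(gatewayUrl);
  return {
    cidPath: url.pathname,                                // e.g. "/ipfs/QmVSSC…Q2yJ"
    webSeeds: url.searchParams.getAll('x-ipfs-webseed'),  // plain HTTP mirrors to try
  };
}

// parseWebSeeds('https://ipfs.io/ipfs/QmVSSC…Q2yJ?filename=ssl-mitm.pdf&x-ipfs-webseed=https%3A//s3.amazonaws.com/files.cloudprivacy.net/ssl-mitm.pdf')
//   -> { cidPath: '/ipfs/QmVSSC…Q2yJ',
//        webSeeds: ['https://s3.amazonaws.com/files.cloudprivacy.net/ssl-mitm.pdf'] }
```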

Describe alternatives you've considered

James-E-A commented 2 years ago

Note that this need not be limited to single files; even ye olde BitTorrent clients support web seeds for content-addressed directories (see the sketch after the quote below):

> The file/folder structure needs to be identical to the torrent.
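
A rough sketch of how the same idea could cover a directory, assuming (as the quoted BitTorrent requirement does) that the mirror's folder layout matches the DAG; webSeedUrlFor and the example URLs are illustrative only.

```typescript
// Hypothetical sketch: resolve a file inside a content-addressed directory
// against a plain HTTP mirror whose folder layout matches the DAG, in the
// spirit of BitTorrent's multi-file web seeds.
function webSeedUrlFor(webSeedBase: string, pathInDag: string): string {
  // e.g. webSeedUrlFor('https://example.com/mirror/', 'papers/ssl-mitm.pdf')
  //   -> 'https://example.com/mirror/papers/ssl-mitm.pdf'
  return new URL(pathInDag, webSeedBase).toString();
}
```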

James-E-A commented 2 years ago

(3) Here's another suggestion in the same vein:

this <a href="https://surface.syr.edu/cgi/viewcontent.cgi?article=1846&context=honors_capstone"
   integrity="sha256-nLGeuXDdJZgaa6wpAOSy/81NZ4ngkNjub0ii9tey25c=
              ipfs-mAVUSIJyxnrlw3SWYGmusKQDksv/NTWeJ4JDY7m9IovbXstuX">fantastic explainer by Ashley Valentijn of Syracuse University</a>

re ipfs://bafkreie4wgpls4g5ewmbu25mfeaojmx7zvgwpcpasdmo432iul3npmw3s4

lidel commented 2 years ago

Thank you for brainstorming! Some thoughts / questions (if I missed the mark, please elaborate more):

James-E-A commented 2 years ago

[/ipfs/QmV…2yJ?filename=…&x-ipfs-webseed=…] is tricky, because…data could be imported to IPFS with custom parameters (different chunks, different hash function, different DAG type); CID of data imported with default parameters would not match (including parameter space in URL is not an option, as one could create DAG by hand, outside of parameters in go-ipfs).

Torrents also have different chunking parameters that can lead to different URNs for the same exact file (or even equivalent fs-trees), and yet they solved web-seeding / leeching from "dumb" HTTP servers without provisioning for said parameters in the URI scheme (and without any extra ado such as CAR files: they built the feature while imposing nothing more than a single parameter, as=${url}, on the existing interface).

While (as with BitTorrent) active/"real"/software peers will be required to bootstrap the structure of the file (i.e. fetching non-leaf data), once the software has constructed a mapping of the chunkspace onto an fs-tree, it is then free to pull at least the leaf nodes from plain HTTP servers. (Admittedly, a ~0.5 MiB PDF wasn't a great prototypical example of this, since most people will be able to snap up the whole file in roughly the same time it takes to fetch the metadata.)
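
As a sketch of what pulling leaf nodes from an HTTP server could look like, assuming the file was imported with raw leaves (so each leaf block is a verbatim byte range of the original file): fetch the chunk with an HTTP Range request and verify it against the digest from the leaf CID before handing it on. LeafRef and fetchLeafFromWebSeed are illustrative names, not existing Companion or IPFS APIs.

```typescript
// Hypothetical sketch: fetch one leaf of a DAG from a "web seed" HTTP mirror
// and verify it against the digest expected from the DAG metadata. Assumes
// raw leaves, so each leaf block is a verbatim byte range of the original file.

interface LeafRef {
  offset: number;        // byte offset of the chunk within the original file
  length: number;        // chunk size in bytes
  sha256: Uint8Array;    // digest taken from the leaf CID's multihash
}

async function fetchLeafFromWebSeed(url: string, leaf: LeafRef): Promise<Uint8Array> {
  const res = await fetch(url, {
    headers: { Range: `bytes=${leaf.offset}-${leaf.offset + leaf.length - 1}` },
  });
  if (res.status !== 206 && res.status !== 200) {
    throw new Error(`web seed returned HTTP ${res.status}`);
  }
  let bytes = new Uint8Array(await res.arrayBuffer());
  // Some servers ignore Range and return the whole file; slice locally.
  if (res.status === 200) {
    bytes = bytes.slice(leaf.offset, leaf.offset + leaf.length);
  }
  const digest = new Uint8Array(await crypto.subtle.digest('SHA-256', bytes));
  if (digest.length !== leaf.sha256.length || digest.some((b, i) => b !== leaf.sha256[i])) {
    throw new Error('web seed data does not match the expected leaf hash');
  }
  return bytes; // safe to hand to the block store / the rest of the DAG
}
```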

[https://cf-ipfs.com/ipfs/QmVSSCvbYX8XHVcf2kqrpGGmH5PdCbgAP11CCJXrrJQ2yJ?filename=ssl-mitm.pdf] will work over HTTP and if ipfs-companion is present get loaded from local node

Per the OP, that (currently) breaks whenever no IPFS client happens to be seeding (pinning) the file at the moment, even if the file's canonical location is still OK. A client supporting x-ipfs-webseed parameters would at least have a chance of not being left in the dark should the seeding peer drop out mid-transfer.

[<a href="…" integrity="ipfs-mAV…tuX"> is] not feasible [because] …There is no web extension API that would allow us to do this in a performant way (requires script injection on every page)

I proposed that in light of the existing linkification feature: given that linkification is seemingly deemed acceptable, I deduce the particular rubicon of touching pages isn't a blocker—checking links for a specific attribute should be far more performant than scrubbing all text nodes for excerpts that look like IPFS URIs anyway.
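
To illustrate the cost argument, a hypothetical content-script check for proposal 3 could be little more than one attribute-selector query per page; none of this is existing ipfs-companion code.

```typescript
// Hypothetical sketch: instead of scanning every text node for strings that
// look like IPFS paths, only inspect anchors that already carry an integrity
// attribute and pick out an "ipfs-" token (format as in the example above).
function findIpfsIntegrityLinks(doc: Document): Map<HTMLAnchorElement, string> {
  const hits = new Map<HTMLAnchorElement, string>();
  for (const a of doc.querySelectorAll<HTMLAnchorElement>('a[integrity]')) {
    const tokens = (a.getAttribute('integrity') ?? '').trim().split(/\s+/);
    const ipfsToken = tokens.find((t) => t.startsWith('ipfs-'));
    if (ipfsToken) {
      // e.g. "ipfs-mAVUSIJ…" -> keep the multibase-encoded part after the prefix
      hits.set(a, ipfsToken.slice('ipfs-'.length));
    }
  }
  return hits;
}
```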

lidel commented 2 years ago