ipfs / helia

An implementation of IPFS in JavaScript
https://helia.io

[🏆 Golden path scenario] Browser-authored content retrievable by another machine through the ipfs.io gateway directly #182

Open BigLep opened 1 year ago

BigLep commented 1 year ago

Done Criteria

A user can author content in their browser via Helia and have it retrievable by another machine through the ipfs.io gateway without relying on pinning services or preload nodes.
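As a rough illustration of the done criteria, a second machine could check retrievability with a plain HTTP request to the gateway. This is only a sketch under assumptions: `gatewayUrl` and `isRetrievable` are hypothetical helper names (not part of Helia), the CID in the usage note is a placeholder, and the URL uses the common `/ipfs/<cid>` path-gateway form.

```typescript
// Hypothetical helpers for illustration; not part of Helia or any library.

// Minimal fetch-like signature so a stub can be injected when testing.
type FetchLike = (url: string, init?: { method?: string }) => Promise<{ ok: boolean }>

// Build a path-gateway URL of the form <gateway>/ipfs/<cid>[/<path>].
function gatewayUrl (cid: string, gateway = 'https://ipfs.io', path = ''): string {
  const base = gateway.replace(/\/+$/, '')
  const suffix = path === '' ? '' : '/' + path.replace(/^\/+/, '')
  return `${base}/ipfs/${cid}${suffix}`
}

// Resolves true if the gateway answers a HEAD request with a 2xx status,
// false on any other status or on a network error.
async function isRetrievable (cid: string, fetchImpl: FetchLike, gateway?: string): Promise<boolean> {
  try {
    const res = await fetchImpl(gatewayUrl(cid, gateway), { method: 'HEAD' })
    return res.ok
  } catch {
    return false
  }
}
```

On the second machine this might be called as `await isRetrievable('<cid-from-the-browser>', fetch)` while the authoring tab is still open.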

Why Important

This is a common usecase that users hit. Failure here feeds the narrative that "IPFS doesn't just work".

Notes

  1. This builds on https://github.com/ipfs/helia/issues/256, which used pinning services. We’re assuming it was completed first before taking this on.
  2. Even though this usecase request has come in (e.g., HackFS 2023), it’s a lower priority because it has the fundamental flaw of relying on a browser tab to stay open/active. This may make sense for a demo, but has serious usability flaws. In practice, we expect most browser apps will want to be more resilient, and for that, they need a way to get data off the browser (e.g., use a pinning service which https://github.com/ipfs/helia/issues/256 satisfies).
  3. "retrievability from the ipfs.io gateway" is used as a popular "stand in" for other nodes on the network.
  4. We need to enable discoverability of the content created in the browser so that the ipfs.io gateway can discover it. This requires one or more of:
  5. DCUtR in js-libp2p is needed so the Kubo gateway can run the protocol with the browser node via the relay and instruct the browser to dial one of its public multiaddrs supported from the browser (e.g., WSS, WebTransport). This will have been handled in https://github.com/ipfs/helia/issues/256.
  6. The next step here is to allow a private Kubo node (e.g., Kubo running in one's Brave browser) to fetch the content authored in a browser on a separate host. We ultimately need Kubo to support WebRTC since WebRTC is required for browser/private-node connectivity per here. (Kubo tracking issue: https://github.com/ipfs/kubo/issues/9724 ). This should have been completed though by https://github.com/ipfs/helia/issues/256.
  7. Per above, this isn't a pure Helia issue. Tracking the usecase needs to go somewhere though, so I'm putting it in Helia for now so we can link against it.
### Tasks
- [ ] IPIP/spec for HTTP /routing/v1 PUT support (at least writing delegated content records to the DHT): <https://github.com/ipfs/specs/pull/378>
- [ ] Boxo having /routing/v1 PUT support (IPIP-378)
- [ ] Exposing /routing/v1 PUT support in Kubo
- [ ] routing.delegate.ipfs.tbd having /routing/v1 PUT support
- [ ] JS updates to use IPIP-378 (write side of delegated content routing)
- [ ] js-libp2p: Reconnect with relays so that previously published multiaddrs with relays have a higher chance of still working - https://github.com/libp2p/js-libp2p/issues/1955

aschmahmann commented 1 year ago

This is more of a libp2p issue, and more of a go-libp2p issue

While support for WebRTC will certainly help in some scenarios (i.e. the browser does not support WebTransport and the node fetching the data doesn't have a WSS address with a CA cert), IIUC the main difficulty in making data from browser helia nodes discoverable by gateways, etc. is that the data is not being advertised to the DHT, IPNI, etc.

Am I mistaken and it turns out advertising small amounts of data to the DHT from a helia browser node is working well enough at the moment (at least for browsers that support WebTransport)?

SgtPooki commented 1 year ago

@aschmahmann even in a browser that supports WebTransport, i've been having difficulty getting any successful webtransport connections. I just pushed up a repo where I was playing around: https://github.com/SgtPooki/helia-playground -- it was essentially copied from https://codesandbox.io/p/sandbox/helia-script-tag-forked-3q8y35 to a local workspace so i could modify things more easily.

One thing I started seeing was that activeStreams.length never rises above 0 for me, no matter how many peers or how many connections I have. I suspect a bug in libp2p/webtransport but I haven't been able to fully track it down.

I want to create a simple test where a browser helia node can successfully talk to a backend helia node, but that will have to wait for a bit.


ninja-edit:

Also, there seems to be a non-stop spamming of webtransport dial attempts.. and i'm not sure how best to control that with libp2p-connection-manager.

BigLep commented 1 year ago

@aschmahmann : good callouts - thanks.


Let's assume:

  1. the ipfs.io Kubo node has maximum connectivity possible today with WebTransport and WSS address with a CA cert
  2. the browser authoring the data supports WebTransport and WebSockets
  3. the ipfs.io Kubo node discovered the multiaddr of the browser that authored the data

How does this ipfs.io Kubo node retrieve the data from the browser node? My understanding is that it still can't initiate a connection to the browser in this scenario and this scenario would only work if there was a preexisting connection between the browser node and the ipfs.io Kubo node.


Also, I expanded the "Notes" section in the top description to further expand on the underlying issues:

  1. Underlying issue 1: discoverability of the content created in the browser so that the ipfs.io gateway can discover it. This requires one or more of:
  2. Underlying issue 2: libp2p connectivity, especially go-libp2p connectivity, since we ultimately need Kubo to support WebRTC since WebRTC is required for server nodes to dial browsers. (Kubo tracking issue: https://github.com/ipfs/kubo/issues/9724 ).

Please go ahead and fix/correct any mistakes here.


Thank you!

aschmahmann commented 1 year ago

> How does this ipfs.io Kubo node retrieve the data from the browser node? My understanding is that it still can't initiate a connection to the browser in this scenario and this scenario would only work if there was a preexisting connection between the browser node and the ipfs.io Kubo node.

Yeah, that's right, good callout. I had assumed there was some level of support for DCUtR in js-libp2p that came along with the relay-v2 support. With the simplest DCUtR support (dialbacks), the helia node would connect to a (limited) relay-v2 node that speaks some protocol the helia node can speak (e.g. WSS, WebTransport, etc.). Its address would then be /the/multiaddr/of/the/relay/circuit-relay/p2p/helia-node-peerID. When a publicly reachable node (e.g. the ipfs.io kubo nodes) wanted to contact the helia node, it would ask the relay to have the helia node dial it back (using WSS, WebTransport, etc.).

This doesn't require any holepunching kinds of magic, just a simple relay + the dialback portion of the DCUtR protocol.
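To make the address shape concrete, here is a minimal sketch of composing and splitting a relayed address. In libp2p multiaddr notation the relay hop is written `/p2p-circuit`, which is what the shorthand `/circuit-relay/` above refers to. The function names, the relay hostname, and the peer IDs are placeholders for illustration; real code would use a multiaddr library rather than string handling.

```typescript
// Hypothetical string helpers for illustration only.

// Compose <relay multiaddr>/p2p-circuit/p2p/<target peer ID>.
function relayedAddress (relayAddr: string, targetPeerId: string): string {
  return `${relayAddr.replace(/\/+$/, '')}/p2p-circuit/p2p/${targetPeerId}`
}

// Split a relayed address back into the relay's own multiaddr and the
// peer ID of the node reachable through it. Returns undefined if the
// address does not contain a relay hop.
function parseRelayedAddress (addr: string): { relay: string, target: string } | undefined {
  const marker = '/p2p-circuit/p2p/'
  const idx = addr.indexOf(marker)
  if (idx === -1) return undefined
  return { relay: addr.slice(0, idx), target: addr.slice(idx + marker.length) }
}
```

A publicly reachable node that learns such an address would contact the `relay` part and ask it to have the `target` dial back, as described above.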

Seems like it might be worth scoping this as a smaller and more important set of work in https://github.com/libp2p/js-libp2p/issues/1460.

SgtPooki commented 1 year ago

Notes from Helia WG 2023-07-27

achingbrain commented 1 year ago

DCUtR for js-libp2p is in progress here: https://github.com/libp2p/js-libp2p/pull/1928

SgtPooki commented 1 year ago

Note that the libp2p hole-punching vision table also illustrates the problem here fairly well: https://github.com/libp2p/specs/blob/d2106f43e878ae4c3a1c6465a7c329835290fe22/connections/hole-punching.md#vision

BigLep commented 1 year ago

It's great that progress is happening here.

Folks have correctly pointed out that go-libp2p WebRTC isn't needed for the stated usecase of the Kubo ipfs.io gateway retrieving content from the browser. We only need js-libp2p DCUtR. That's great, and I agree that should be the first usecase.

That said, I don't want to let up there since the ultimate goal is "universal connectivity". The next step here is to allow a private Kubo node (e.g., Kubo running in one's Brave browser) to fetch the content authored in the browser. For this we ultimately need Kubo to support WebRTC since WebRTC is required for browser/private-node connectivity per here. This can come after, but I have updated the issue notes to be accurate and to discuss this followup step.

achingbrain commented 1 year ago

I've been doing a bit of investigation, what I've found is:

  1. Browser connections are unstable
    • This causes remotes to drop connections, including relay connections
  2. This can cause relay addresses to change as new relays are found
  3. Publishing DHT provider records does not always succeed
    • This is because the ADD_PROVIDER query frequently traverses through nodes it can't dial
    • This will improve as more of the network supports webtransport and webrtc
  4. Even when ADD_PROVIDER succeeds, Kubo nodes (my local one at least) can't always resolve the record
  5. Kubo nodes (my local one at least) can't always look up browser nodes in the DHT
    • This could be because private Kubo DHT nodes can't dial browsers yet to ping them so they are evicting them from their routing tables?
    • Also if the relay addresses for the browser peer changes Kubo DHT nodes won't be able to re-dial them?

Browser CPU usage is very high, which may contribute to item 1. Item 2 is quite concerning because if the relay address changes, the published provider records then have out-of-date multiaddrs.
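The stale-record concern can be expressed as a simple check: if any address that was published earlier is no longer one the node currently has, the records pointing at it are out of date. This is a minimal sketch with a hypothetical helper name and addresses passed as plain strings; it is not how js-libp2p tracks addresses.

```typescript
// Hypothetical helper: true if at least one previously published address
// is no longer among the node's current addresses, meaning the provider
// records that carried it now point somewhere unreachable.
function hasStaleAddresses (published: string[], current: string[]): boolean {
  const live = new Set(current)
  return published.some(addr => !live.has(addr))
}
```

A node could run this after each relay change and re-publish only when it returns true.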

Right now I think in the circuit relay code, if a relay connection is lost we assume the relay is bad and start to search for new relays. We may instead need to assume that we are bad and make some sort of attempt to reconnect, and only start searching for others if that fails.
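The reconnect-first idea could look roughly like the sketch below. Everything here is an assumption for illustration: `RelayActions`, `recoverRelay`, the attempt count, and the backoff delays are hypothetical and are not js-libp2p APIs or defaults.

```typescript
// Hypothetical callbacks; neither is a js-libp2p API.
interface RelayActions {
  // resolves true if the previously used relay accepts us again
  reconnect: (relayAddr: string) => Promise<boolean>
  // resolves with the address of a different relay
  findNewRelay: () => Promise<string>
}

// On losing a relay connection, assume the fault may be ours: retry the
// same relay a few times with exponential backoff, and only search for a
// different relay once those attempts are exhausted.
async function recoverRelay (
  lostRelay: string,
  actions: RelayActions,
  maxAttempts = 3,
  baseDelayMs = 1000
): Promise<string> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    // a rejected reconnect is treated the same as a refused one
    if (await actions.reconnect(lostRelay).catch(() => false)) {
      return lostRelay // same relay, so previously published addresses stay valid
    }
    if (attempt < maxAttempts - 1) {
      // wait 1x, 2x, 4x ... the base delay before the next attempt
      await new Promise(resolve => setTimeout(resolve, baseDelayMs * 2 ** attempt))
    }
  }
  return actions.findNewRelay()
}
```

Keeping the same relay where possible means provider records that already carry its address remain usable.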

Until adoption of webtransport improves, we may need some sort of web service that can publish provider records on behalf of the browser. These would be records where the browser is the provider, not the web service, so this is slightly different from the delegated content routing strategy we used to use.

Also found a few other weird bits and pieces

BigLep commented 1 year ago

@achingbrain

> I've been doing a bit of investigation,

Thanks - good write up!

(For others to be aware) per the 2023-08-10 Helia Working Group, I don't think it's worth the investment right now to focus on writing provider records directly to the public IPFS DHT from the browser. We'll instead rely on solving the write side of "Underlying issue 1: discoverability of the content created in the browser so that the ipfs.io gateway can discover it" through a to-be-created/updated delegated routing endpoint. Kubo/Boxo maintainers are aware of the priority of this work and are taking it on now as they finish up the read side of HTTP /routing/v1.
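For orientation, the read side of HTTP /routing/v1 looks up providers with a GET on `/routing/v1/providers/{cid}`. The sketch below covers only that read side, since the write (PUT) side was still being specified in IPIP-378 at this point. The helper names and the endpoint hostname are hypothetical, and the response field names (`Providers`, `ID`, `Addrs`) are assumptions based on a reading of the Routing V1 spec that should be checked against it.

```typescript
// Assumed, loosely typed shape of one provider record in the response.
interface ProviderRecord { ID?: string, Addrs?: string[] }

// Build the provider lookup URL for a delegated routing endpoint.
function providersUrl (endpoint: string, cid: string): string {
  return `${endpoint.replace(/\/+$/, '')}/routing/v1/providers/${cid}`
}

// Extract provider records from an already-parsed JSON response body,
// returning an empty list if the expected field is missing or malformed.
function parseProviders (body: unknown): ProviderRecord[] {
  const providers = (body as { Providers?: unknown } | null)?.Providers
  return Array.isArray(providers) ? providers as ProviderRecord[] : []
}
```

A caller would fetch `providersUrl(endpoint, cid)` with an `Accept: application/json` header and pass the parsed body to `parseProviders`.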

Also, it sounds like you have a test setup (awesome). I assume we're going to need this throughout the golden path development. If there is anything to document or check in to help others in testing or verifying their work, please share.

I have updated the task list in the issue description with everything I'm aware of that needs to be done along the different tracks:

  1. Tasks for underlying issue 1: discoverability of the content created in the browser
  2. Tasks for underlying issue 2 / libp2p connectivity part 1: Kubo gateway can instruct the browser to dial one of its public multiaddrs supported from the browser
  3. Tasks for underlying issue 3 / libp2p connectivity part 2: Kubo support for browser/private-node connectivity

Thanks also for the fixes along the way - good stuff!

BigLep commented 12 months ago

I have morphed this golden path issue to be scoped to retrievability of browser-authored content without relying on pinning services (i.e., as long as one's browser tab is open).

For retrievability of browser-authored content, we're going to focus first on relying on pinning services: https://github.com/ipfs/helia/issues/256

That said, the top priority is reliable browser retrieval of any content. This is happening in https://github.com/ipfs/helia/issues/255 . This is the top "golden path scenario" focus.

achingbrain commented 11 months ago

> Browser connections are unstable. This causes remotes to drop connections, including relay connections

In recent releases this is much improved:

[image not preserved]