ipfs / helia

An implementation of IPFS in JavaScript
https://helia.io

Create infrastructure for testing and validating retrieval reliability with Helia #275

Open SgtPooki opened 1 year ago

SgtPooki commented 1 year ago

discussed in https://pl-strflt.notion.site/Reliable-retrieval-monitoring-project-plan-formation-ce2190c2ad054a44b8d0fca7d2cf6006?pvs=4

### Tasks
- [x] Phase 0: validate with probelab that they can pull from GitHub Container Registry
- [x] Phase 1: create helia nodejs gateway that can work with probelab's tiros
- [x] Phase 1: implement /api/v0/repo/gc to run garbage collection upon request
- [x] Phase 1: respond to `<host>/<namespace>/request` using a simple server
- [x] Phase 1: Tiros support for a Helia gateway: https://github.com/plprobelab/tiros/pull/5
- [x] Phase 1: probelab.io testing website using a Helia gateway: https://github.com/plprobelab/probelab-infra/pull/75
- [x] Phase 1: Investigate helia-http-gateway crashing issue. - https://github.com/ipfs/helia-http-gateway/issues/18
- [ ] https://github.com/ipfs/helia-http-gateway/issues/12
- [x] Ensure helia maintainers have access to a graph showing successful helia-http-gateway runs
- [x] Decide on how we want to handle running & displaying 3 separate helia-http-gateway setups with tiros
- [x] Get https://github.com/ipfs/helia-ipns/pull/55 merged and pulled into this repo so we can improve ipns queries.
- [x] Document the Helia configurations run on probelab.io: https://github.com/plprobelab/website/pull/85
- [x] migrate to @helia/verified-fetch -> https://github.com/ipfs/helia-http-gateway/pull/63
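
For readers following along, the "run garbage collection upon request" task could be routed roughly as sketched below. This is a minimal sketch, not the helia-http-gateway implementation: the `GcCapable` interface, the `routeGcRequest` and `handleGcRequest` names, and the choice to accept only POST are all assumptions made for illustration. Any object with an async `gc()` method (a Helia node is assumed to fit) could be passed in.

```typescript
// Hypothetical shape: anything that can run garbage collection on request.
interface GcCapable {
  gc(): Promise<void>;
}

type RouteDecision = "run-gc" | "method-not-allowed" | "not-found";

// Decide what to do with a request. Only POST /api/v0/repo/gc triggers GC
// (the POST-only rule is an assumption, not taken from the issue).
function routeGcRequest(method: string, path: string): RouteDecision {
  // Ignore any query string when matching the path.
  const pathname = path.split("?")[0];
  if (pathname !== "/api/v0/repo/gc") {
    return "not-found";
  }
  return method.toUpperCase() === "POST" ? "run-gc" : "method-not-allowed";
}

// Run GC when the route matches and return an HTTP status code.
async function handleGcRequest(
  node: GcCapable,
  method: string,
  path: string
): Promise<number> {
  const decision = routeGcRequest(method, path);
  if (decision === "run-gc") {
    await node.gc();
    return 200;
  }
  return decision === "method-not-allowed" ? 405 : 404;
}
```

Keeping the routing decision in a pure function makes it easy to check without starting a server or a real node.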

SgtPooki commented 1 year ago

@whizzzkid FYI that the service-worker gateway version should be secondary to the nodejs work.

whizzzkid commented 1 year ago

Thanks @SgtPooki, phase 1 is 95% there. Lemme send you a PR and open a chat with probelab to understand if ghcr would work.

whizzzkid commented 1 year ago

https://github.com/whizzzkid/helia-docker/pull/1

@SgtPooki can I get a review when you have a minute?

whizzzkid commented 1 year ago

Probelab PRs:

whizzzkid commented 1 year ago

@SgtPooki @BigLep I think the service worker gateway scope can be extended, a bit more planning can help us have a gateway in the browser, for:

Both companion and tiros can then directly hit helia.io. This could also open new pathways to init helia for use by anyone who wishes to retrieve content over ipfs in the browser. I have a very basic idea in the rough.

Possible Working

using helia.io to serve content to the end user

sequenceDiagram
    actor User
    participant Helia.io
    participant ServiceWorker
    User->>Helia.io: Get `/ip[fn]s/*`
    Helia.io->>Helia.io: Serve a static SW install page.
    Helia.io->>ServiceWorker: Activate SW, Intercept All `helia.io/ip[fn]s/*` requests
    ServiceWorker->>Helia.io: Activation Done
    Helia.io->>Helia.io: Refresh so that SW catches the page.
    Helia.io->>ServiceWorker: Fetch `helia.io/ip[fn]s/*`
    ServiceWorker->>User: Content is Served!
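
The "Intercept All `helia.io/ip[fn]s/*` requests" step in the diagram comes down to a path check inside the service worker. A rough sketch of that check follows, under stated assumptions: `isGatewayPath` and `GATEWAY_PATH` are hypothetical names, and the rule that something must follow the `/ipfs/` or `/ipns/` prefix is my own reading of `ip[fn]s/*`, not something specified in this thread.

```typescript
// Matches gateway-style paths such as /ipfs/<cid> or /ipns/<name>.
// Requiring at least one character after the prefix is an assumption.
const GATEWAY_PATH = /^\/ip[fn]s\/.+/;

// Returns true when a request pathname should be handled by the service worker.
function isGatewayPath(pathname: string): boolean {
  return GATEWAY_PATH.test(pathname);
}

// Inside a service worker, this check would typically sit in a "fetch"
// listener: parse event.request.url, test its pathname with isGatewayPath,
// and call event.respondWith(...) only for matching requests, so that
// everything else falls through to the network untouched.
```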

Notes:

Using helia.io to retrieve content on any website.

sequenceDiagram
    actor User
    participant Domain.com
    participant Helia.io
    participant ServiceWorker
    User->>Domain.com: Call
    Domain.com->>Helia.io: /init.js
    Helia.io->>Domain.com: Validate if SW exists otherwise, Load /install.html in <iframe>
    Domain.com->>Helia.io: Load /install.html
    Helia.io->>ServiceWorker: Activate SW, Intercept All `helia.io/ip[fn]s/*` requests
    ServiceWorker->>Helia.io: Activation Done
    Helia.io->>Domain.com: Notify Domain.com load completed.
    Domain.com->>Helia.io: Fetch `helia.io/ip[fn]s/*`
    ServiceWorker->>Domain.com: Serve Content
    Domain.com->>User: Content is Served!

Notes:

BigLep commented 12 months ago

Thanks for sharing, @whizzzkid. I agree there is something here. I'd have to think with a clear head about whether this should live under helia.io.

What I do think is worth doing is showing the diagram of interaction for the Tiros case. Ideally the Probelab Tiros case has no dependency on software outside of what is distributed in the docker image. It shouldn't be impacted by whether helia.io is up for fetching /init.js or /install.html. I think we should keep that all local to the Tiros docker image for now and get it working.

Once we have worked out the kinks there, we can look at expanding further.

BigLep commented 11 months ago

A few comments in circling back on this:

  1. General comments on "helia-docker" name and scope: https://github.com/ipfs/helia-docker/issues/12
  2. How can we get the phase 1 functionality deployed so we can start seeing retrievability numbers?
  3. I know currently we are only in phase 1 of using Helia in Node. Can we get numbers, though, for how much reliability/performance improves depending on whether we enable "trustless" gateway fallback? Basically, I think we should run the docker in one configuration where trustless fallback isn't enabled, and one where it is.
  4. How are we doing with phase 2 (testing retrieving from the browser without any dependency on an external site like helia.io)? Is that work being tracked somewhere else or here?
  5. As part of phase 2, let's create a diagram like https://github.com/ipfs/helia/issues/275#issuecomment-1747965736, but without helia.io.
  6. I added a couple of subtasks to the description of this issue. Please add others that are relevant for tracking this completion.

SgtPooki commented 11 months ago

> 2. How can we get the phase 1 functionality deployed so we can start seeing retrievability numbers?

It's deployed to probelab tiros and we're getting numbers now, but there are some remaining issues.

> 3. I know currently we are only in phase 1 of using Helia in Node. Can we get numbers, though, for how much reliability/performance improves depending on whether we enable "trustless" gateway fallback? Basically, I think we should run the docker in one configuration where trustless fallback isn't enabled, and one where it is.

We can get these numbers, but we would need to work with probelab on deploying a second helia version. We should probably make sure we resolve any crashing issues first.

> 4. How are we doing with phase 2 (testing retrieving from the browser without any dependency on an external site like helia.io)? Is that work being tracked somewhere else or here?

This is being tracked in this issue currently, but we should spin that out.

> 5. As part of phase 2, let's create a diagram like https://github.com/ipfs/helia/issues/275#issuecomment-1747965736, but without helia.io.

Sounds good.

SgtPooki commented 10 months ago

FYI that I'm moving the service worker discussions out of this issue and into https://github.com/ipfs/helia-http-gateway/issues/56, so we can call this done when the dockerized node-side version of helia-http-gateway is done.

BigLep commented 10 months ago

Thanks @SgtPooki. To be precise and make sure we're on the same page, the remaining tasks for this issue to me are:

  1. (already in task list) Phase 1: Investigate helia-http-gateway crashing issue. - https://github.com/ipfs/helia-http-gateway/issues/18
  2. (already in task list) https://github.com/ipfs/helia-http-gateway/issues/12
  3. We need to have a graph showing what percentage of tiros runs succeed with Helia (vs. the other setups like HTTP and Kubo)
  4. Have Tiros reporting retrievability success and latency for these configurations:
    • Helia / NodeJS / trustless gateway only
    • Helia / NodeJS / delegated routing only + direct peer retrieval (no trustless gateway fallback)
    • Helia / NodeJS / trustless gateway AND delegated routing with direct peer retrieval
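
To make items 3 and 4 concrete, here is a rough sketch of how the three configurations and the "percentage of runs that succeed" figure could be expressed. Everything named here is hypothetical: the scenario keys, the `RetrievalConfig` and `RunResult` shapes, and `successPercentage` are illustrations only, not Tiros or helia-http-gateway APIs.

```typescript
// Which retrieval paths are enabled for a given gateway instance.
interface RetrievalConfig {
  trustlessGateways: boolean;
  delegatedRouting: boolean; // delegated routing with direct peer retrieval
}

// Hypothetical labels for the three setups in the list above.
const scenarios: Record<string, RetrievalConfig> = {
  "trustless-gateway-only": { trustlessGateways: true, delegatedRouting: false },
  "delegated-routing-only": { trustlessGateways: false, delegatedRouting: true },
  "gateway-and-delegated-routing": { trustlessGateways: true, delegatedRouting: true },
};

// One measurement from a single run (assumed shape).
interface RunResult {
  scenario: string;
  success: boolean;
  latencyMs: number;
}

// Percentage of successful runs for a scenario, or undefined if it has no runs.
function successPercentage(runs: RunResult[], scenario: string): number | undefined {
  const matching = runs.filter((run) => run.scenario === scenario);
  if (matching.length === 0) {
    return undefined;
  }
  const succeeded = matching.filter((run) => run.success).length;
  return (succeeded / matching.length) * 100;
}
```

Returning `undefined` for a scenario with no runs keeps "no data" distinct from "0% success" on a graph.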

Agreed?

SgtPooki commented 10 months ago

@BigLep I'll have to chat with @dennis-tra about item 4, but it's possible to execute those different scenarios now. Displaying those on the website may be a different story (the graph could get too crowded), but maybe I can help with the display website?

Also, there's a question of cost for running 3 helia scenarios instead of just 1.

SgtPooki commented 10 months ago

@BigLep FYI that all three separate instances are running in tiros now. Some updates:

  1. :+1: Containers don't seem to be dying early anymore. This is good.
  2. :+1: Tiros jobs for heliatg are looking positive. See attached screenshot below.
  3. :-1: Tiros jobs for helia (all things enabled) and heliadr (delegated routing only) aren't stopping. This is bad.

image