ipshipyard / waterworks-community

Discussion and documentation concerning the operation of the Public Goods for IPFS and Libp2p.
https://docs.ipfs.tech/concepts/public-utilities/
MIT License

Minimum graphs needed for top-level health reporting on the ipfs.io gateway #5

Open BigLep opened 9 months ago

BigLep commented 9 months ago

Background

IP Shipyard has been entrusted to steward the ipfs.io gateway. Other leaders in the ecosystem should have the ability to see the health and usage of the ipfs.io gateway. This issue is about defining the minimum graphs needed to give others confidence in the maintained health of the service.

Graphs

Some general requests include:

Unique Clients accessing ipfs.io / dweb.link

Current source: https://probelab.io/ipfsgateways/#daily-unique-clients-accessing-ipfsio--dweblink

Snapshot: gateway-clients-overall

Improvements needed:

HTTP Requests to ipfs.io / dweb.link, by region

Current source: https://probelab.io/ipfsgateways/#daily-http-requests-to-ipfsio--dweblink-by-region

Snapshot: gateway-requests-region

Improvements needed:

p95 of TTFB for “200” responses

Current source: none currently, other than a weekly snapshot value in https://protocollabs.grafana.net/d/J2_IHYTVz/gateway-report?orgId=1 . I'm also not sure whether that value covers only "200" responses or all responses.

Existing data: https://docs.google.com/spreadsheets/d/1qnrAhqt_i5l9m48jge6617XD0hRK4qbTebTxWKhJdV0/edit#gid=1875197224 contains an image. That said, I don't know whether it is for "200" responses or all responses.
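To make the "200 only vs. all responses" ambiguity concrete, here is a minimal sketch of how a p95 TTFB restricted to one status code could be computed. It assumes hypothetical log records with `status` and `ttfb_ms` fields (these names are illustrative, not the gateway's actual log schema) and uses the nearest-rank percentile definition; a dashboard backend may interpolate differently.

```python
def p95_ttfb_ms(records, status=200):
    """Nearest-rank 95th percentile of TTFB (ms) for a single status code.

    `records` is an iterable of dicts with hypothetical keys
    "status" (int) and "ttfb_ms" (number). Returns None if no
    record matches the requested status.
    """
    samples = sorted(r["ttfb_ms"] for r in records if r["status"] == status)
    if not samples:
        return None
    # Nearest-rank: smallest 1-based rank covering at least 95% of samples.
    # Integer arithmetic avoids floating-point rounding in ceil(0.95 * n).
    rank = (95 * len(samples) + 99) // 100
    return samples[rank - 1]


# 20 successful responses (10 ms .. 200 ms) plus one slow 410 that must be ignored.
records = [{"status": 200, "ttfb_ms": 10 * i} for i in range(1, 21)]
records.append({"status": 410, "ttfb_ms": 9000})
print(p95_ttfb_ms(records))
```

Filtering before taking the percentile matters: mixing in very fast 410 responses or very slow 504s would shift the value away from what a successful fetch actually experiences.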

What's needed:

Response code distribution

For the requests in a given week, we should be able to show how the gateway is responding.

Why:

  1. Catch if there is a deployment issue that is affecting traffic.
  2. Prove the value of certain functionality.

Example looking at the last 7 days:

image

The high number of 410s emphasizes the importance of “Badbits”. If we didn’t have it, the majority of requests would be served content we don’t want to serve.
If this distribution were ever to change (e.g., if “Badbits” were disabled), that would be bad and we’d want to see it.
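As a rough illustration of the kind of check this graph would enable, the sketch below computes each status code's share of requests and flags codes whose share moved by more than a threshold between two weeks. The function names, the input shape (a flat list of status codes), and the 0.2 threshold are all assumptions for illustration, not an existing tool.

```python
from collections import Counter


def status_distribution(statuses):
    """Map each status code to its share of all requests (0.0 to 1.0)."""
    counts = Counter(statuses)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {code: count / total for code, count in counts.items()}


def shifted_codes(baseline, current, threshold=0.2):
    """Return sorted status codes whose share changed by more than `threshold`.

    A code missing from one distribution is treated as having a share of 0.
    """
    codes = set(baseline) | set(current)
    return sorted(
        code
        for code in codes
        if abs(current.get(code, 0.0) - baseline.get(code, 0.0)) > threshold
    )


# Hypothetical weeks: the first is dominated by 410s, the second has none.
last_week = status_distribution([410] * 6 + [200] * 3 + [404])
this_week = status_distribution([200] * 9 + [404])
print(shifted_codes(last_week, this_week))
```

In this made-up example the 410 share drops from 0.6 to 0 while the 200 share rises by the same amount, which is the sort of change the graph should make visible.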

Current source: none currently, other than a weekly snapshot value in https://protocollabs.grafana.net/d/J2_IHYTVz/gateway-report?orgId=1

What's needed:

Unique CIDs requested per week

Why: Gives a sense of how much of the content-addressable space is being requested through the ipfs.io gateway.
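For reference, a minimal sketch of how such a count could be derived, assuming hypothetical `(timestamp, path)` request tuples and path-style URLs of the form `/ipfs/<cid>/...`. Subdomain-style requests (where the CID is in the hostname) are not handled, the CID strings below are placeholders, and CIDs are compared as raw strings, so two encodings of the same content would be counted separately.

```python
from collections import defaultdict
from datetime import datetime


def unique_cids_per_week(requests):
    """Count distinct root CIDs per ISO week.

    `requests` is an iterable of (datetime, path) tuples. Only paths of
    the form /ipfs/<cid>[/subpath] are counted; everything else is skipped.
    Returns {(iso_year, iso_week): unique_cid_count}.
    """
    weeks = defaultdict(set)
    for timestamp, path in requests:
        parts = path.split("/")
        if len(parts) < 3 or parts[1] != "ipfs" or not parts[2]:
            continue
        iso = timestamp.isocalendar()
        weeks[(iso[0], iso[1])].add(parts[2])
    return {week: len(cids) for week, cids in weeks.items()}


# Placeholder CIDs; real ones would be full CIDv0/CIDv1 strings.
requests = [
    (datetime(2024, 1, 1), "/ipfs/bafyA"),
    (datetime(2024, 1, 3), "/ipfs/bafyA/index.html"),
    (datetime(2024, 1, 3), "/ipfs/bafyB"),
    (datetime(2024, 1, 8), "/ipfs/bafyA"),
    (datetime(2024, 1, 8), "/ipns/example.com"),
]
print(unique_cids_per_week(requests))
```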

What's needed:

BigLep commented 9 months ago

If it's helpful, I am happy to break this into multiple issues.

BigLep commented 9 months ago

Probelab website request for things that I think probelab could execute on independently: https://github.com/probe-lab/website/issues/87 . That said, I would want the "Waterworks" crew to follow up, review, or make any corrections, given they're the stewards of this community infrastructure.