protocol / prodeng

Issues, discussions and documentation from the production engineering team

Thunderdome: Grafana Integration #17

Closed: iand closed this issue 2 years ago

iand commented 2 years ago

What Is It?

Send metrics and traces from Thunderdome experiments to Grafana Cloud.

Why Are We Doing It?

Integrating with Grafana gives us visibility into the performance of the gateways under test, along with tools to analyse and compare results. Kubo exports dozens of metrics that report the health and operation of its subsystems, such as bitswap and the blockstore. We also need to see the results of each experiment from the users' point of view, such as the number of failed requests, the time to first byte and the overall response time. We chose Grafana Cloud because we use it for most of our other infrastructure monitoring, but other users could send results to private Grafana instances instead.

Notes

Provide appropriate config for the Grafana Agent to run as a sidecar alongside the gateway process, pushing metrics, logs and tracing data to hosted Grafana, with a dashboard to show the results of the experiments. Use dealgood's vantage point to send client-side metrics and traces to Grafana.

The project overview is on Notion.

Tasks

JesseXie commented 2 years ago

@iand please link the Create and share demo of this phase to https://github.com/protocol/prodeng/issues/23

iand commented 2 years ago

Propagation of tracing headers in Kubo is waiting for review by the stewards.