grafana / alloy

OpenTelemetry Collector distribution with programmable pipelines
https://grafana.com/oss/alloy
Apache License 2.0

OTEL: Debugging exporters #424

Open · gouthamve opened this issue 1 year ago

gouthamve commented 1 year ago

Request

Currently there is no good way to understand the signals flowing through the Agent. In the Collector, you can do this via the prometheusexporter ~and loggingexporter~.

We should add them to the Agent.
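
For reference, here is a minimal sketch of the Collector-side setup this refers to, assuming the standard `prometheus` and `logging` exporters (the endpoint address and pipeline wiring are illustrative, not a recommendation):

```yaml
# Illustrative OTel Collector config: expose received metrics in Prometheus
# exposition format and dump every data point to the console for debugging.
receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"   # scrape http://<collector>:8889/metrics to inspect the data
  logging:
    verbosity: detailed         # print each metric/span/log record to stdout

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheus, logging]
```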

Use case

This will help our users debug what's flowing through the pipeline and understand what attributes are being carried by the signals.

cyrille-leclerc commented 1 year ago

The logging exporter is already available. I would love to have the Grafana Agent expose the Prometheus Exposition Format to help with debugging. Since the Grafana Agent doesn't use the OTel Collector's Remote Write Exporter but its own solution, I wouldn't be surprised if the same were true for the Prometheus exposition format.
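
For context, this is the kind of output I mean when I say exposition format; a small illustrative sample (metric name and labels are made up):

```
# HELP http_server_requests_total Number of inbound HTTP requests
# TYPE http_server_requests_total counter
http_server_requests_total{service_name="checkout",http_method="GET",http_status_code="200"} 42
```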

ptodev commented 1 year ago

I think the Flow UI already helps with debugging the attributes carried by signals. Is there a particular feature the UI is missing that would be useful for this debugging use case?

cyrille-leclerc commented 1 year ago

Thanks @ptodev.

  1. I checked the Flow UI and didn't see what could help me list the collected metrics. Did I miss something for reviewing the collected metrics?
  2. I think it's pretty common for people configuring collectors to use the text outputs of their collectors to debug problems in their pipelines.

ptodev commented 1 year ago

> I checked the Flow UI and didn't see what could help me list the collected metrics. Did I miss something for reviewing the collected metrics?

I think you're correct - there is no way to see the exact metrics at the moment. It is possible to see things like labels, the scrape timestamp, and the latest scrape duration. It's very similar to the Prometheus UI's Targets page. I was hoping that the ability to see labels/attributes would be enough in most situations, but I agree that sometimes it's better to see the exact metrics.

We should almost certainly add something like the Collector's Prometheus Exporter, not just for debugging but also for other reasons. For example, it would allow customers to chain Agents by having one Agent scrape another. At the moment, Agents can only be chained by having one Agent remote-write to another.
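
To make the chaining idea concrete, here is a rough River sketch of what scrape-based chaining could look like, assuming the downstream Agent gained a Collector-style exposition endpoint. The downstream address, port, and component labels are hypothetical; `prometheus.scrape` and `prometheus.remote_write` are existing Flow components:

```river
// Hypothetical sketch: Agent A scrapes the exposition endpoint that a
// downstream Agent B would expose if it had a Collector-style Prometheus
// exporter. The agent-b and Mimir addresses are made up for illustration.
prometheus.scrape "agent_b" {
  targets    = [{"__address__" = "agent-b.example.com:8889"}]
  forward_to = [prometheus.remote_write.mimir.receiver]
}

prometheus.remote_write "mimir" {
  endpoint {
    url = "https://mimir.example.com/api/v1/push"
  }
}
```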

That said, it might be worth thinking about exposing information about the exact signals via the UI at some point.

cyrille-leclerc commented 1 year ago

> We should almost certainly add something like the Collector's Prometheus Exporter, not just for debugging but also for other reasons. For example, it would allow customers to chain Agents by having one Agent scrape another. At the moment, Agents can only be chained by having one Agent remote-write to another.

💯 IMO the Prometheus Exposition Format is a must-have for a data collector.

> That said, it might be worth thinking about exposing information about the exact signals via the UI at some point.

I would defer this decision. I wouldn't be surprised if the Prometheus exposition format plus console output of logs and traces was enough for some time.

rfratto commented 1 year ago

I'm not sure I understand why the logging exporter isn't sufficient, and why we would also need a page to dump all received metrics for debugging; my concerns would be the memory overhead and how readable such a page would be for large installations.

Can you help me understand the requirements around seeing everything on one page at the same time?

cyrille-leclerc commented 1 year ago

Thanks @rfratto. You are right that it may be counterintuitive, but the Prometheus Exposition Format is such an industry standard that it's the first visualization that comes to mind when troubleshooting metrics collection.

rfratto commented 1 year ago

> Thanks @rfratto. You are right that it may be counterintuitive, but the Prometheus Exposition Format is such an industry standard that it's the first visualization that comes to mind when troubleshooting metrics collection.

Wouldn't Grafana be a more intuitive and useful tool since you could visualize the metrics over time?

To phrase that a better way: What kind of troubleshooting would you be doing where the metrics wouldn't also already be in Grafana?

cyrille-leclerc commented 1 year ago

> Wouldn't Grafana be a more intuitive and useful tool since you could visualize the metrics over time?

I have experienced this in a past life with log ingestion pipelines :-) Having to query the database to verify my ingestion pipeline is far more complicated than checking the "debug mode" output of a collector.

> What kind of troubleshooting would you be doing where the metrics wouldn't also already be in Grafana?

I want to verify the produced metrics and their labels.

It's complicated in Prometheus/Mimir/Grafana to "list all the metrics produced by a given exporter, and only those".
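
The closest workaround I know of is a PromQL grouping query like the sketch below, which only works once the data has already been ingested and assumes you have a suitable job label to filter on:

```promql
# List the metric names ingested from one scrape job (job label is illustrative).
group by (__name__) ({job="my-exporter"})
```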