OTEL: Debugging exporters

gouthamve commented 1 year ago

Request

Currently there is no good way to understand the signals flowing through the Agent. In the Collector, you could do it via the prometheusexporter ~and loggingexporter~.

We should add them to the Agent.

Use case

This will help our users debug whats flowing through the pipeline and understand whats attributes are being carried by the signals.

cyrille-leclerc commented 1 year ago

Logging exporter is already available. I would love to have the Grafana Agent use the Prometheus Exposition Format to help debugging. As Grafana Agent doesn't use the OTel Collector's Remote Write Exporter but its own solution, I wouldn't be surprised if it was the same for the Prometheus exposition format

ptodev commented 1 year ago

I think the Flow UI already helps to debug attributes being carried by signals. Is there a particular feature which the UI is missing which could be useful for this debugging use case?

cyrille-leclerc commented 1 year ago

Thank @ptodev.

I checked the `Flow UI and didn't see what could help me list the collected metrics, did I miss something to review the collected metrics?
I think that's it's pretty common for collector configurers to use the text outputs of their collectors to debug problems in their pipelines

ptodev commented 1 year ago

I checked the `Flow UI and didn't see what could help me list the collected metrics, did I miss something to review the collected metrics?

I think you're correct - there is no way to see the exact metrics at the moment. It is possible to see thing like labels, scrape timestamp, and latest scrape duration. It's very similar to the Prometheus UI's Targets page. I was hoping that the ability to see labels/attributes would be enough in a lot of situations, but I agree it's better to see the exact metrics sometimes.

We should almost certainly add something like the Collector's Prometheus Exporter not just for debugging, but also for other reasons. For example, it would allow customers to chain agents by having an Agent scrape another Agent. At the moment Agents can only be chained together by having one Agent remote writing to another.

That said, it might be worth thinking about exposing information about the exact signals via the UI at some point:

It'd allow people to see their metrics without having to change their config and expose them in a debugging /metrics endpoint
It could be an advantage over other collector distributions which don't have that feature
It could work for other signals such as traces, where AFAIK it's not possible to expose them on a pull-based endpoint like the Prometheus one

cyrille-leclerc commented 1 year ago

We should almost certainly add something like the Collector's Prometheus Exporter not just for debugging, but also for other reasons. For example, it would allow customers to chain agents by having an Agent scrape another Agent. At the moment Agents can only be chained together by having one Agent remote writing to another.

💯 IMO Prometheus Exposition `format is a must have for a data collector

That said, it might be worth thinking about exposing information about the exact signals via the UI at some point:

I would defer this decision. I wouldn't be surprised if Prometheus exposition format + console output of logs and traces was enough for some time

rfratto commented 1 year ago

I'm not sure I understand why the logging exporter isn't sufficient, and why we would also need a page to dump all received metrics for debugging; my concern would be the memory overhead and how readable a page would that would be for large installations.

Can you help me understand the requirements around seeing everything on one page at the same time?

cyrille-leclerc commented 1 year ago

Thanks @rfratto . You are right it may be counter intuitive but the Prometheus Exposition Format is such an industry standard that it's the first visualization that comes to my mind to troubleshoot metrics collection.

rfratto commented 1 year ago

Thanks @rfratto . You are right it may be counter intuitive but the Prometheus Exposition Format is such an industry standard that it's the first visualization that comes to my mind to troubleshoot metrics collection.

Wouldn't Grafana be a more intuitive and useful tool since you could visualize the metrics over time?

To phrase that a better way: What kind of troubleshooting would you be doing where the metrics wouldn't also already be in Grafana?

cyrille-leclerc commented 1 year ago

Wouldn't Grafana be a more intuitive and useful tool since you could visualize the metrics over time?

I have experienced this in a past life, for logs ingestion pipelines :-) Having to query the database to verify my ingestion pipeline is way more complicated than verifying on the "debug mode" of a collector.

What kind of troubleshooting would you be doing where the metrics wouldn't also already be in Grafana?

I want to verify the produced metrics and their labels.

It's complicated in Prometheus/Mimir/Grafana to "list all the metrics produced by a given exporter and only those ones"

I don't know how to "list all metrics"
How can I be sure that I am listing
- all the metrics produced by an exporter
- only the metrics produced by this exporter if I'm not sur of the produced label values (eg instance)
  - My Prometheus/Mimir/Grafana instance always receives metrics from multiple sources, I can never ensure I disabled all the other data sources
Grafana is not friendly to copy in a github issue a list of metrics with their labels

grafana / alloy

OTEL: Debugging exporters #424

Request

Use case