Open dashpole opened 2 years ago
In the meantime, while we wait for metrics to be stable enough for this, I created this: https://github.com/MrAlias/flow
@dashpole let me know if that helps.
Very cool. I'll take a look
It probably isn't quite enough to meet the needs I have, but may be useful for others
we also need this.
@MadVikingGod is going to look into what metrics should be added and the feasibility of this feature.
Java does implement some metrics around the BatchSpanProcessor (BSP) and a generic wrapper for some exporters (at least gRPC, maybe more). The metrics below will indicate if I found them in Java.
To experiment with it without adding any API surface, we can start with an experimental environment variable. This will indicate whether we should use the global Metrics API. Doing this should allow us to explore the performance impact of any of the metrics while still maintaining compatibility.
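A minimal sketch of what such a gate could look like, using only the standard library. The variable name EXAMPLE_SDK_X_METRICS and the helper names are hypothetical, chosen for illustration; nothing in this thread fixes what the real variable would be called.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// metricsEnvVar is a hypothetical name used only for this sketch.
const metricsEnvVar = "EXAMPLE_SDK_X_METRICS"

// parseEnabled treats true-like values ("1", "t", "true", "TRUE", ...) as
// enabled. Empty or unparsable values leave the feature off, so the default
// behavior of the SDK does not change.
func parseEnabled(v string) bool {
	enabled, err := strconv.ParseBool(v)
	return err == nil && enabled
}

// metricsEnabled reports whether the experimental variable opts in to
// recording SDK metrics with the global Metrics API.
func metricsEnabled() bool {
	return parseEnabled(os.Getenv(metricsEnvVar))
}

func main() {
	os.Setenv(metricsEnvVar, "true")
	fmt.Println("SDK metrics enabled:", metricsEnabled())
}
```

Because the check is a single boolean read, instrumented code paths can skip all metric work when the variable is unset, which keeps the performance comparison simple.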
This would add a number of WithMeterProvider() options to anywhere that would produce these metrics. This could either act as an enable signal (only capture metrics if it's configured) or an override signal (use the configured provider instead of the global API).
This can realistically only be done for objects that already use an option pattern, like the TracerProvider or the BatchSpanProcessor, which would prevent some components, like the SimpleSpanProcessor, from having an override. We won't need an option for Samplers, because we can measure the output of the sampling decision without instrumenting the internals of that code.
If we were to add an option for both TP and BSP, this would mean we would need a new type that is the union of both Options, similar to SpanStartEventOption.
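To make the union-type idea concrete, here is a rough sketch with stand-in types. MeterProvider, the two config structs, and the unexported apply methods are all simplified assumptions for illustration, not the SDK's actual definitions; only the shape (one option value satisfying two option interfaces) mirrors how SpanStartEventOption works.

```go
package main

import "fmt"

// MeterProvider is a stand-in for the metrics API provider type.
type MeterProvider interface{ Name() string }

type namedProvider string

func (n namedProvider) Name() string { return string(n) }

// Simplified stand-ins for the TracerProvider and BatchSpanProcessor configs.
type tracerProviderConfig struct{ mp MeterProvider }
type batchProcessorConfig struct{ mp MeterProvider }

type TracerProviderOption interface {
	applyTracerProvider(tracerProviderConfig) tracerProviderConfig
}

type BatchProcessorOption interface {
	applyBatchProcessor(batchProcessorConfig) batchProcessorConfig
}

// MeterProviderOption is the union type: it can be passed wherever either
// option kind is accepted.
type MeterProviderOption interface {
	TracerProviderOption
	BatchProcessorOption
}

type meterProviderOption struct{ mp MeterProvider }

func (o meterProviderOption) applyTracerProvider(c tracerProviderConfig) tracerProviderConfig {
	c.mp = o.mp
	return c
}

func (o meterProviderOption) applyBatchProcessor(c batchProcessorConfig) batchProcessorConfig {
	c.mp = o.mp
	return c
}

// WithMeterProvider returns one option usable for both components.
func WithMeterProvider(mp MeterProvider) MeterProviderOption {
	return meterProviderOption{mp: mp}
}

func newTracerProviderConfig(opts ...TracerProviderOption) tracerProviderConfig {
	var c tracerProviderConfig
	for _, o := range opts {
		c = o.applyTracerProvider(c)
	}
	return c
}

func newBatchProcessorConfig(opts ...BatchProcessorOption) batchProcessorConfig {
	var c batchProcessorConfig
	for _, o := range opts {
		c = o.applyBatchProcessor(c)
	}
	return c
}

func main() {
	opt := WithMeterProvider(namedProvider("custom"))
	fmt.Println(newTracerProviderConfig(opt).mp.Name())
	fmt.Println(newBatchProcessorConfig(opt).mp.Name())
}
```

In this sketch a nil provider simply means "not configured". Whether that should disable metrics or fall back to the global provider is the enable-versus-override question raised above.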
This is a non-exhaustive list of things that could be captured
From the Tracer
From an exporter
We're very interested in this feature so we can tune our Batcher to ensure it doesn't inadvertently drop spans. I started a WIP PR (#5201), but it definitely needs some guidance. If there are already plans to release metrics in the near-to-mid-term, we can wait, but otherwise, this seemed like a well-scoped area where we could help contribute (especially using the Java implementation as a reference).
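For anyone following along, this is roughly the signal we are after, reduced to a toy model. The span and queue types below are hypothetical stand-ins and not the BatchSpanProcessor's real internals; the sketch only shows a bounded buffer that counts spans it has to drop when full.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// span is a stand-in for the SDK span type.
type span struct{ name string }

// queue is a simplified model of a batching processor's bounded buffer.
type queue struct {
	ch      chan span
	dropped atomic.Int64 // number of spans rejected because the buffer was full
}

func newQueue(size int) *queue {
	return &queue{ch: make(chan span, size)}
}

// enqueue adds s without blocking. If the buffer is full, the span is
// dropped and the drop counter is incremented.
func (q *queue) enqueue(s span) bool {
	select {
	case q.ch <- s:
		return true
	default:
		q.dropped.Add(1)
		return false
	}
}

func main() {
	q := newQueue(2)
	for i := 0; i < 5; i++ {
		q.enqueue(span{name: fmt.Sprintf("span-%d", i)})
	}
	// No consumer is draining the queue, so only the first two spans fit.
	fmt.Printf("queued=%d dropped=%d\n", len(q.ch), q.dropped.Load()) // prints "queued=2 dropped=3"
}
```

Exposing a counter like dropped as a metric would tell us whether the queue size or export interval needs tuning.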
Discussed this at the in-person SIG meeting at KubeCon. We should
Instrumentation should default to the global MeterProvider/TracerProvider, but also accept a WithMeterProvider/WithTracerProvider option that overrides the global (similar to a typical instrumentation library).
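A small sketch of that default-plus-override behavior, assuming stand-in types. Here globalMeterProvider is a placeholder variable; in the real SDK the global would presumably come from otel.GetMeterProvider(), and the config and Option names are illustrative only.

```go
package main

import "fmt"

// MeterProvider is a stand-in for the metrics API provider type.
type MeterProvider interface{ Name() string }

type namedProvider string

func (n namedProvider) Name() string { return string(n) }

// globalMeterProvider is a placeholder for the globally registered provider.
var globalMeterProvider MeterProvider = namedProvider("global")

type config struct{ meterProvider MeterProvider }

// Option configures the instrumentation.
type Option func(*config)

// WithMeterProvider overrides the global provider. A nil argument is ignored
// so the global default still applies.
func WithMeterProvider(mp MeterProvider) Option {
	return func(c *config) {
		if mp != nil {
			c.meterProvider = mp
		}
	}
}

// newConfig applies the options, then falls back to the global provider if
// no override was supplied.
func newConfig(opts ...Option) config {
	var c config
	for _, o := range opts {
		o(&c)
	}
	if c.meterProvider == nil {
		c.meterProvider = globalMeterProvider
	}
	return c
}

func main() {
	fmt.Println(newConfig().meterProvider.Name())
	fmt.Println(newConfig(WithMeterProvider(namedProvider("custom"))).meterProvider.Name())
}
```

Resolving the fallback once at construction time keeps the hot path free of nil checks.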
Another thought:
The approach for adding metric and trace instrumentation should be:
Can we start by adding this as an experimental feature? That can be used to help progress semantic convention work without blocking this.
This assumes the experimental approach would just use the global providers.
We could start with metrics similar to the ones in the java SDK:
Problem Statement
Context: https://github.com/kubernetes/enhancements/pull/3161#discussion_r790716355
For an application instrumented with OpenTelemetry for tracing, and using the OTLP trace exporter, it isn't currently possible to monitor (with metrics) whether or not spans are being successfully collected and exported. For example, if my SDK cannot connect to an OpenTelemetry Collector, and isn't able to send traces, I would like to be able to measure how many traces are collected versus how many are not sent. I would like to be able to set up SLOs to measure successful trace delivery from my applications.
Proposed Solution
After the metrics API is stable, collect metrics in the trace SDK using the metrics API. Specifics about the metrics deserve their own design, but I should be able to tell the volume of spans my application is generating, and the success rate of exporting them. This would be done via a new TracerProviderOption: WithMeterProvider(MeterProvider).
Alternatives
We could add metrics to exporters individually, but most exporter-related metrics should be similar.
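Since exporter metrics would look alike, one way to picture the shared part is a generic wrapper around any exporter. This is only a sketch: Span and SpanExporter are simplified stand-ins loosely modeled on the SDK's exporter interface, flakyExporter simulates an unreachable collector, and plain atomic counters stand in for real metric instruments.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync/atomic"
)

// Span is a stand-in for the SDK's read-only span type.
type Span struct{ Name string }

// SpanExporter is a simplified stand-in for the SDK exporter interface.
type SpanExporter interface {
	ExportSpans(ctx context.Context, spans []Span) error
}

// countingExporter wraps any exporter and counts spans by export outcome.
type countingExporter struct {
	next     SpanExporter
	exported atomic.Int64
	failed   atomic.Int64
}

func (e *countingExporter) ExportSpans(ctx context.Context, spans []Span) error {
	if err := e.next.ExportSpans(ctx, spans); err != nil {
		e.failed.Add(int64(len(spans)))
		return err
	}
	e.exported.Add(int64(len(spans)))
	return nil
}

// successRate returns the fraction of spans exported successfully, or 1 if
// nothing has been exported yet.
func (e *countingExporter) successRate() float64 {
	ok, bad := e.exported.Load(), e.failed.Load()
	if ok+bad == 0 {
		return 1
	}
	return float64(ok) / float64(ok+bad)
}

// flakyExporter fails every call while down is true.
type flakyExporter struct{ down bool }

func (f *flakyExporter) ExportSpans(context.Context, []Span) error {
	if f.down {
		return errors.New("collector unreachable")
	}
	return nil
}

// simulate exports 3 spans successfully, then 1 span while the backend is down.
func simulate() (exported, failed int64, rate float64) {
	backend := &flakyExporter{}
	exp := &countingExporter{next: backend}
	_ = exp.ExportSpans(context.Background(), make([]Span, 3))
	backend.down = true
	_ = exp.ExportSpans(context.Background(), make([]Span, 1))
	return exp.exported.Load(), exp.failed.Load(), exp.successRate()
}

func main() {
	exported, failed, rate := simulate()
	fmt.Printf("exported=%d failed=%d success=%.2f\n", exported, failed, rate) // prints "exported=3 failed=1 success=0.75"
}
```

The ratio of the two counters is the kind of number an SLO on trace delivery could be built on.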