Open lmolkova opened 11 months ago
- recommend a default way to scrape metrics from kafka
For broker metrics, this could be the receiver implemented for the collector, which already maintains a list of supported metrics.
For client metrics, Kafka takes an approach similar to Kubernetes:
- Metrics are intended to be collected by default, serialized to OTLP and sent to the broker, from which they can be collected by users - https://github.com/apache/kafka/pull/14620
I don't think OTel should start to tackle the problems that arise from this, even more so as it's not fully implemented and working yet, and many details are still unclear.
As we now have generic messaging metrics defined (albeit experimental), we should rather build on those where possible. Which means, seeing whether we can map to those metrics, and define Kafka-specific extensions where needed.
@pyohannes the suggestion here is not to solve a big problem but to reduce inconsistency for a non-standard set of metrics, so that different instrumentations emit similar things.
For example, just document them once like Java does for the kafka library - https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/bbfe950ad0ace8123a5e6817fb3767e27a1a2cee/instrumentation/kafka/kafka-clients/kafka-clients-2.6/library/README.md. This documentation can be an informative one and live on opentelemetry.io
Some of them are based on monkey-patching/byte-code rewriting and can emit otel-compatible metrics and traces
Just a small clarification. The java kafka client metrics solution works by bridging metrics from the kafka client library using API hooks provided by the library. No monkey patching / bytecode rewriting required. The code performs a minimal generic mapping between the instrument types, metric names, and attribute names. It does not cherry pick metrics from kafka and try to conform to any particular conventions - that approach was ruled out because it was too brittle and too time consuming given the sheer number of instruments exposed (> 200 IIRC).
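To make the "minimal generic mapping" above more concrete, here is a rough Python sketch of the idea. It is not the Java instrumentation's actual code: the function names, the `kafka.` prefix rule, the `-metrics` suffix stripping, and the `-total` counter heuristic are all assumptions made for illustration only.

```python
import re


def _normalize(part: str) -> str:
    # Lowercase and collapse any run of non-alphanumeric characters into "_".
    return re.sub(r"[^a-z0-9]+", "_", part.lower()).strip("_")


def bridge_metric_name(group: str, name: str) -> str:
    """Hypothetical rule: kafka.<group without '-metrics'>.<name>."""
    suffix = "-metrics"
    scope = group[: -len(suffix)] if group.endswith(suffix) else group
    return f"kafka.{_normalize(scope)}.{_normalize(name)}"


def bridge_instrument_type(name: str) -> str:
    # Assumption for this sketch: names ending in "-total" are cumulative
    # and become counters; everything else is reported as a gauge.
    return "counter" if name.endswith("-total") else "gauge"


def bridge_attributes(tags: dict) -> dict:
    # Tags are passed through generically; only the key spelling is normalized.
    return {_normalize(key): value for key, value in tags.items()}


if __name__ == "__main__":
    print(bridge_metric_name("producer-metrics", "record-send-rate"))
    print(bridge_instrument_type("record-send-total"))
    print(bridge_attributes({"client-id": "producer-1"}))
```

The point of a rule-based bridge like this is that no per-metric table has to be maintained: any metric the client exposes gets a name and an instrument type mechanically, at the cost of not conforming to any particular convention.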
For example just document them once like Java does for kafka library - [...]
That's fine for me, as long as what we document doesn't conflict with the generic messaging metrics that we have.
Discussed at the Semconv WG meeting on 12/4.
Next steps:
Given we removed them from conventions here https://github.com/open-telemetry/semantic-conventions/pull/338, do we really need to do anything here?
Given we removed them from conventions here open-telemetry/semantic-conventions#338, do we really need to do anything here?
@joaopgrassi we still want to document them and the semconv WG decision was to have an informative section on opentelemetry.io, so I transferred issue
Thanks for transferring this issue @lmolkova. Since this is the first instance of something like that being documented, we need to figure out where and how to put this within the docs. To be honest, right now I am not sure what the best place will be. Any suggestions?
@svrnm I wonder if we can add a page under Semantic Conventions, something like "External conventions" where we would be able to provide documentation about non-otel-authored/compliant signals Otel collector/instrumentation libraries emit.
E.g.
- Semantic Conventions
  - External Conventions
    - Kafka
I believe there are more candidates to be in that folder: looking into collector receivers, there are plenty of scrapers (Redis, RabbitMQ, ...) that don't always document metrics. Ideally, we want them to at least add a link to external documentation.
As an alternative, we could consider adding a section under "Collector", since most of these external conventions will come through it. Then, in the rare cases where they are needed outside of the collector (like in java-instrumentation), we could just link to the section in the Collector.
WDYT?
Why aren't they following semconv?
I would prefer this:
add a page under Semantic Conventions, something like "External conventions"
Since it's consistent with where we keep naming for common components.
Why aren't they following semconv?
Kafka specific ones we want to find home for are legacy ones from pre-otel world (which Kafka owners AFAIK want to preserve for the time being).
Ah, ok.
Kafka specific ones we want to find home for are legacy ones from pre-otel world (which Kafka owners AFAIK want to preserve for the time being).
Is there a discussion that we can reference for that? Or, asked differently: have we (the opentelemetry community) actively engaged in a conversation with them (the kafka community) about whether this is the right way forward? Not that we can tell them what to do, but we can at least help (if wanted) to make an informed decision.
Context
Effectively, Kafka project does not plan to follow OTel semconv (@AndrewJSchofield to confirm)
OTel provides several kafka instrumentation components:
The problem:
Group 1 (monkey-patched instrumentations) might still want to emit kafka-specific metrics/traces. We'll need to keep them in otel-semconv repo so they are consistent across languages/clients.
Group 2 (instrumentations that report what's available) have more difficult problems: There are multiple ways to scrape different sets of metrics from Kafka:
These metrics in most cases can't be converted to OTel ones (use different instruments, don't support histograms, don't report the same attributes, etc).
As a result, we're going to end up with each language SIG (plus external components) defining their own set of custom metrics for Kafka based on what they have.
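One way to see why such metrics can't simply be converted into OTel histograms: pre-aggregated values such as an average and a maximum do not determine the underlying distribution. The Python sketch below uses made-up latency samples and arbitrary bucket bounds (both are assumptions for illustration) to show two sample sets with identical avg and max but different histogram bucket counts.

```python
from statistics import mean


def bucket_counts(samples, bounds):
    """Count samples per bucket, with upper-inclusive bucket bounds."""
    counts = [0] * (len(bounds) + 1)
    for sample in samples:
        # The bucket index is the number of bounds strictly below the sample.
        index = sum(1 for bound in bounds if sample > bound)
        counts[index] += 1
    return counts


# Hypothetical latency samples in milliseconds.
samples_a = [10, 10, 10, 50]
samples_b = [5, 5, 20, 50]
bounds = [8, 16, 32]  # arbitrary explicit bucket boundaries

if __name__ == "__main__":
    for samples in (samples_a, samples_b):
        print(mean(samples), max(samples), bucket_counts(samples, bounds))
```

Both sample sets report the same average and maximum, so a bridge that only sees those two values has no way to reconstruct the bucket counts a histogram instrument would have recorded.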
What we can do on otel semconv side:
`kafka` and not `messaging.kafka` as we do in OTel semconv)