Open lmolkova opened 9 months ago
Dropping some initial research results for Kafka here.
TL;DR: after looking into Kafka JMX metrics, the 'native' metrics proposal, and the javadocs for the client APIs, I can see the following properties that might be interesting on metrics: they have low cardinality, carry no sensitive information, and apply at the operation level (rather than transport).
- `node.id` (`network.peer.address`? The IP address the final response was received from, if known on the logical level)
- `client.rack` (like an availability zone; maybe fits better as a resource/scope attribute)
- `enable.auto.commit` - receive mode

Others like `client.id` (`messaging.client_id`) would be controversial.
Documented in 'native' Kafka metrics:
- `client.id`
- `transactional.id` - per-client-instance configuration that plays a role in transaction support.

These are per-client-instance and potentially have high cardinality. Reporting them on OTel metrics is controversial.
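
To make the cardinality concern concrete, here is a minimal sketch in plain Python (not the OpenTelemetry SDK) that counts how many distinct time series a metric would produce for a given choice of attributes. The `count_series` helper, the fleet size, and all attribute values are hypothetical and only meant for illustration.

```python
from typing import Iterable, Mapping, Sequence


def count_series(measurements: Iterable[Mapping[str, str]],
                 attribute_keys: Sequence[str]) -> int:
    """Count distinct attribute combinations, i.e. distinct time series."""
    series = set()
    for attrs in measurements:
        # Only the selected keys contribute to the identity of a series.
        series.add(tuple(attrs.get(key) for key in attribute_keys))
    return len(series)


# Hypothetical fleet: 100 producer instances, each with a unique client.id,
# spread over 3 racks and talking to 5 broker nodes.
measurements = [
    {
        "client.id": f"producer-{i}",
        "client.rack": f"rack-{i % 3}",
        "node.id": str(i % 5),
    }
    for i in range(100)
]

low = count_series(measurements, ["client.rack", "node.id"])
high = count_series(measurements, ["client.rack", "node.id", "client.id"])
print(low, high)  # 15 100
```

With `client.rack` and `node.id` alone, the number of series is bounded by the number of racks times the number of nodes; once `client.id` is included, it grows with the number of client instances.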
Documented in 'native' Kafka metrics:
- `client.rack` - availability zone equivalent
- `group.id` - same as `messaging.kafka.consumer.group`
- `group_instance_id`, `group_member_id` - describe client/machine instances and don't fit OTel metrics due to cardinality (partially covered by `service.instance.id`)

Consumer config `enable.auto.commit`. There are similar modes in other messaging systems, where messages are settled upon successful delivery. Perhaps we can consider whether it's useful on metrics.

Links:
adding research on RabbitMQ
TL;DR: pre-stability:
- `redelivered` and `delivery_mode` attributes for RabbitMQ.
- […] (`consume`, `get`, etc)

can be added incrementally:
- `messaging.header.*` template similar to HTTP headers.

Java OTel instrumentation reports:
- `rabbitmq.record.queue_time_ms` - time in queue for the message
- `rabbitmq.command` (`basic.publish`, `basic.get`, `queue.create`, `exchange.declare`, etc) - perhaps should be changed to `messaging.operation`
- `rabbitmq.queue` - perhaps should be changed to `messaging.destination.name`
- `rabbitmq.delivery_mode` - https://www.rabbitmq.com/consumers.html#message-properties
- `messaging.header.*` template

Python OTel doesn't report anything new.
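
The renames suggested above for the Java instrumentation could be expressed as a simple key mapping. This is only a sketch: it covers the two renames floated in the list ("perhaps should be changed to"), not a settled convention, and the `rename_attributes` helper and the sample values are hypothetical.

```python
# Renames floated in the list above; suggestions only, not a decided convention.
PROPOSED_RENAMES = {
    "rabbitmq.command": "messaging.operation",
    "rabbitmq.queue": "messaging.destination.name",
}


def rename_attributes(attributes: dict) -> dict:
    """Return a copy with legacy rabbitmq.* keys replaced by the proposed names."""
    return {PROPOSED_RENAMES.get(key, key): value
            for key, value in attributes.items()}


legacy = {
    "rabbitmq.command": "basic.publish",
    "rabbitmq.queue": "orders",        # hypothetical queue name
    "rabbitmq.delivery_mode": 2,       # left untouched: no rename proposed
}
renamed = rename_attributes(legacy)
print(renamed)
```

Only the keys are renamed here. Values such as `basic.publish` would probably also need mapping to whatever value set `messaging.operation` ends up defining, which this sketch does not attempt.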
RabbitMQ message properties that could be interesting on metrics (low cardinality, logical level):
- Redelivered - bool flag indicating whether the message was delivered before
- Delivery mode (persistent vs transient message)
- type - application-specific message type, e.g. "orders.created"
- Content type and Content encoding - e.g. "application/json" and "gzip". Used by applications, not core RabbitMQ.

The Collector RabbitMQ scraper does not report any attributes.
Client JMX metrics don't seem to have any attributes.
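
As a rough illustration of how the low-cardinality message properties listed above might end up on a metric, here is a sketch in plain Python. The `Delivery` class is a hypothetical stand-in for a delivered message, and the attribute keys it produces (`redelivered`, `delivery_mode`, `content_type`) are placeholders, not proposed names. The only protocol fact assumed is that AMQP 0-9-1 uses delivery mode 1 for transient and 2 for persistent messages.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Delivery:
    """Hypothetical stand-in for a delivered RabbitMQ message."""
    redelivered: bool
    delivery_mode: Optional[int]       # AMQP 0-9-1: 1 = transient, 2 = persistent
    content_type: Optional[str] = None
    message_id: Optional[str] = None   # high cardinality, deliberately not reported


def metric_attributes(delivery: Delivery) -> dict:
    """Pick only the low-cardinality properties as metric attributes."""
    attrs = {"redelivered": delivery.redelivered}
    if delivery.delivery_mode is not None:
        attrs["delivery_mode"] = (
            "persistent" if delivery.delivery_mode == 2 else "transient"
        )
    if delivery.content_type is not None:
        attrs["content_type"] = delivery.content_type
    return attrs


example = metric_attributes(
    Delivery(redelivered=True, delivery_mode=2,
             content_type="application/json", message_id="abc-123")
)
print(example)
```

The per-message `message_id` is dropped on purpose, since it would create one series per message.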
We should also check the RabbitMQ/Kafka attributes to see whether any of the existing ones are good candidates to be generalized.
Based on discussions in #798, we should also look into standardizing `messaging.kafka.message.offset`.
Discussed in the messaging workgroup: we'll need a RabbitMQ-specific extension for metrics because the routing key is needed on RabbitMQ metrics.
This issue is related: https://github.com/open-telemetry/semantic-conventions/issues/1156
`messaging.message.redelivered` (bool) - low cardinality, supported in many systems
Based on the messaging SIG discussion on 2/15/24, we should target stabilizing Kafka and RabbitMQ semconv along with the messaging semconv stability.
While we can always add span attributes, adding attributes to metrics is breaking (#722).
So prior to stability, we should
Update: see https://github.com/open-telemetry/semantic-conventions/pull/798#discussion_r1516427699
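
One way to see why adding an attribute to an existing metric is breaking: a series is identified by its full attribute set, so anything written against the old identity stops matching. The sketch below models this in plain Python with a `Counter` (not the OpenTelemetry SDK); the `record` helper and the sample measurements are hypothetical.

```python
from collections import Counter


def record(storage: Counter, attributes: dict, value: int = 1) -> None:
    # A series is identified by its full, order-independent attribute set.
    storage[frozenset(attributes.items())] += value


before, after = Counter(), Counter()

# Before: the instrumentation reports only the operation.
for _ in range(3):
    record(before, {"messaging.operation": "receive"})

# After: the same three measurements, with one extra attribute added.
for redelivered in (False, False, True):
    record(after, {"messaging.operation": "receive",
                   "messaging.message.redelivered": redelivered})

# A lookup written against the old series identity.
old_key = frozenset({"messaging.operation": "receive"}.items())
print(before[old_key])      # 3
print(after[old_key])       # 0: the old series no longer exists
print(sum(after.values()))  # 3: the data is still there, split across new series
```

The total is unchanged, but consumers that match on the exact old attribute set would need to re-aggregate across the new attribute.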