open-telemetry / opentelemetry-specification

Specifications for OpenTelemetry
https://opentelemetry.io
Apache License 2.0

Messaging consumer/client (group) ID could be made more generic #2015

Closed Oberon00 closed 1 year ago

Oberon00 commented 3 years ago

Currently we have these semantic span attributes to identify message senders/consumers:

It seems like these could all be merged into a generic messaging.client_id and messaging.client_group.

See also https://github.com/open-telemetry/opentelemetry-specification/pull/1904#discussion_r701054533 and https://github.com/open-telemetry/opentelemetry-specification/pull/1810#discussion_r667463380.

kenfinnigan commented 3 years ago

This would be a good candidate to include in the messaging WG discussions. @pyohannes?

pyohannes commented 1 year ago

For easier reference, here's an updated list of attributes pertaining to this discussion:

A consumer group usually defines a logical "view" of a topic or a similar entity, whereas a client id uniquely identifies a running instance. I think it makes sense to use separate terms (client/consumer) here.

As far as I understand, a client_group for RocketMQ is similar to a consumer group and should probably be renamed to be consistent. I'm not a RocketMQ expert though. That said, attributes in the global messaging namespace are supposed to be applicable to all or most messaging systems, and the concept of a consumer (or client) group is not, as it only applies to checkpoint-based messaging systems (or, in other words, to "topics" but not to "queues"). Therefore, I think those attributes should be kept in system-specific namespaces.

Client id, on the other hand, could be applicable to any messaging system, so I think it would make sense to move it to the generic messaging namespace.

The consumer.id attribute mixes the concepts of consumer groups and client ids and can have different semantics depending on the messaging system (for Kafka it could be just a consumer group id, for RabbitMQ it can be a client id). I wonder if we shouldn't remove the attribute in order to achieve a clear separation and clean semantics. No information would be lost in any case, as the client id and the consumer group that make up the consumer.id attribute are present in other, separate attributes.

To summarize, I'd suggest replacing the above-mentioned attributes with the following:

pyohannes commented 1 year ago

@kenfinnigan As you introduced the messaging.consumer.id attribute, could you have a look at the proposal in the previous comment and let us know if that would make sense for you?

kenfinnigan commented 1 year ago

The one concern I have is identifying Kafka messages generically by messaging.client_id when there is no client id for the Kafka consumer, as the field would be empty. This was one of the reasons why for Kafka the messaging.consumer.id is either a combination of consumer group and client id, or only consumer group.

pyohannes commented 1 year ago

> The one concern I have is identifying Kafka messages generically by messaging.client_id when there is no client id for the Kafka consumer, as the field would be empty.

Do you think that in this instance it would be feasible to look at a tuple of attributes: messaging.client_id and messaging.kafka.consumer.group?
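As a rough illustration of the tuple idea, here is a minimal sketch that assumes span attributes are available as a plain mapping. The helper `kafka_consumer_identity` and the sample values are hypothetical; the two attribute keys are the ones named in this thread, and messaging.client_id is only the proposed generic name.

```python
from typing import Mapping, Optional, Tuple

# Attribute keys as named in this thread. messaging.client_id is the
# proposed generic attribute, so treat it as an assumption here.
CLIENT_ID = "messaging.client_id"
KAFKA_CONSUMER_GROUP = "messaging.kafka.consumer.group"


def kafka_consumer_identity(
    attributes: Mapping[str, str],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (client id, consumer group); an element is None if absent."""
    return (attributes.get(CLIENT_ID), attributes.get(KAFKA_CONSUMER_GROUP))


# Hypothetical attribute sets: one consumer with a client id, one without.
with_client_id = {CLIENT_ID: "client-5", KAFKA_CONSUMER_GROUP: "my-group"}
group_only = {KAFKA_CONSUMER_GROUP: "my-group"}

print(kafka_consumer_identity(with_client_id))  # ('client-5', 'my-group')
print(kafka_consumer_identity(group_only))      # (None, 'my-group')
```

When the client id is absent, the tuple still carries the consumer group, and the missing element stays visibly empty instead of being silently replaced by another value.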

The problem with the current definition of consumer.id is that it neither uniquely identifies a client nor has consistent semantics across messaging systems.

kenfinnigan commented 1 year ago

We can consider a tuple of attributes for Kafka, but we then have the same problem in that messaging.client_id on its own is not sufficient to uniquely identify a client, as there are caveats for some systems, granted possibly only applying to Kafka.

pyohannes commented 1 year ago

> [...] but we then have the same problem in that messaging.client_id on its own is not sufficient to uniquely identify a client, as there are caveats for some systems, granted possibly only applying to Kafka

True, it doesn't uniquely identify a client, because it might be missing in some cases. However, the semantics are clearly defined across systems: if it is given, then it uniquely identifies a client.

As far as I understand, the current consumer.id can be misleading in cases where it falls back to the consumer group, as different clients using the same consumer group will have the same consumer.id.
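To make the fallback problem concrete, here is a minimal sketch of how a Kafka consumer.id might be composed as described earlier in this thread (consumer group plus client id, or consumer group only). The function `legacy_consumer_id`, the " - " separator, and the sample values are assumptions for illustration, not the specification's exact wording.

```python
from typing import Optional


def legacy_consumer_id(consumer_group: str, client_id: Optional[str]) -> str:
    """Compose a consumer.id value; the " - " separator is an assumption."""
    if client_id:
        return f"{consumer_group} - {client_id}"
    # Fallback: without a client id, only the consumer group remains.
    return consumer_group


# Two different clients in the same (hypothetical) group, neither with a client id.
first_client = legacy_consumer_id("orders", None)
second_client = legacy_consumer_id("orders", None)

# Both clients end up with the same consumer.id, so the value no longer
# distinguishes them.
print(first_client == second_client)  # True

# With client ids present, the composed values differ.
print(legacy_consumer_id("orders", "client-1") == legacy_consumer_id("orders", "client-2"))  # False
```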

If there are no strong blocking reasons from your side, I will submit a PR with the proposed changes above.