Open iinuwa opened 3 weeks ago
@lukebakken sounds like a reasonable last minute 7.0 change? 😢
@iinuwa you are welcome to submit a PR that would rename the attribute. Whether it can ship in 7.0.0, I don't know. It sounds like the right moment to do it, since this feature is very new (in general and new in 7.x).
I'm not sure if there's a good place to add this in this library though
That would be actually quite useful if supported built-in as some generic .net metrics.
It's surprising that messaging.operation.name
reflects an operation type and messaging.operation.type
is… for I'm not sure what, a different type.
In any case, renaming a metric for 7.0 sounds acceptable. Introduction of an another metric will have to go into 7.1, you are welcome to submit PRs.
Thanks Michael.
It's surprising that messaging.operation.name reflects an operation type and messaging.operation.type is… for I'm not sure what, a different type.
Yeah, I'm kind of confused too. I think type
is supposed to be generic/constrained for creating dashboards across different messaging systems and name
can be whatever you want. 🤷🏿♂️
In any case, renaming a metric for 7.0 sounds acceptable.
Added #1716 for changing the attribute name to messaging.operation.type
. I also updated the corresponding attribute and operation name values to non-deprecated values since it seemed related.
OpenTelemetry support was originally contributed in https://github.com/rabbitmq/rabbitmq-dotnet-client/pull/1528 by @stebet, @lmolkova and others. Perhaps we should also ask for their opinion.
I am pinging these people because they all reviewed #1528 -
@danielmarbach
@lmolkova
@martinjt
@Kielek
@cijothomas
@WeihanLi
It's surprising that messaging.operation.name reflects an operation type and messaging.operation.type is… for I'm not sure what, a different type.
Yeah, I'm kind of confused too. I think type is supposed to be generic/constrained for creating dashboards across different messaging systems and name can be whatever you want.
This is exactly the point. We want systems to have operation names that use their own terminology, but we also need unified type that's common across different systems. The latter one is useful for common dashboards, alerts and helps to move between messaging systems.
There could also be multiple different operation names with the same type. E.g. Azure ServiceBus has receive
, peek
and receive_deferred
operation names which all map to receive
type. Or there could be some operations like admin ones that have a name, but don't have a type since it does not make much sense to unify them across systems.
I hear you both that it's confusing. Let me know if you have any thoughts on how to make it less confusing :)
PS: will take a look at https://github.com/rabbitmq/rabbitmq-dotnet-client/pull/1716
I'll take a look this morning, thanks!
@lmolkova this is probably not the right place to discuss this but messaging.operation.type
seems to cover both messaging and topology operations, which is not wrong per se but then why does it have create
but not delete
? If you declare a queue or a stream, they won't necessarily live forever.
As for .name
, it is a parallel track of operations. If I was designing this today,
I'd use
messaging.operation.type
: create
| delete
| publish
(or send
) | deliver
(or receive
) | ack
| nack
| message_return
(a message was returned) | consumer_registration
| consumer_cancelation
messaging.operation.subtype
then would replace messaging.operation.name
as a field that can contain arbitrary informationIn any case, our team understands that it's not a decision our team will probably influence a great deal, so we will follow the current naming scheme even if it leaves a bit to be desired.
Thank for the feedback! You can actually influence our design quite a lot - this is a perfect time for it (we plan to stabilize sometime soon and after that breaking changes will have very high bar).
why does it have
create
but notdelete
. If you declare a queue or a stream, they won't necessarily live forever.
The messaging.operation.type
only covers operations for which we see a strong need for unification. E.g. you create a dashboard that shows messaging throughput. You can get number of incoming messages by filtering by operation type == send
(plus any other dimensions) - you can do this whether you use RabbitMQ or Active MQ.
When tracing backends visualize stuff, they need to know some things about these special operations to visualize them nicely.
But also, if I use e.g. Azure ServiceBus, having a common name send
is not covering all cases. e.g. I can schedule a message, so I need something else to show the real operation name.
When it comes to queue/consumer deletion - these operations seems to be less of an interest when it comes to visualizations/common dashboards at least at the moment.
I.e. you can have an arbitrary, broker-friendly name and not having a type defined won't hurt anyone. We can definitely grow a list of recognized operation types over time and include things like consumer_registration
.
As for .name, it is a parallel track of operations. If I was designing this today,
Do you feel that the messaging.operation.name
is confusing in some way? Why would you prefer the messaging.operation.subtype
?
we picked this name because it aligns well with similar notions we have for other technologies (e.g. we have db.operation.name
or gen_ai.operation.name
). It's more than a subtype - that's actually the primary thing people who use your library/system know about their operation.
messaging.operation.type: create | delete | publish (or send) | deliver (or receive) | ack | nack | message_return (a message was returned) | consumer_registration | consumer_cancelation
That's where things get tricky with operation types. They need to apply to most of the messaging systems. E.g. ack/nack/checkpoint/commit/completion/abandonment/dead lettering are all forms of settlement. For this one we want to unify since it's such an important part (you want to know how message processing has ended).
As I mentioned above, consumer_registration|cancellation|etc
could be added incrementally.
In any case, happy to hear your thoughts
@lmolkova for RabbitMQ specifically, high topology churn (due to questionable client library usage or design choices in applications) has been a serious enough headache over more than a decade. So declaration and deletion events are an important aspect to monitor, not just message publishing or consumption rates.
These days RabbitMQ has three related metrics specifically so that operators can spot topology churning applications. Perhaps OpenTelemetry for messaging should provide a couple of well-known operation types.
Do you feel that the messaging.operation.name is confusing in some way? Why would you prefer > the messaging.operation.subtype ?
Only because to me an operation does not have a name, it has a type (or method/frame, in messaging protocol nerd speak).
It's more than a subtype - that's actually the primary thing people who use your library/system know about their operation.
The docs do not state that strongly enough.
They need to apply to most of the messaging systems. E.g. ack/nack/checkpoint/commit/completion/abandonment/dead lettering are all forms of settlement. > For this one we want to unify since it's such an important part (you want to know how message processing has ended).
I understand that and I've tried to not use terms that would be too specific to RabbitMQ or its original protocol. "settlement" from AMQP 1.0 covers a lot of ground but must be clarified, hence the .name
.
The current naming, with the above explanation, should be enough for us to continue. But at the very least the purpose of .name
must be clarified in the docs, with a few examples for AMQP 1.0, MQTT, AMQP 0-9-1, RabbitMQ Stream and Kafka protocols.
I'm willing to provide a few examples for the first four in case you accept external contributions :)
So my current understand is the following:
messaging.operation.type
` is the broadest type of classification: was some data sent? received/delivered? was something "created" (I guess "topology" would be a more generic name)messaging.operation.name
is for the specific protocol operation (method), e.g. publish
for MQTT AMQP 1.0, publish
or basic.publish
for AMQP 0-9-1, ack
or nack
or settle
for pretty much every protocol out there, or queue.delete
for systems where topology is entirely application-managedThis client can work with this. Thank you for the clarification @lmolkova.
Thanks a lot for the details @michaelklishin!
I'll go through the points you raised and convert them into work items in semantic conventions later today (or if you want to do it - go ahead)
I'm willing to provide a few examples for the first four in case you accept external contributions :)
we do accept contributions and really appreciate them at https://github.com/open-telemetry/semantic-conventions - lmk if you need any help with it.
@lmolkova I can only use examples from the five protocols supported by RabbitMQ, including AMQP 0-9-1 which one OTel "committee big shot" at would vehemently object to. If you think that's still a good idea, should I submit a PR with updates of this file specifically? https://github.com/open-telemetry/semantic-conventions/blob/14c1d79ade64a5fddccf337fc2b00e750e110c59/model/messaging/common.yaml#L18
Thanks for the discussion @michaelklishin and @lmolkova! I have some separate considerations
Now that I'm looking at this a bit more, I wonder if we should just leave this as is for 7.0. I don't want to delay the release of this client, since the async and performance changes are much more important, and @lukebakken has already indicated that adding new attributes should be pushed to 7.1. I'm thinking we should defer these operation name and attribute name and value changes to 7.1 and make sure to coordinate operation names with other major libraries; at least the ones that the RabbitMQ team officially supports.
A caveat: changing the operation and attribute names are technically breaking changes, since it will mess with people's dashboards. But the RabbitMQ and Messaging Semantic Conventions are marked as experimental, so perhaps it would be OK to break this in a minor RabbitMQ client version. (Maybe it'd be good to denote that in the release notes that OpenTelemetry schema in the RabbitMQ .NET client is also experimental and subject to change.)
A quick sample of operation names for a few clients:
Language | Library | Built-In/Wrapped | RabbitMQ-Supported | Values |
---|---|---|---|---|
Java | rabbitmq-java-client | Wrapped | Yes | send , receive |
Node | amqplib | Wrapped | No | publish , consume , ack , nack , reject |
Python | pika | Wrapped | Yes | send , receive |
Go | amqpgo | None* | Yes | (none yet, in progress) |
Erlang | rabbitmq-erlang-client | None | Yes | N/A |
PHP | php-amqplib | None | Yes | N/A |
All of these use the old messaging.operation
attribute (except the Go one, which is still in progress).
My recommendations for the names would be to use the AMQP operation names, stripping basic. : |
AMQP/RMQ operation | OTEL Operation Type | OTEL Operation Name | Description |
---|---|---|---|---|
basic.publish |
send |
publish |
sending a message | |
basic.deliver |
process |
deliver |
receiving a message from a subscription initiated by basic.consume |
|
basic.get |
receive |
get |
pulling an message with basic.get |
|
basic.consume |
(none) | consume |
start a subscription with basic.consume |
|
basic.ack |
settle |
ack |
Mark a message as processed | |
basic.nack |
settle |
nack |
return a message back to broker | |
basic.reject |
settle |
reject |
return a message back to broker |
(Alternatively, we can replace periods .
with underscores _
, e.g. basic_get
. But this is lengthy, and I don't think AMQP will add another prefix in the future, so I prefer the short one. It's also closer to what the RabbitMQ and messaging ecosystem at large is using)
@iinuwa we will accept the changes you have submitted, they are small and bring us up-to-date with the current state of OpenTelemetry naming conventions. All other changes will have to wait until 7.1.
@michaelklishin, OK. I'll split off the other work into a separate PR then. Thanks!
@iinuwa by the way, the idea of stripping basic.
for operation names sounds good to me except for basic.get
which is explicitly called out in the docs and management UI, for several reasons. Alternatively we can use "polling fetch" for operation name.
Is your feature request related to a problem? Please describe.
The Activities in RabbitMQ client currently emit the
messaging.operation
attribute. However, the Semantic Conventions for messaging currently specify usemessaging.operation.type
for the values that are currently being set there, and to addmessaging.operation.name
to values likeack
,nack
, etc.Describe the solution you'd like
At minimum, changing the
messaging.operation
attribute tomessaging.operation.type
would be helpful to comply with the conventions.Adding
messaging.operation.name
would also be helpful for tracking the number of/searching foracks
vs.nacks
in the system. I'm not sure if there's a good place to add this in this library though; perhaps it would need to be done in higher level libraries.Additional context
References https://github.com/rabbitmq/rabbitmq-dotnet-client/pull/1528.