openzipkin / brave

Java distributed tracing implementation compatible with Zipkin backend services.
Apache License 2.0
2.36k stars 714 forks source link

[Feature Request] Concat Topic/Queue name to Remote Service on Messaging instrumentations #895

Open jeqo opened 5 years ago

jeqo commented 5 years ago

Motivation:

Currently remote service name is by default the name of the broker technology (e.g. kafka, jms). If many messaging applications are instrumented, then service dependency graph all applications will be around the broker platform, which makes it difficult to understand dependencies.

Proposed change:

In messaging brokers, topics/queues could be represented as services, then instead of remoteServiceName=kafka, we could have something like remoteServiceName=kafka-topic-name, where only producers and consumers to this topic will be around.

A new boolean option (false by default) can be added to messaging instrumentation builders: concatTopicName for Kafka, and concatDestinationName for JMS.

If true, remoteServiceName=remoteServiceName-topicName

codefromthecrypt commented 5 years ago

I think our default naming is similar for mysql, though we don't include any sort of append logic based on user input, rather if they don't put anything, the default is a pattern like this.

I'm tempted to make this a pattern, but I can see the value in just concatenating as then you don't need to worry about variable naming of queue vs topic etc.

shakuzen commented 5 years ago

The analog of topic/queue to RPC tracing seems to be method/endpoint, which we generally put in the span name, not the service. I know it isn't an apples-to-apples comparison, but I'm wondering why we wouldn't prefer to do that with messaging as well.

Remote service name to me seems better suited to identifying a kafka/rabbitmq cluster than a topic/exchange. The dependency graph was mentioned in the description, and for using this in the dependency graph, topic/queue seems quite high cardinality (like HTTP endpoints would likely be) compared to a cluster identifier. I would also think people would want to identify broker clusters because if there is a failure/outage in the cluster it will affect all topics/exchanges, in general.

jeqo commented 5 years ago

@shakuzen the main difference between messaging to RPC is that we don't have the server (broker) span to name it. To get there, we would need to instrument the broker side.

Dependency graph is the main driver to me: is true that we will get more services (x #topics) but instead of getting all producers/consumers pointing to a cluster, we will get producer/consumer per topic pointing to cluster+topicName instead which seems more useful in terms of understanding but also for metrics.

I agree that identifying cluster is important as well, but not sure remoteServiceName alone capture failure/outage neither as value is static. I was looking into how to add host:port to remote endpoint via Kafka AdminClient, and seems possible with AdminClient#describeTopics (https://kafka.apache.org/20/javadoc/org/apache/kafka/clients/admin/AdminClient.html) I will check this further and create another issue if it make sense.

@adriancole I like the pattern idea, but it could be a bit harder to configure/use as you said. I was thinking to include a separator variable between service name and topic name for instance to make it easier to distinguish.

codefromthecrypt commented 5 years ago

jorge, would a finishedspanhandler work? you should have the topic names there right?

On Wed, Apr 24, 2019 at 1:05 AM Jorge Quilcate Otoya < notifications@github.com> wrote:

@shakuzen https://github.com/shakuzen the main difference between messaging to RPC is that we don't have the server (broker) span to name it. To get there, we would need to instrument the broker side.

Dependency graph is the main driver to me: is true that we will get more services (x #topics) but instead of getting all producers/consumers pointing to a cluster, we will get producer/consumer per topic pointing to cluster+topicName instead which seems more useful in terms of understanding but also for metrics.

I agree that identifying cluster is important as well, but not sure remoteServiceName alone capture failure/outage neither as value is static. I was looking into how to add host:port to remote endpoint via Kafka AdminClient, and seems possible with AdminClient#describeTopics ( https://kafka.apache.org/20/javadoc/org/apache/kafka/clients/admin/AdminClient.html) I will check this further and create another issue if it make sense.

@adriancole https://github.com/adriancole I like the pattern idea, but it could be a bit harder to configure/use as you said. I was thinking to include a separator variable between service name and topic name for instance to make it easier to distinguish.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/openzipkin/brave/issues/895#issuecomment-485869027, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAPVV2JZE6GYPDLI5BUCDTPR4XV3ANCNFSM4HHKN3EA .

shakuzen commented 5 years ago

Dependency graph is the main driver to me: is true that we will get more services (x #topics) but instead of getting all producers/consumers pointing to a cluster, we will get producer/consumer per topic pointing to cluster+topicName instead which seems more useful in terms of understanding but also for metrics.

I agree with this statement, but I think the same argument applies for HTTP endpoints, which people have asked to see in the dependency graph in the past. So I'm just wondering if we shouldn't think about this and its implications in a more generic way before we put something out in the wild. Maybe not though. I don't have a concrete proposal.

shakuzen commented 5 years ago

Also a note about RabbitMQ: producers publish to an exchange, and then depending on RabbitMQ's configuration, the message ends up in zero, one, or multiple queues. That is to say the producer only knows the exchange it is publishing to, and the consumer only knows the queue it is consuming from. So I'm not sure how this would be reconciled on the dependency graph.

jeqo commented 5 years ago

@adriancole yes, topic names will be available via tag I think. If it's not part of the instrumentation, finishedspanhandler could work, or setting remoteservicename manually as well.

So I'm just wondering if we shouldn't think about this and its implications in a more generic way before we put something out in the wild. Maybe not though. I don't have a concrete proposal.

@shakuzen this might be true, I've been wondering about how to do path analysis by having a service:span call graph instead of only services. Similar to path aggregation described here: https://cwiki.apache.org/confluence/display/ZIPKIN/2018-07-18+Aggregation+and+Analysis+at+Netflix

Even then, the lack of span on the remote service (when messaging) will require to use tags from the producer/consumer span to know which topic was used. Thinking out loud: Should be possible to define remote service span names from prod/consumers when tracing messaging apps?

codefromthecrypt commented 5 years ago

I think somewhere we had an issue to create something for messaging like we have for http (HttpParser etc). There are probably not many low cardinality inputs available for use in naming etc, however, having access to the message object to make a decision should be helpful. I think it is time to implement that abstraction, but different from Http ... add the remoteServiceName aspect. In fact we probably should also add remoteServiceName to the http parsers :P

Are you up to starting this? I think it is long overdue.

On Wed, Apr 24, 2019 at 8:14 PM Jorge Quilcate Otoya < notifications@github.com> wrote:

@adriancole https://github.com/adriancole yes, topic names will be available via tag I think. If it's not part of the instrumentation, finishedspanhandler could work, or setting remoteservicename manually as well.

So I'm just wondering if we shouldn't think about this and its implications in a more generic way before we put something out in the wild. Maybe not though. I don't have a concrete proposal.

@shakuzen https://github.com/shakuzen this might be true, I've been wondering about how to do path analysis by having a service:span call graph instead of only services. Similar to path aggregation described here: https://cwiki.apache.org/confluence/display/ZIPKIN/2018-07-18+Aggregation+and+Analysis+at+Netflix

Even then, the lack of span on the remote service (when messaging) will require to use tags from the producer/consumer span to know which topic was used. Thinking out loud: Should be possible to define remote service span names from prod/consumers when tracing messaging apps?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/openzipkin/brave/issues/895#issuecomment-486179878, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAPVV5BVICJRMR72UEAWLTPSA6JDANCNFSM4HHKN3EA .

jeqo commented 5 years ago

@adriancole I remember we discussed something similar around jms instrumentation. Do you mean to have a brave.messaging layer to reuse on jms/kafka/..., right? I like the challenge, I will play around to see how it will look like based on brave.http module, this might make it easier to implement this idea over messaging modules.

About remoteServiceName for http, I will take a look as well and create an issue to track it.

codefromthecrypt commented 5 years ago

yes exactly this

On Fri, Apr 26, 2019, 12:02 AM Jorge Quilcate Otoya < notifications@github.com> wrote:

@adriancole https://github.com/adriancole I remember we discussed something similar around jms instrumentation. Do you mean to have a brave.messaging layer to reuse on jms/kafka/..., right? I like the challenge, I will play around to see how it will look like based on brave.http module, this might make it easier to implement this idea over messaging modules.

About remoteServiceName for http, I will take a look as well and create an issue to track it.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/openzipkin/brave/issues/895#issuecomment-486735101, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAPVVYWZTR57F7S3B2KXSLPSHIZ3ANCNFSM4HHKN3EA .