openzipkin / zipkin

Zipkin is a distributed tracing system
https://zipkin.io/
Apache License 2.0

Add the ability to differentiate between multiple nodes of a single service in the UI #2512

Open codefromthecrypt opened 5 years ago

codefromthecrypt commented 5 years ago

Notably, tracing a data service such as cockroachdb or cassandra can result in many spans all classified under the same service. The trace detail screen should allow users to annotate the rows describing a span with the IP and port, possibly the same way HTrace did back in the day (delimited additions to the service name).

While it might be useful to heuristically decide if ip/port should be added by default into the trace screen, doing so can be tricky and possibly confuse users. We decided to start with just making it possible.
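To make the "delimited additions to the service name" idea concrete, here is a minimal sketch. It assumes spans in Zipkin v2 JSON form, where `localEndpoint` may carry `serviceName`, `ipv4`, `ipv6` and `port`. The function name `row_label`, the `/` delimiter and the sample data are hypothetical, not anything the UI ships.

```python
from typing import Any, Dict


def row_label(span: Dict[str, Any], delimiter: str = "/") -> str:
    """Return the service name, suffixed with ip:port when the span has one."""
    endpoint = span.get("localEndpoint") or {}
    service = endpoint.get("serviceName") or "unknown"
    ip = endpoint.get("ipv4") or endpoint.get("ipv6")
    port = endpoint.get("port")
    if ip is None:
        # Instrumentation did not record an address: fall back to the plain name.
        return service
    # Bracket IPv6 addresses so the port separator stays unambiguous.
    host = f"[{ip}]" if ":" in ip else ip
    address = f"{host}:{port}" if port else host
    return f"{service}{delimiter}{address}"


# Hypothetical spans from two nodes of the same service, plus one without an IP.
spans = [
    {"name": "get", "localEndpoint": {"serviceName": "cassandra", "ipv4": "10.0.0.1", "port": 9042}},
    {"name": "get", "localEndpoint": {"serviceName": "cassandra", "ipv4": "10.0.0.2", "port": 9042}},
    {"name": "get", "localEndpoint": {"serviceName": "cassandra"}},
]
labels = [row_label(s) for s in spans]
```

Because the suffix is only added when an address is present, rows from instrumentation that omits the IP keep their current label.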

https://cwiki.apache.org/confluence/display/ZIPKIN/2019-04-17+UX+workshop+and+Lens+GA+planning+at+LINE+Fukuoka

codefromthecrypt commented 5 years ago

this was originally requested by @kellabyte on gitter about cockroachdb

Logic-32 commented 5 years ago

Frankly, I liked how Zipkin 1.x did this with the ca/sa constants. The UI had some logic for determining which tag/annotation was the most authoritative and would show that. I honestly miss that as we used the SA annotation to differentiate our DB calls from our HTTP calls. Now everything has our web service as the name.

So your solution could be as simple as this: add a "magic tag" which acts as an override for the service name. Whatever you put in it is what you put in it. This may be only as far as the UI is concerned or it can make it all the way into storage as the serviceName.
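A rough sketch of what such an override could look like on the UI side, assuming Zipkin v2 JSON spans with a `tags` map. The tag name `ui.serviceName` is made up for illustration; no such tag is defined in the Zipkin model.

```python
from typing import Any, Dict

# Hypothetical tag name, chosen only for this example.
OVERRIDE_TAG = "ui.serviceName"


def display_service_name(span: Dict[str, Any], override_tag: str = OVERRIDE_TAG) -> str:
    """Prefer the override tag, else fall back to localEndpoint.serviceName."""
    tags = span.get("tags") or {}
    override = tags.get(override_tag)
    if override:
        return override
    endpoint = span.get("localEndpoint") or {}
    return endpoint.get("serviceName") or "unknown"


# A DB call made by a web service, relabelled through the hypothetical tag.
db_call = {
    "name": "select",
    "localEndpoint": {"serviceName": "web"},
    "tags": {"ui.serviceName": "web-db"},
}
http_call = {"name": "get /users", "localEndpoint": {"serviceName": "web"}}
```

This version only changes what is displayed. Writing the override into storage as the serviceName would be a separate, server-side decision.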

codefromthecrypt commented 5 years ago

Thanks for the notes. I will help classify this, as I think the issue is a little different.

> Frankly, I liked how Zipkin 1.x did this with the ca/sa constants (https://github.com/apache/incubator-zipkin/blob/2.10.0/zipkin/src/main/java/zipkin/Constants.java#L199). The UI had some logic (https://github.com/apache/incubator-zipkin/blob/1.31.3/zipkin-ui/js/component_ui/traceSummary.js#L63) for determining which tag/annotation was the most authoritative and would show that. I honestly miss that as we used the SA annotation to differentiate our DB calls from our HTTP calls. Now everything has our web service as the name.

This issue is about the case where the service name is not enough (there's only one service name). The authoritative service name idea matches a couple of existing issues; I think this one possibly fits: https://github.com/apache/incubator-zipkin/issues/1202

> So your solution could be as simple as this: add a "magic tag" which acts as an override for the service name. Whatever you put in it is what you put in it. This may be only as far as the UI is concerned or it can make it all the way into storage as the serviceName.

Do you think specifying and populating a new tag would be easier than showing the IP/port that's already in span.localEndpoint?

codefromthecrypt commented 5 years ago

For those who haven't seen the wiki, one of the most relevant parts is how this relates to prior art, notably HTrace, which ran into the same problem and worked a model around it.

HTrace was a successor to zipkin v1 and had a "magic tag" called tracerID which included some hierarchically assembled data, a lot of which, but not all, is in span.localEndpoint. The neat thing here is that it was ordered, which meant you could reason with scopes. For example, if something is at a scope smaller than IP, there would be data beyond the IP field: https://github.com/apache/incubator-retired-htrace/blob/master/htrace-core4/src/main/java/org/apache/htrace/core/TracerId.java#L36

This, however, implied implementing a partial string query. Also, different instrumentation may define this tag differently, which could complicate the ability to reason beyond the IP address.
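As an illustration of scope reasoning over an ordered identifier, and of why it turns into a partial string query, here is a minimal sketch. The `service/ip/port` layout and the helper names are assumptions made for this example; this is not HTrace's actual tracerID format.

```python
from typing import Sequence


def build_node_id(components: Sequence[str], sep: str = "/") -> str:
    """Join components ordered from the widest scope to the narrowest."""
    return sep.join(components)


def in_scope(node_id: str, scope: str, sep: str = "/") -> bool:
    """True when node_id is the scope itself or is nested beneath it."""
    # Matching on scope + sep keeps 10.0.0.1 from also matching 10.0.0.10.
    return node_id == scope or node_id.startswith(scope + sep)


# Hypothetical identifiers: two processes on one host, one on another host.
ids = [
    build_node_id(["cockroachdb", "10.0.0.1", "26257"]),
    build_node_id(["cockroachdb", "10.0.0.1", "26258"]),
    build_node_id(["cockroachdb", "10.0.0.10", "26257"]),
]
same_host = [i for i in ids if in_scope(i, "cockroachdb/10.0.0.1")]
```

The prefix match only works if every instrumentation orders and delimits the components the same way, which is the portability concern raised above.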

Since the original request here was to make IP and port visible, and we already have that data in span.localEndpoint, we settled on a design that is easy to implement and doesn't require changing instrumentation or existing data (unless for some reason the instrumentation didn't add the IP to its data, which could also happen).

There will be different usability stories, relating to site-specifics, where other mechanisms could make more sense. For example, a site could define something not in our model, like a special tag. I would think of this as a separate activity from how to present IP/port information.

codefromthecrypt commented 5 years ago
[Image: Screen Shot 2019-04-19 at 1 41 54 PM]

Here's the screen shot showing what we discussed at the workshop. Lacking another special grouping tag to differentiate nodes in the same service, we could default to subgrouping and coloring by IP/port.
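A minimal sketch of subgrouping and coloring by IP/port, assuming Zipkin v2 JSON spans. The palette, the helper names and the sample spans are hypothetical; the real UI may choose colors differently.

```python
from typing import Any, Dict, List, Optional, Tuple

# Arbitrary example colors; reused in order once there are more nodes than colors.
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"]

NodeKey = Tuple[str, Optional[str], Optional[int]]


def node_key(span: Dict[str, Any]) -> NodeKey:
    """Identify a node by (serviceName, ip, port) taken from localEndpoint."""
    endpoint = span.get("localEndpoint") or {}
    return (
        endpoint.get("serviceName") or "unknown",
        endpoint.get("ipv4") or endpoint.get("ipv6"),
        endpoint.get("port"),
    )


def color_by_node(spans: List[Dict[str, Any]]) -> Dict[NodeKey, str]:
    """Assign one color per distinct node, in order of first appearance."""
    colors: Dict[NodeKey, str] = {}
    for span in spans:
        key = node_key(span)
        if key not in colors:
            colors[key] = PALETTE[len(colors) % len(PALETTE)]
    return colors


# Hypothetical trace: three spans spread over two nodes of one service.
trace = [
    {"name": "scan", "localEndpoint": {"serviceName": "cockroachdb", "ipv4": "10.0.0.1", "port": 26257}},
    {"name": "scan", "localEndpoint": {"serviceName": "cockroachdb", "ipv4": "10.0.0.2", "port": 26257}},
    {"name": "put", "localEndpoint": {"serviceName": "cockroachdb", "ipv4": "10.0.0.1", "port": 26257}},
]
colors = color_by_node(trace)
```

Spans whose instrumentation recorded no address all collapse into a single `(serviceName, None, None)` group, so they would look the same as they do today.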

This doesn't mean we can't also do a special tag, just that using data we define in the model is simpler to implement, and simpler to explain when it doesn't work, without relying on site specifics.

codefromthecrypt commented 5 years ago

It would be nice to have an example "monoservice" trace in zipkin format, as we could probably do a mock-up that makes the difference in display more concrete. For example, this one from cockroach has no service or IP information, and it isn't in zipkin format either.

https://gist.github.com/kellabyte/c07bc34b4155231743c61edf5b977f42

Probably any data service trace, or something from a large service graph where many nodes are in the same service, would help elaborate. If someone has one, we could add it to the zipkin-ui/testdata folder.

Logic-32 commented 5 years ago

Your clarification makes sense. #1202 is definitely more related to what I was talking about.