opentracing / specification

A place to document (and discuss) the OpenTracing specification. 🛑 This project is DEPRECATED! https://github.com/opentracing/specification/issues/163
http://opentracing.io/spec
Apache License 2.0
1.17k stars 182 forks source link

New standard tag for protocol or operation type #72

Open felixbarny opened 7 years ago

felixbarny commented 7 years ago

I suggest a new tag which can be used to differ the operation type or protocol of a span.

Use Cases

These are the use cases which are relevant for stagemonitor which is an agent built on top of OT-API. It automatically instruments servlets for example and also extracts metrics from spans.

Filter tags by type

For stagemonitor, I've implemented a generic Kibana dashboard which can be used to analyze "all the spans". While systems like zipkin focus on the path of individual traces, this dashboard gives you a bird view on all collected spans. For example, you see the most frequently executed spans grouped by operation name and duration percentiles. In this dashboard, it makes sense to have a filter to only see the metrics for spans of a particular type.

The easiest and most clean way to implement such a filter in Kibana would be to have a single type tag which can have multiple values. That way it's easy to implement a pie chart for example which shows you the distribution for http vs sql spans. By clicking on it the dashboard will automatically be filtered to show only SQL metrics for example. Another useful visualization would be a table which lists the average execution time and count of spans per type.

This type is somewhat similar to span.kind. But the possible values client|server are not fine grained enough. One possible use case for that dashboard is to find out which frequently SQLs are particularly slow and then see the individual queries of those queries in order to be able to optimize the sql.

Extracting metrics from spans

Some of you might be familiar with @bhs's talk about tracing and metrics. Stagemonitor does something very similar. It wraps the Span and SpanBuilder and tracks relevant metrics like the duration of a span with dropwizard metrics. When analyzing the response time in a dashboard, it is quite important to distinguish the type of the span. For example as a user, I'd want a filter to see the timings of incoming http requests.

The timer created by stagemonitor looks more or less like this: name: response_time, tags: {operation_name="span.operationName", type="span.tags.type"}

Sampling per operation type

This tag can also be useful to perform sampling per operation type. For example you could say I want to sample 100% of incoming and outgoing http requests but only 10% of jdbc queries (there are many applications which issue hundreds of SQLs per request).

See impl here: https://github.com/stagemonitor/stagemonitor/blob/f37386bfaf286038ad682ccb59a9430402a40512/stagemonitor-tracing/src/main/java/org/stagemonitor/tracing/sampling/ProbabilisticSamplingPreExecutionInterceptor.java

Alternatives/Work arounds

@ivantopo and @pavolloffay suggested that SQL spans or HTTP spans can be distinguished by looking for the presence of http.* or db.* tags.

This seems to be a quite vague and can't be applied to all operation types. Besides, I think there is value in explicitly specifying how certain operation types should be named. For example http vs incoming-http vs outgoing-http. It would make interoperability between different OT related projects easier: There is contrib repo for OkHttp but it does not specify the type of the spans it creates. That makes it hard for a tool like stagemonitor to reuse it and to track the durations for OkHttp spans as it expects a type tag to be present.

What's next

If we decide that such a tag should be standardized, we should agree on a good name, as I'm not sure how to name it properly. Some suggestions: type, span.type, operationType, protocol.

Also, possible values of the type should be defined.

yurishkuro commented 7 years ago

We also ran into a need for something like the ipc.protocol tag when emitting metrics for RPC spans. I think type, span.type, and operationType are too general to be useful. Protocol is a bit more concrete, but also has its problems. Would http be a valid value? What about grpc? Doesn't grpc use http under the hood?

What kind of values did you have in mind?

felixbarny commented 7 years ago

I think type, span.type, and operationType are too general to be useful.

I also think that. Unfortunately, I can't come up with a name I'm happy with.

Would http be a valid value?

Ideally, I'd like to be able to differ between incoming and outgoing http calls.

What kind of values did you have in mind?

I currently use incoming-http, outgoing-http (though I don't quite like that naming), jdbc and method_invocation.

yurishkuro commented 7 years ago

well, I would be against "incoming-" prefix - that's now mixing orthogonal concerns, as the direction of the RPC is expressed via tags. If you backend needs a single tag for indexing purposes, you can easily pre-process the spans before saving / indexing them.

My use case for the protocol tag is around YARPC that abstracts away the underlying transport from the user code, so that the same logical endpoint name ServiceX::OperationY is called via YARPC, but the protocol tag would distinguish if the communication is done via HTTP, gRPC, or TChannel. Without the protocol tags the metrics for different transports become jumbled together.

clutchski commented 7 years ago

tl/dr: My vote is for span.type, operation.type (or something like it).

To recap, the core use case of this proposal is to enable OT stats backends to sanely group similar operations - sql queries, requests to web servers, cpu computations - so they can be counted, ranked. It will easily enable questions like:

I thinkoperation.type is the only clear signal that you are linking the operation you're tracing when creating the span. Using protocol eliminates the ability to use it for CPU bound spans that aren't network boundaries, like templates or any expensive operation like image rendering.

Also note, I think this concept is badly needed for opentracing contrib modules to be portable across backends, so in general, thanks for proposing this and I'd like to see something land.

wu-sheng commented 7 years ago

In sky-walking, I think we have a similar tag like span.type, but we called it layer. I am not sure this is the good name, but I think this concept is useful for backend.

e.g. If layer=jdbc, sky-walking analysis backend knows to find jdbc.statement, jdbc.type etc.

clutchski commented 7 years ago

e.g. If layer=jdbc, sky-walking analysis backend knows to find jdbc.statement, jdbc.type etc.

Note: I'm not sure that this should be first class goal of this proposal, since most of this is covered by the db tags.

wu-sheng commented 7 years ago

@clutchski I want to demostrate a type tag for backend. Even we have a lot of db. tags, but as a backend implementation, it can't tell whether find the db. tags or others. Maybe we can tell from the component somehow, but it doesn't work everytime. When I trace MyBatis, the component is mybatis, not JDBI.

Maybe I should demostrate on a rpc framework. If layer=rpc,(not a http) , the backend looks for peer.service, peer.ipv4/v6/hostname.

Hope this matches @felixbarny 's proposal.

have a single type tag which can have multiple values

felixbarny commented 7 years ago

I almost forgot two very important possible values:

well, I would be against "incoming-" prefix - that's now mixing orthogonal concerns, as the direction of the RPC is expressed via tags. If you backend needs a single tag for indexing purposes, you can easily pre-process the spans before saving / indexing them.

I like that idea. Also, I don't necessarily need a single tag to filter on. It would be possible that the user first has to select the span.kind tag and then the "operation type".

But that implies that there are both symmetric and asymmetric pairs of spans in regards to the operation type which is something we should be aware of.

Examples of asymmetric span pairs: full-page-load|ajax (span.kind=client) -> http (span.kind=client) Examples of symmetric span pairs: http (span.kind=client) -> http (span.kind=client)

Do we really want that or wouldn't it make sense to say that there should only exist asymmetric span pairs so that it is always implicitly clear who the sender and who the receiver is?

If you backend needs a single tag for indexing purposes, you can easily pre-process the spans before saving / indexing them.

Prefixing asymmetrical operation types with the "direction" or the span.kind does not really make sense, because they are not ambiguous anyway. So if we decide that symmetric span pairs are possible I'd rather implement another filter which filters by the span.kind.

Using protocol eliminates the ability to use it for CPU bound spans that aren't network boundaries

The values full-page-load and ajax also are not a protocol. The protocol actually is HTTP in this case but I think it's important to distinguish between the types of http calls.

Hope this matches @felixbarny 's proposal.

This definitely goes in a similar direction, although this proposal is not about which tags should be present for which operation type, which could be another proposal once we agree on the operation type thing.

felixbarny commented 7 years ago

Any more thoughts or votes?

Recap on things we should decide on:

yurishkuro commented 7 years ago

I find it difficult to come up with a sensible taxonomy for "operation type". We can potentially define this tag with some examples, but leave the actual values up to instrumentation code.

felixbarny commented 7 years ago

I also think that the possible values should only be examples. But we definitely should define if symmetrical operation types are valid.

Yuri Shkuro notifications@github.com schrieb am Mi., 31. Mai 2017 19:09:

I find it difficult to come up with a sensible taxonomy for "operation type". We can potentially define this tag with some examples, but leave the actual values up to instrumentation code.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/opentracing/specification/issues/72#issuecomment-305253966, or mute the thread https://github.com/notifications/unsubscribe-auth/ACEDCGmJoDxejqIx651eT8pVpPuTqNCYks5r_Z7PgaJpZM4Nnb2I .

bhs commented 7 years ago

@felixbarny sorry for the insane delay in responding to this. I spent some time on my email filters over the weekend so that I'm not missing so much stuff I actually care about!

Re the proposal, IMO a type tag is too general. In many cases, what you're describing sounds similar to the existing component tag:

image

(https://github.com/opentracing/specification/blob/master/semantic_conventions.md)

Something like full-page-load is admittedly different than a component, though. We could try to introduce something like a semantics tag (a little more specific than type). Before doing so, I do want to make sure I understand the actual end-to-end use case: it's cool to think about measuring performance for every full-page-load, but aren't they ultimately going to be apples and oranges? I.e., different pages have wildly different performance characteristics, and that's both normal and expected. I just want to understand the full use case a bit better before making additions to the core spec (though I'm not opposed once we document those use cases).

Also, I'm sorry if I misread (or downright missed) something above, I'm catching up quickly on a lot of PRs/issues at once here.

Thanks again for making the proposal.

cwe1ss commented 7 years ago

The .NET framework will get built-in support for distributed tracing with an OpenTracing-like API. They have some important differences though. One of them is that they changed the meaning of "operation name": When you create a span (they call it "activity") you have to specify a unique compile-time constant as the name instead of a "coarse-grained instance identifier".

As an example, an outgoing HTTP request will have the name "System.Net.Http.HttpRequestOut" and an incoming request in ASP.NET Core will have the name "Microsoft.AspNetCore.Hosting.HttpRequestIn". So the names are very specific and based on the module they are used in.

IIUC they've done this for the following reasons:

I don't like how they've re-used the term "operation name" for it (there's an open discussion for this here: https://github.com/dotnet/corefx/issues/19071 ) and their approach also has the obvious drawback that they no longer have the "operation name" feature from OpenTracing but all of this is a different story.

Anyway, I do like the fact that this unique identifier exists since it provides some useful features - like those already mentioned in this issue! Also, it would make writing a .NET -> OpenTracing bridge easier.

I also think that this feature is related/similar to OpenTracing's component tag. However, the way the .NET framework uses it is even more detailed. E.g. an "MVC" component could have multiple spans with different types (e.g. "MVC.ExecuteAction", "MVC.RenderView", ...).

One solution would be to "extend" the meaning of the component tag and formally allow hierarchy within it. Instrumentation-code could then use e.g. the "."-separator like in .NET and tracing systems could allow for grouping/filtering either based on full values or parts of it.

I also think that it's not required to formalize values. It's up to the instrumentation to come up with a unique name.

clutchski commented 7 years ago

component would work for me if:

I agree no definition needs to be made but I think it really affects the potentially re-usability of contrib modules if we aren't explicitly making it clear to backends which spans are tracing the same kind of operation.

wu-sheng commented 7 years ago

One solution would be to "extend" the meaning of the component tag and formally allow hierarchy within it. Instrumentation-code could then use e.g. the "."-separator like in .NET and tracing systems could allow for grouping/filtering either based on full values or parts of it.

+1 on this. @cwe1ss And more, we need explicit values for the component for better understanding.

e.g. we use the "."-separator in `component, then we should give a more explicit table about the common values, otherwise, OT-users didn't know how to set the value, only we understand.

felixbarny commented 7 years ago

it's cool to think about measuring performance for every full-page-load, but aren't they ultimately going to be apples and oranges? I.e., different pages have wildly different performance characteristics, and that's both normal and expected.

That's right, and that's why an operation_name filter is important as well. Although it can also be useful to see the page load performance of the site overall. This is the Grafana dashboard I created for stagemonitor:

stagemonitor-grafana-ot

Wanted to participate in the OTSC meeting yesterday but couldn't make it :(

bhs commented 7 years ago

@felixbarny I continue to think about this off and on.

The more I think about "component", the less I like it... the idea there was definitely to identify the package / module / layer (e.g., "grpc"), which is not the same thing as making sure that all RPC frameworks identify RPC spans with a common tag.

And the more I think about it, the more I realize that you are basically asking for a generalization / population of the existing "kind" tag. It was originally just client/server, though now it also can be producer/consumer for message buses; would "kind"="pageload" be reasonable? IMO yes.

So that's another concrete proposal.

felixbarny commented 7 years ago

span.kind=pageload would make it very hard for libs like brave-opentracing to map pageload to the cs (client send) annotation. I don't think we can change the semantics of span.kind at this point.

It definitely goes into the same direction as kind but is more specific. My best suggestion would still be the span.type thing. I know that "type" sounds quite vague but doesn't "kind" sound quite generic as well? In the end it comes down to the documentation how well the standard tags are described and what instrumentations are expected to set.

bhs commented 7 years ago

@felixbarny I'm not proposing that kind=cs would be interpreted any differently... there would simply be more Spans with kind annotations. I would expect the brave-opentracing bridge to simply ignore kind values it did not expect or understand.

beberlei commented 7 years ago

This sounds a lot like an annotation component, which gives the span a layer name of sorts, which then automatically has a protocol attached.