opentracing / specification

A place to document (and discuss) the OpenTracing specification. 🛑 This project is DEPRECATED! https://github.com/opentracing/specification/issues/163
http://opentracing.io/spec
Apache License 2.0
1.17k stars 182 forks source link

Add a log level [and namespace] in the API #57

Open tahini opened 7 years ago

tahini commented 7 years ago

Hi,

I'm a developer of the Trace Compass trace analyzer/viewer and I usually use LTTng for Tracing. I've been looking at the Open Tracing API to see how we can use that in the Trace Compass code.

The main interest I see in this API is that it gives structure and semantics to the various points of interest in the code and it can greatly simplify the analysis of the traces afterwards.

But what I feel is missing in this API to be useable for an application like Trace Compass is a log level, and optionally a namespace. From what i understand, now, you either trace the application, or you don't and if you do, all tracepoints will be logged. With a log level, you can fine-tune the amount of data that will be logged, depending on what you want to achieve with tracing. And a namespace allows to trace only the components of interest.

Trace Compass is a java application and we are using JUL for tracing. We use a wrapper that does something similar to the Open Tracing API, but takes the JUL logger (provides a namespace) and log level, so we obtain semantically equivalent traces. We would like to use a known API, but without level and namespace, the interest is limited.

Any opinion on that? Would it be possible to add optional log levels and namespace to the API?

Cheers, Geneviève

wu-sheng commented 7 years ago

First, what do LTTng and JUL stand for?

As my understanding, the log level is giving tracer the opportunity to uplink(or log) some part of the trace? What is namespace for, in a tracing system?

tahini commented 7 years ago

oh right sorry:

The log level is specified per tracepoint, so that depending on your configuration, if the log level is lesser than the one configured, it is ignored and returns fast from the call. In logging/tracing frameworks like JUL, we can even have multiple handlers that will handle different log levels.

The namespace would be to identify the sub-system/module/component of a trace point. In a large application that has only one launcher (so one way to specify the tracing libraries to use), it might be useful to add that so that we can specify that we want to trace componentX at level FINE and all other components at level WARNING for instance. Or even for a global tracer like LTTng, if many instrumented applications are executing, we can specify which ones to trace by using their namespaces.

wu-sheng commented 7 years ago

Thank for your explanation. I think the namespace and log level, like some part of sampling mechanism.

  1. If Component-A is not configured, the trace info of Component-A will be ignored.
  2. Log level do the similar thing as any Java Logging Component, ignore things when the setting level is higher.

Am I right?

For both configs, let us think in a distributed trace scenario,

             JVM  A ----------------------------> JVM B
      Apache HttpComponent                        Tomcat

If the Apache HTTPComponent is not configured, the context should be propagated across-process, or not?

And this scenario exists when we set the Higher log level to ignore some things, specially in propagation-point(cross-thread, cross-process).

If we face a specific tracer implementation, may be easier to explain. But OpenTracing as a spec, I am concerned about these two concepts.

And, these two configs may work fine, if we do not need context cross-process propagation. But sadly, this is the very, maybe most, important thing for us when we face a distributed tracing.

Please correct me, if I misunderstand your proposal.

tahini commented 7 years ago

You are right in your understanding of the concepts.

For your question, yes, the context should be propagated, as it will exist no matter the log level. The event will happen. The log level only determines if the event will be recorded in the trace or ignored.

Of course, for inter-process/distributed tracing, levels and namespace will have to be configured in both systems accordingly, otherwise it will appear like there are lost events. But with default values, it should be fine, so applications who don't need those concepts can just ignore them and tracing will work as it does now.

The problem with tracing is that when we start using it, it's so great that we want more more! And the amount of data collected soon gets out of hand, tracing starts having too much overhead for some use cases like race conditions and performance monitoring. Then, having concepts like log levels and namespaces become the way to solve this.

imho, it should be part of the API, otherwise, applications/tracers will end up wrapping the Open Tracing API in another one (less official) which will contain those concepts (like we do).

If you're interested, I could work on a proposal for the specification with log levels and namespaces.

wu-sheng commented 7 years ago

If you want to work on a proposal with more details, that is great, and welcome.

Here is my thought:

  1. Namespace and component in OT, are similar concepts. It targets to particular component(s), a group of components. You should define the diff between these two concepts. Maybe give us examples if we use this concept in modeling special circumstances. And please consider this in your proposal: Whether sampling based on this or not, I think it depends on the specific tracer implementation, this should not stay in specification. Every tracer implementation has their own sampling mechanism.
  2. Log level(consider a more reasonable name) likes in sampling.priority scope, which we already have, but has more details control. Both are in sampling story. Can you give us, how many levels should sampling have? And why? Common Log levels , likes DEBUG, INFO, etc, seem not right in sampling scenario for me.

If you are not ready, feel free to continue the discussion at here.

yurishkuro commented 7 years ago

If instrumentation always sets a component tag on the span, then tracer implementation can have a mechanism for controlling how much logs should be accepted from a given component. If in addition you want to accept or reject logs from the same component, then log methods can be passed a custom "level" field that can also be understood by the tracer implementation.

So I think the API currently is flexible enough to support your use case, @tahini. If your intention is to make "component" tag and "level" field required in the spec, then it's a much harder sell, given that the spec doesn't even require that all tracers support the logging API (as in, they have to support it at the API level, but they are not obligated to store the logs).

It would be interesting to see if you can make this work with just the existing API and these two tags.

bhs commented 7 years ago

@tahini thank you for opening this issue.

Regarding component, there are two concepts already, one of which is somewhat hidden.

  1. As mentioned already, there is the component tag... I think this is probably better than adding component to every log for a Span, as long as we can all agree that the component for all logs in a Span should be the same. (I think that's a fair assumption)
  2. Also, many Tracer implementations accept some sort of component-like string at construction time which scopes all data generated by that Tracer. This is certainly true for Zipkin, Jaeger, LightStep, etc, though I haven't checked all OT implementations for this. That string usually represents the service / process name.

Anyway, I would argue that we leave component as is. At least until it's shown to be insufficient.

Regarding log levels: this one is trickier. I will cc @jmacd who always has thoughts on it. There is no question about the utility of having relative priorities for log entries... I get that. The hard thing is coming up with a system that is treated consistently from codebase to codebase. FATAL/ERROR/WARNING/INFO/etc isn't terribly hard to imagine, but once we get into various levels of FINE/FINER/FINEST, I am always a little aghast at the imprecision.

(I see that @yurishkuro has just chimed in... I will submit what I have rather than drone on, but happy to discuss further.)

wu-sheng commented 7 years ago

@tahini , The log level, in your you proposal, is it only for ot span.log?

And @yurishkuro @bhs , in prev comment, mentioned

The log level is specified per tracepoint, so that depending on your configuration, if the log level is lesser than the one configured, it is ignored and returns fast from the call

It is like targeting the whole span, not just out log for my understanding.

tahini commented 7 years ago

Thanks all for the feedback. I see that namespace / component is covered, so I'll leave it out.

now remains the question of the log level.

  1. It wouldn't be mandatory to have it, as there could be a default level for events that are logged without one and since most use cases till now seem fine without it, it should not be imposed.

  2. There should be enough levels to allow flexibility. As @bhs mentioned, FATAL/ERROR/WARNING/INFO are known and are part of the habitual logging scheme where INFO is the average high level operation. Here we're talking about tracing, that means we usually want more details than that. So there needs to be at least 3 levels after INFO imho. In Java JUL, there is FINE/FINER/FINEST. There is no clear answer as to which to use. It's usually up to the developer's good judgement, but guidelines could be the impact on performance of tracing at a certain level. FINE should not have "too much" overhead (#define "too much" ~5%?), so could still be used in performance monitoring. FINER and FINEST, well, it's for the rest...

  3. As for the scope of the log levels. The span object would have a default log level (set in the constructor) that will be used by the log methods. But again, one could optionnally want more or less detailed events inside this span. Depending on the use case, it might be preferable to have an extra span if the more detailed part actually represents a span, or it may be individual events (which implies adding a LogLevel parameter to the log(...) methods, though for that case, it could be part of the log fields, only the constructor would need the span-wide log level).

In answer to @yurishkuro's proposition to use a tag for the LogLevel, that is certainly possible. Is it advisable though? The role of this tag is to provide fast path when something is not logged, so it is more than just a tag to log, it may not even make it to the trace. While other tags/fields's role are more at analysis time (after tracing), this one's role is during tracing. It has a different semantic. So mixing them may make the API simpler, but it will make the code more complex. It's the trade-off.

jmacd commented 7 years ago

Since my opinions were mentioned above, I'll gladly list them.

  1. I wouldn't assume that a log statement's namespace equals the span or tracer's component. Imagine trying to single out a specific log statement, it could have its own label ("log.statement=very_special_statement_we_sometimes_enable_but_usually_disable"). It's easy to imagine selecting log statements based on the tracer component, the span operation name, or the log statement label (among other criteria).

  2. The use-case given above is JUL logging. To me this says that you will continue to use j.u.l.Logger objects as the primary Logging API, so the question about the "fast path" is not an OT question, the filtering by labels should be done in a j.u.l.Handler. Placing filters in the call to OT Log() as labels makes perfect sense, but by then you will have made a decision to log or not to log. (Presumably, your j.u.l.Handler also has access to your debugging context, so that you could potentially access the SpanContext and its baggage or its Tracer if dynamic tracing state will be used for logs filtering.

  3. The question about log level NAMES seems overblown to me. The value of a log level need not be standardized. j.u.l. logs can use FINE/FINER/FINEST according to whether the call site used FINE/FINER/FINEST. If I decide to invent a log level named SUPER_IMPORTANT that's fine too. If the only trouble you run into is that Java apps write log.level=FINE and C++ apps write log.level=V(1), we're doing fine. I don't see any reason to require a level, either.

wu-sheng commented 7 years ago

The question about log level NAMES seems overblown to me. The value of a log level need not be standardized. j.u.l. logs can use FINE/FINER/FINEST according to whether the call site used FINE/FINER/FINEST. If I decide to invent a log level named SUPER_IMPORTANT that's fine too. If the only trouble you run into is that Java apps write log.level=FINE and C++ apps write log.level=V(1), we're doing fine. I don't see any reason to require a level, either.

Agree @jmacd . Personally, I do NOT like FINE/FINER/FINEST, even it is a part of j.u.l . They are out clear about anything.

@tahini As the level influences the span.log, I prefer the log.priority as a number(maybe byte is enough). That gives chances to the tracer implementations, which logs should be collected with higher priority, means more important.

jmacd commented 7 years ago

At my former company, we had a mapping from j.u.l.Level values onto the integers [0, 1200] and then mapped from other languages/logging-library levels onto the roughly-corresponding values. I'm not sure it was worthwhile.

tahini commented 7 years ago

Thanks all. I think there's enough material here to figure if there's anything new to add to the spec, or if it is just a matter of documentation. How about the sampling.priority span tag. Is it the log.priority you are talking about or does it have some other special purpose?

wu-sheng commented 7 years ago

@tahini The existed tag sampling.priority is about capturing the trace. I can find the semantic convention

If greater than 0, a hint to the Tracer to do its best to capture the trace. If 0, a hint to the trace to not-capture the trace. If absent, the Tracer should use its default sampling mechanism.

But I assume the log.priority, not existed right now, is about capturing the log in a single span.

wu-sheng commented 7 years ago

Did #62. Wait other's opinions.

tedsuo commented 7 years ago

I would +1 to having log.priority as a semantic convention. But due to opentracing's decoupled nature, may I suggest it work like the following:

wu-sheng commented 7 years ago

@tedsuo . See #62. Happy you are +1 on this. Although I understood the reasons, why @bhs and @yurishkuro -1 on log.priority, still I think if we have log operation, then, log.priority is necessary. The priority value stuff can be settled based on vote (not important right now).

jmacd commented 7 years ago

Is there some reason to call this "priority" instead of "severity"? I feel that severity is more appropriate. ᐧ

On Thu, Apr 6, 2017 at 6:17 PM, 吴晟 Wu Sheng notifications@github.com wrote:

@tedsuo https://github.com/tedsuo . See #62 https://github.com/opentracing/specification/pull/62. Happy you are +1 on this. Although I understood the reasons, why @bhs https://github.com/bhs and @yurishkuro https://github.com/yurishkuro -1 on log.priority, still I think if we have log operation, then, log.priority is necessary. The priority value stuff can be settled based on vote (not important right now).

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/opentracing/specification/issues/57#issuecomment-292387909, or mute the thread https://github.com/notifications/unsubscribe-auth/ADdiicvNM4ebW0p6MxyoUecxa1S-Ewapks5rtY69gaJpZM4MoWBR .

tahini commented 7 years ago

Different languages have different names for the same concept I guess. "Priority", "log level", "severity". Whatever you're more used to will feel more appropriate.

But "severity", somehow feels wrong to me. If it's severe, it should be handled by something else than tracing. So severity here would typically start pretty low (or high, depending on how you look at it), from "not that severe" to "only worth mentioning if you're interested".

jmacd commented 7 years ago

Yes, but the two words have different meanings and if these are the same concept, one of the words should be preferred. Moreover, there are conventions around what these words mean in bug-tracking systems, and I believe the concepts are the same: https://wiki.eclipse.org/WTP/Conventions_of_bug_priority_and_severity By those conventions, e.g., severity would be set by the "originator" i.e., the log statement / developer, and priority would be set on behalf of the affected parties i.e., the recipient / operator. ᐧ

On Fri, Apr 7, 2017 at 12:04 PM, Geneviève Bastien <notifications@github.com

wrote:

Different languages have different names for the same concept I guess. "Priority", "log level", "severity". Whatever you're more used to will feel more appropriate.

But "severity", somehow feels wrong to me. If it's severe, it should be handled by something else than tracing. So severity here would typically start pretty low (or high, depending on how you look at it), from "not that severe" to "only worth mentioning if you're interested".

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/opentracing/specification/issues/57#issuecomment-292624858, or mute the thread https://github.com/notifications/unsubscribe-auth/ADdiiWpv4bJXI12e_DpyHb-vBgtAsqKxks5rtoiggaJpZM4MoWBR .

yurishkuro commented 7 years ago

NOTE: please see additional discussion in #62

wu-sheng commented 7 years ago

@yurishkuro I have closed #62, if we need further discussion, we stay in here. : )